Many packages (mostly Java) in Fedora dist-git fail validation with git-fsck. This prevents importing those packages' history into forges that run this check on push for security reasons. In particular, it is preventing us from importing those packages into CentOS Stream 10, which uses GitLab as the host for its dist-git.
The packages specifically needed for CentOS Stream 10 are:
antlr apache-commons-cli apache-commons-codec apache-commons-exec apache-commons-io apache-commons-parent beust-jcommander bsf jdom jsch log4j maven-antrun-plugin maven-archiver maven-assembly-plugin maven-shade-plugin mojo-parent
The rest of the list is attached below. All of the affected packages have the same root issue: a packager many years ago had an extra < character in their author/committer field, which causes a (harmless) validation error.
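To make the failure concrete, here is a minimal Python sketch loosely modeled on the kind of identity check git-fsck applies to author/committer lines. The function name check_ident and the sample identity are hypothetical, and this is a simplified illustration rather than Git's actual implementation (the real check also validates the timestamp and timezone).

```python
def check_ident(ident: str) -> str:
    """Return "ok" or a short reason why the ident looks malformed.

    Simplified illustration only; not Git's actual implementation.
    `ident` is the part of an author/committer line after the keyword,
    e.g. "Jane Doe <jane@example.com> 1200000000 +0000".
    """
    lt = ident.find("<")
    if lt == -1:
        return "missing email"
    name = ident[:lt]
    if ">" in name:
        return "bad name"
    rest = ident[lt + 1:]
    # The email must run up to the first '>' with no further '<' inside it.
    gt = rest.find(">")
    if gt == -1 or "<" in rest[:gt]:
        return "bad email"
    return "ok"


# Hypothetical sample identities; the second has the stray '<'.
good = "Jane Doe <jane@example.com> 1200000000 +0000"
bad = "Jane Doe < <jane@example.com> 1200000000 +0000"
print(check_ident(good))
print(check_ident(bad))
```

The second identity differs from the first only by the extra "< " before the email, which is enough for a strict checker to reject the whole commit object.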
What we propose:
rawhide
This will resolve the issue for these sixteen packages, but it's also worth noting that Fedora will still need to deal with this issue on other branches at some point if we move away from Pagure. We probably also want to bring Pagure in line with GitLab here and make it run this check on pushes, so that we don't reintroduce similar issues.
Immediately
This should be a one-time event.
We will be unable to import the affected packages directly to CentOS Stream 10 and will need to find another workaround, probably making trivial merges/cherry-picks from Fedora unusable.
The complete list of Fedora packages that fail git-fsck as of 2023-12-04:
ant-contrib antlr apache-commons-cli apache-commons-codec apache-commons-dbutils apache-commons-digester apache-commons-discovery apache-commons-exec apache-commons-io apache-commons-parent apache-commons-pool apache-mime4j beust-jcommander bsf cal10n castor checkstyle cssparser decentxml derby eclipse eclipse-cmakeed eclipse-egit eclipse-jgit eclipse-manpage eclipse-pydev eclipse-rpm-editor eclipse-rpmstubby eclipse-shelled felix-osgi-foundation geronimo-jaxrpc geronimo-osgi-support geronimo-saaj gstreamer-java httpunit icu4j javamail jaxen jboss-parent jdom jline joda-time jsch kxml log4j maven-ant-plugin maven-antrun-plugin maven-archiver maven-assembly-plugin maven-checkstyle-plugin maven-doxia-tools maven-eclipse-plugin maven-help-plugin maven-idea-plugin maven-javadoc-plugin maven-plugin-exec maven-pmd-plugin maven-release maven-repository-plugin maven-shade-plugin maven-skins mojo-parent msv mx4j nekohtml plexus-active-collections plexus-interactivity rpmorphan sqljet xdoclet xmltool
We should fix this, if for no other reason than that Pagure's remote pull request feature is effectively useless for these packages: you can't push them to another git server, because the push fails git-fsck.
I would want this to get FESCo approval and also get a lot of noise in the community before we do it... since all those repos checked out by maintainers would be useless/broken after we re-write history. ;(
Metadata Update from @phsmoura: - Issue tagged with: high-gain, high-trouble, ops
OK, with the time it would take to get approved by FESCo and then communicate it effectively to the community as a whole, I think we (CentOS Stream 10) need to proceed with our backup plan of just breaking the inheritance and importing these with revised history. I'll leave this ticket open because we do still need to come up with a longer-term solution.
Well, we still have a month, so a FESCo ticket about this now (to be discussed/voted on by Thursday) and then an announcement shortly after can give us the time we need.
Please be diligent here. "Fedora rewrites dist-git history to make RHEL's choice of non-opensource gitlab.com work" would be a very bad headline for all of us. And it's a predictable headline on you_name_it. The issue of fsck-protected pushes is too technical to make it even in the byline on those sites ...
There are good reasons for both the non-rewrite policy as well as the fsck-protection. They are in conflict here. If we were in control of both forges the obvious short-term solution would be to check the fsck warning carefully and then override it. We cannot do that because gitlab.com does not let us do that. And it's really disappointing, given git has fsck.skipList specifically for that purpose, with fine-grained control.
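For illustration, on a forge where we did control the server configuration, the skip list would be a plain file of full object IDs that the fsck.skipList setting points at (for example via git config fsck.skipList followed by the file path). A minimal Python sketch of producing such a file follows. The helper name render_skip_list and the object IDs are hypothetical placeholders, and the "#" comment line assumes a reasonably recent Git version that accepts comments in the skip list.

```python
def render_skip_list(object_ids, note=""):
    """Render the text of a skip-list file: one full SHA-1 per line.

    Hypothetical helper for illustration; it only formats text and
    does not talk to Git.
    """
    lines = []
    if note:
        # Assumes the Git version in use accepts '#' comments here.
        lines.append("# " + note)
    for oid in sorted(set(object_ids)):
        if len(oid) != 40 or any(c not in "0123456789abcdef" for c in oid):
            raise ValueError(f"not a full SHA-1 object ID: {oid!r}")
        lines.append(oid)
    return "\n".join(lines) + "\n"


# Placeholder IDs, not real commits from any Fedora package.
ids = ["b" * 40, "a" * 40, "b" * 40]
print(render_skip_list(ids, note="commits with a stray '<' in the email"), end="")
```

Each listed object is then ignored by the check, while every other object is still validated, which is the fine-grained control referred to above.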
I'm wondering, though - where does CentOS Stream 9 have its package sources, and why hasn't the problem surfaced there?
Does gitlab.com offer shallow clones? This would allow us to get the history from 2012 onwards (I checked bsf only), which should suffice for now - not rewritten and fsck'able!
.mailmap does not help fsck, btw (only log). Neither does git replace.
CentOS Stream 9 also has it on gitlab.com, but it doesn't surface there because they didn't import Git history when they forked from Fedora Linux 34. They dropped all the history of the package and created new repos. However, this messes with attribution, especially for packages that use RPMAutoSpec. Back then, RPMAutoSpec was new and not broadly adopted, but now it is used in many core packages. Thus, the community pushed for preserving the Git history for packages when forking from Fedora the next time. That time is now.
For the record, I've imported the sixteen CentOS Stream 10 branches by running the following magic incantation over them:
git filter-repo --force --email-callback 'return email.replace(b" <akurtako@redhat.com", b"akurtako@redhat.com")'
All of the issues were the result of an extra pair of characters in the email field for a short period around 2008 (likely in CVS and pulled over when we switched to git): < <akurtako@redhat.com> instead of <akurtako@redhat.com>
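As a rough illustration of what that callback does to each email value, the same expression can be exercised as a plain Python function. The wrapper name fix_email is hypothetical, and the sketch assumes, as the command itself implies, that the callback sees the email as bytes without the enclosing angle brackets, so the malformed value arrives as " <akurtako@redhat.com".

```python
def fix_email(email: bytes) -> bytes:
    # Same expression as the --email-callback body in the command above.
    return email.replace(b" <akurtako@redhat.com", b"akurtako@redhat.com")


# The malformed value is repaired; any other address passes through unchanged.
print(fix_email(b" <akurtako@redhat.com"))
print(fix_email(b"someone@example.com"))
```

Because the replacement only matches the exact malformed string, commits by every other author and committer keep their original metadata, although all descendant commits still get new hashes once an ancestor is rewritten.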
If FESCo approves the fixup, I can pull these 16 branches (and their updated history) back over to Fedora easily. I went ahead with the CentOS Stream 10 import right away because they are blocking our switch over to using CS10 as its own buildroot (rather than relying on ELN for the buildroot).
For reference, the corresponding FESCo ticket is https://pagure.io/fesco/issue/3119