Seeing https://pagure.io/copr/copr/c/88806245814ef3dc59a7510faaa6c078ea52a792?branch=master, I must give +1 to this idea. The builder really is the natural place to build SRPMs.
Though note that "pushing" the changes back to dist-git is totally insecure. It is pretty trivial to escape from the chroot (by installing a hacked RPM into the minimal buildroot) and gain root access on the builder -- so granting "push access" to dist-git (be that via client cert or whatever) would mean that any hacked builder could completely destroy the dist-git machine.
The correct way would be to (1) build the SRPM on the builder, and (2) import that SRPM into dist-git on the dist-git machine itself.
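A minimal sketch of the pull side of that flow, assuming a hypothetical layout in which the builder exposes its finished SRPM over plain HTTP and the dist-git machine initiates the download itself (so no push credentials ever live on the builder); the function name and URL scheme are mine, not copr's:

```python
import shutil
import urllib.request
from pathlib import Path
from typing import BinaryIO, Callable


def pull_srpm(
    builder_url: str,
    dest_dir: str,
    opener: Callable[[str], BinaryIO] = urllib.request.urlopen,
) -> Path:
    """Pull a finished SRPM from an (untrusted) builder.

    The dist-git side initiates the transfer, so the builder never
    holds push credentials that a chroot escape could abuse.  The
    `opener` is injectable so the transport can be swapped or faked.
    """
    name = Path(builder_url).name
    if not name.endswith(".src.rpm"):
        raise ValueError(f"refusing non-SRPM result: {builder_url}")
    dest = Path(dest_dir) / name
    with opener(builder_url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest
```

The actual import into git plus lookaside would then run entirely on the dist-git host, with the downloaded file treated as untrusted input.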
Note that this becomes pretty trivial anyway with the copr-builder package (we can hack on copr-builder or add an additional script for the Tito, mock-scm, ... workflows). Though if we want to reuse the actual builders and not have a separate set of VMs -- we probably need to implement the "build srpm" requests on the backend ....
I don't know what you mean by "completely destroy the dist-git machine". You could start uploading like crazy, right -- that would bring dist-git down. An alternative is to push the built SRPM from the backend to dist-git after it has been downloaded from a builder, but we certainly don't want to do the same thing (building the SRPM) in two places. The copr-dist-git should serve as a "build history" so that we are able to reproduce a build with the sources as they were when the build was submitted. It might not be an infinite history, but still at least some history.
Note that I updated the change proposal (https://pagure.io/copr/copr/c/f5ab70925b12391cadd83ca8d0c36d4e5e6c3ae7) to also address the mentioned problem.
LGTM, thanks!
Reading the brainstorming again:
* open copr-dist-git for user-interaction
* do not store built srpms there anymore; instead build them on builders based on db data (git/svn hashes, gem/pip package versions)
* potentially use copr-cli for building on builders by employing (--local --detached) switches **
Can we reconsider the second point? It is actually super cool to have the "state" of the actually built RPMs baked into dist-git. It will look very weird if some RPM is in the repository and nothing is in dist-git.
The dist-git machine as it is has become one of the best things in copr -- we simply upload an SRPM and do the build; and because the srpm is completely committed to dist-git (and the lookaside), everyone can fully review what changed in the RPMs (cgit allows showing the git diff directly). Most importantly, we can always rebuild from dist-git alone --- so as long as the SRPM is committed, we are completely independent from the outside world.
From brainstorming.rst:
** actually scratch the last point. What we need is a builder script that gets just task_id as input and downloads the build definition from frontend and based on that executes the whole build. The current copr-builder fixes some parts of build definition (most importantly the dist-git source args) on its command line which makes future changes like in point 2 basically a no-go. We need to keep our options open in this regard.
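A sketch of what such a thin builder entry point could look like. Everything here (the frontend endpoint, the task fields, the source-type names) is hypothetical; the point is only that the builder fixes nothing but the task_id on its command line and gets the rest of the build definition from the frontend:

```python
import json
import urllib.request

# hypothetical endpoint -- the real frontend route will differ
FRONTEND = "https://copr.example/backend/get-build-task/"


def fetch_task(task_id: str, fetch=None) -> dict:
    """Download the full build definition for `task_id` from the frontend.

    Nothing about the source (dist-git URL, git hash, pypi version, ...)
    is baked into the builder command line, so the task format can
    evolve without touching the builders.
    """
    fetch = fetch or (lambda url: urllib.request.urlopen(url).read())
    return json.loads(fetch(FRONTEND + task_id))


def run_task(task: dict) -> str:
    """Dispatch on the source type carried inside the task itself."""
    handlers = {
        "srpm_upload": lambda t: f"rebuild {t['srpm_url']}",
        "scm": lambda t: f"clone {t['clone_url']} at {t['commit']}",
        "pypi": lambda t: f"pypi {t['package']}=={t['version']}",
    }
    try:
        handler = handlers[task["source_type"]]
    except KeyError:
        raise ValueError(f"unknown source type: {task.get('source_type')}")
    return handler(task)
```

Adding a new workflow then means adding a handler and a frontend-side task field, with no flag changes on the builder.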
For this reason, not storing the SRPM in dist-git is actually a no-go from my POV, a clear downvote. Building against a build-id is not a bad idea, but ensure the downloaded contents of github or tito, or mock-scm, pypi .... are always fully committed to dist-git ....
Note that I told you that moving the srpm build to the builder should be done (like ~3 years ago), so I agree from a long-term POV.... But with a trivial patch as in https://pagure.io/copr/copr/pull-request/70 it is mostly a micro-optimization (PR 70 achieves the huge performance gain in this regard). There's no point in doing such quick architecture changes headlong overnight ... (edit: I'd be glad to discuss such changes)
And if moving the SRPM build to the copr-builder side means "no commits in dist-git", I am totally against it.
Note that if the repository is maintained elsewhere (Github, Gitlab, Pagure) and it has a reliable history, we don't need to duplicate the data from there for whatever fabricated purpose.
The only reason to duplicate the data (i.e. transform them into the typical tarball+spec pair) would be if someone wants to add patches on top of the upstream repo. That use-case will be supported.
We depend on these services by definition (for the Git source), and maintaining our own history that is a copy of the original history is just a huge waste of resources. For each build we make a new tarball and store it in dist-git; this will just continue to eat more and more space and become an even bigger maintenance nightmare. Using the incremental history available in the original repository by default simply makes sense.
We have known each other for 1.5 years, so that's about how long I have known about that idea. PR#70 nicely targets one problem we have (had), and that is importing sources many times when once was enough. Point 2 in the proposal says that we don't need to import certain kinds of sources at all unless the user explicitly asks for it (and that's because he probably wants to make downstream changes upon them and build test packages). The benefit is that we don't need to maintain a huge amount of data that nobody really cares about.
You requested "Bug 1427431 - [RFE] dist-git: policy for garbage collecting of lookaside cache tarballs" at https://bugzilla.redhat.com/show_bug.cgi?id=1427431. If the dist-git history should be reliable and serve as a source for rebuilding, we cannot very well do this. If we did, the rebuild sources would be lost once garbage collection took them away, and they would need to be re-imported into dist-git manually, probably into a new temporary repo, before the user could rebuild.
Also note that we can't even garbage-collect the data once dist-git becomes open for user-interaction.
Sorry for the delay
Not at all, unless you plan to duplicate the original git repository in copr's dist-git too. Depending on remote git storage would make copr's architecture much weaker.
The other reason is the DoS/DDoS aspect I noted in PR 70. If we rely on a remote git repo or anything else, we'll forever DoS upstreams (for each build and chroot we'll clone the remote repository, which is terribly unfriendly). It is not a problem now, but once some upstream decides to blacklist copr, it will be too late.
IMHO, this makes the move a no-go. Please just keep the actual architecture -- my honest opinion.
Also note that the proposed architecture keeps the "race condition" discussed in PR 70 (each chroot clones the remote repo at a different time, which means that completely different code might be built).
I don't understand a thing you are saying here. We depend on remote git storage by definition, because we retrieve new sources from there, and COPR is mainly a system for CI -- that is, for building new sources.
I mean, the remote side should be able to handle 10 consecutive downloads. These systems are used by many users and they need to be able to handle this. Anyway, this can be easily optimized by proxying the download.
That's nice. But I remember you said something about opening dist-git several times. It would be nice if you were happy about a good change once in a while.
There is no race condition there. It might happen that different code is built on different builders, but that is because a git commit hash was not used in the build specification (that is currently not possible in COPR, so it's not a user mistake); instead HEAD is taken, which is a moving target. Again, this can be solved by proxying the download through one node or, more simply, by first asking the remote side what HEAD is before continuing.
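The second option can be sketched like this: one node asks the remote what HEAD points to (via `git ls-remote`) before any chroot build starts, and every chroot then builds that pinned hash instead of a moving HEAD. The function names are mine, not copr's:

```python
import subprocess


def parse_ls_remote(output: str, ref: str = "HEAD") -> str:
    """Pick the commit hash for `ref` out of `git ls-remote` output.

    ls-remote prints tab-separated "<sha>\t<refname>" lines.
    """
    for line in output.splitlines():
        sha, _, name = line.partition("\t")
        if name == ref:
            return sha
    raise LookupError(f"{ref} not found in ls-remote output")


def resolve_head(repo_url: str, ref: str = "HEAD") -> str:
    """Ask the remote once, up front, what `ref` points to.

    The returned hash is then handed to every per-chroot build,
    so all chroots provably build identical code.
    """
    out = subprocess.run(
        ["git", "ls-remote", repo_url, ref],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_ls_remote(out, ref)
```

Pinning the hash in the task also helps the DDoS concern: only the node resolving HEAD talks to the remote per build request, rather than every chroot.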
This is a difficult and time-consuming discussion; let's have an in-person chat in the office tomorrow? Maybe we can have a chat with mirek too, so more heads are in the room..
For the initial build. But upstream repos are born and die all the time ... once the package has been built, we need to have the full source (at least for the last built package). We cannot depend on remote repos...
I think we should do it the right way -> there's one request, which should download the source once -> and also deterministically build the same output for all chroots. Doing it the "wrong way" and working around it with a proxy is IMO too expensive an approach.
Correct, I'm all for opening dist-git for direct writing by users. That's IMO a very distant target now, but still .. it is something we agree on. But what I'm saying here in #68 doesn't go against this.
There is no reason not to discuss it here. You hide information from other people by chatting offline.
Again we depend on them already.
If some remote repo dies, the user probably won't be interested in building the sources again? I mean, that's the point. For these sources, usually the people that are building (or setting up webhooks etc.) are the same people that develop the software. You are obviously missing this point.
There is no right answer to everything.
Well, it kind of goes against it. We don't really want to auto-import something into an open user's repository that he or she maintains with care.
Also, this is closely related to #60. There were very good points raised by our users: building an RPM from "upstream" repos is a two-phase process: (a) get the source from upstream, and (b) build the package from the source. It is worth having (a) done once, deterministically, and having it backed up in copr dist-git.
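The two-phase split can be reduced to a very small sketch (the function names are hypothetical; the point is that phase (a) runs exactly once and every chroot in phase (b) consumes the same snapshot):

```python
from typing import Callable, Dict, List


def build_all_chroots(
    fetch_source: Callable[[], bytes],
    build: Callable[[bytes, str], str],
    chroots: List[str],
) -> Dict[str, str]:
    """Run phase (a) exactly once, then phase (b) once per chroot.

    The single `snapshot` is what would be committed to copr dist-git,
    and every chroot provably builds from that identical snapshot.
    """
    snapshot = fetch_source()                  # (a): one deterministic fetch
    return {chroot: build(snapshot, chroot)    # (b): per-chroot builds reuse it
            for chroot in chroots}
```

Anything that fetches per chroot instead of per build breaks both determinism and upstream friendliness, which is what the race-condition and DoS points above are about.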
Again -- only initially (at the time we know that the remote source exists).
> If some remote repo dies, probably the user won't be interested in building the sources again?
Why do you think so?
> I mean that's the point. For these sources, usually the people that are building (or setting up webhooks etc.) are the same people that develop the software. You are obviously missing this point.
Not really, maintaining packaging "CI" doesn't mean I have to be the upstream maintainer. That's the point -- fedora maintainers should be motivated to set up a copr-ci-packaging workflow even though they can't commit to upstream.
Having it in dist-git as it is not is clear answer. Let me reverse the discussion -- what's the motivation to change the architecture?
Why? That's exactly the point ... if you allow users to commit something into dist-git directly, then you want to have the other methods (srpm upload or tito build) also reflected ... otherwise dist-git tracks just a small part of the package story...
Pagure strikes again ... and obviously nobody cares enough ATM to discuss it here and now; there's no reason to waste our time here when it is much more convenient to have an in-person meeting. I'll happily dump the meeting consensus here.
I see the proposal here as a clear step backwards (on several fronts..) with no additional value. So perhaps in person you'll have more chances to convince me and "defeat" this architecture change.
> (a) done once, deterministically
Yes, this can be done.
> have it backed up in copr dist-git.
We are not a back-up service for Github. See https://en.wikipedia.org/wiki/Separation_of_concerns.
So far, points to be discussed in the personal meeting:

- what's the motivation for changing the actual architecture
- how do we solve the race condition (expected to be solved once and forever by PR #70) with the new architecture; this needs to be done for all build options, also the future options discussed in #60
- how do we solve the DDoS issue (see #70) in the new architecture
- manual and automatic changes in RPMs (done by direct dist-git push vs. srpm upload e.g.) need to be tracked somewhere ... where?
- some users have like 1.5 GB SRPMs; can we ensure that uploading the SRPM and extracting it is done just once in the new architecture?

Please let me know if there are other topics to be discussed.
Yes, the "initially" is what the users actually care about.
Because the people building are devs at the same time.
Yes, they can when copr-dist-git is open (or there is also the fork repo feature in the upstream, not sure if you have heard of it).
> Having it in dist-git as it is not is clear answer.
Not getting this sentence.
> Let me reverse the discussion -- what's the motivation to change the architecture?
E.g. 1.2 TB of data on the copr-dist-git machine, most of which is just garbage, and which is being backed up by the infrastructure rsync scripts, which brings the whole machine down at quite short intervals? And that continues to grow at least 1-2 GB per day at the current rate. Not only does it bring copr-dist-git down, it also impacts other parts of the infrastructure which are being backed up from the same source. Do you know how long it takes to restore selinux contexts on such a huge amount of data?
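As a rough sanity check of the growth figures above (taking the quoted 1-2 GB/day at face value):

```python
def yearly_growth_tb(gb_per_day: float) -> float:
    """Project yearly dist-git growth in TB from a daily growth rate in GB."""
    return gb_per_day * 365 / 1024


# at 1-2 GB/day the machine gains roughly 0.36-0.71 TB every year,
# on top of the 1.2 TB already stored
low, high = yearly_growth_tb(1), yearly_growth_tb(2)
```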
"Other methods" means that the repository will be different, and if not, you don't want to mess with the user's repo unless the user explicitly asks for it. You can read about this principle here: https://en.wikipedia.org/wiki/Principle_of_least_astonishment
That is not an 'open' approach. Potential readers will not get the required context.
The value is that COPR maintainers will not need to spend nights fixing a machine that is broken because it is being flooded with tons of data that no-one really needs for anything.
> - what's the motivation for changing the actual architecture
> - how do we solve the race condition (expected to be solved once and forever by PR #70) with the new architecture; this needs to be done for all build options, also the future options discussed in #60
> - how do we solve the DDoS issue (see #70) in the new architecture
> - manual and automatic changes in RPMs (done by direct dist-git push vs. srpm upload e.g.) need to be tracked somewhere ... where?
> - some users have like 1.5 GB SRPMs; can we ensure that uploading the SRPM and extracting it is done just once in the new architecture?
I think we have already discussed most of these things. But if you want some recap, I am ok with it.
Please provide stats before making such statements... I'm a copr user too, and the 'cgit' provided by copr is one of the most valuable features for me ... I always check what changed between two consecutive SRPM uploads..
Again, statistics? How many fedora packages are maintained by actual upstream developers?
> @clime: There is no right answer to everything.
> @praiskup: Having it in dist-git as it is not is clear answer.
> @clime: Not getting this sentence.
Sorry :(, I wanted to say: having everything as it is now is the clear and ultimate answer to all the problems you have mentioned so far....
> E.g 1.2T of data on copr-dist-git machine most of which is just garbage ...
... OK, I get this. But that's what bug 1427431 is about -> let users define (opt-in) what is and what is not important for them. You can also have some mandatory "cleanup" garbage collector for pathologically demanding users ... Just please don't hurt everybody (your proposed changes will unconditionally hurt my work-flows, for example).
> Potential readers will not get the required context.
I can/you can happily answer them later; also we can have an "open" bluejeans chat if there are some other concerned parties... if there is somebody, please raise your voice now!
Also, you "defeat" open discussion -- but even to me it is completely unclear what motivates you to change a correct architecture into something which is wrong in its basics. Also, you push unreviewed architecture changes all the time..
Neither point here is answered for me, to be honest, so thank you for discussing this first before the complete move.
I am talking here about people who build directly from GIT. I won't be collecting stats for you. Find me a guy who builds from GIT and who is not a dev of that project, or for whom it is easier to use copr-dist-git to make changes to the sources instead of forking it on Github, Gitlab, or Pagure.
Fedora packagers use the Fedora DistGit, which is open, and we will provide the same option. What we are talking about here is not a packager use-case though. It's a dev's use-case: someone who wants to build his or her software for testing, or wants to provide a stable release to a user.
No, it is not. See the problems in the post that you comment on.
Please, code this so that it works reliably and I will give you my hat...maybe.
People can stumble upon this later in time.
Well, I pushed some changes earlier that I didn't properly discuss when it was probably needed; I admit that. But now I am discussing things and talking about them. Note that what I have done so far regarding this issue is pretty much equivalent from an "arch" point of view. It just offers more options.
It's correct in its basics and I can argue it out anytime with anyone.
Then it means you are ignoring the points that I am giving you. There is no such thing as a sudden complete move here. It will require a step-by-step approach over time.
I'm not ignoring you, at least not intentionally ... I deal with large SRPMs and repeated clones from upstream all the time in the internal Copr, and I'm really concerned about this topic; motivated enough to work on this in #70 and even more...
So yeah, the only problem we try to deal with is the storage issue on dist-git, right?
'giving my hat' == 'agreeing with you'? I can't find an online definition for this term ... but I agree that I can hack on the frontend <-> dist-git protocol (e.g. a garbage collector; there's a lot we can clean up without losing the important parts, indeed) ... But I cannot hack on an uncertain PoC which won't be accepted in the end, so I need discussion in advance. And also "reliably" is a bit strong a requirement :), nothing is really implemented reliably until we test it and fix the major bugs over several iterations ... :)
SRPMs are now being built on builders. Simple importing logic was kept on copr-dist-git to avoid pushes that would need to be authenticated.
Metadata Update from @clime:
- Issue assigned to clime
- Issue status updated to: Closed (was: Open)