Separately from #130, ABRT is currently totally nonfunctional due to https://pagure.io/fedora-infrastructure/issue/9060. The retrace server being down would itself be pretty bad, but it seems that local backtrace generation is also not possible when the retrace server is down.
What are the UX repercussions of this? Will users still get a notification if something crashes? I assume that manual reporting in the Problem Reporting app will hit errors?
Do you know when the retrace server will be restored? [edit: oh, I see end of July mentioned...]
What are the UX repercussions of this? Will users still get a notification if something crashes?
Well, the UX inside the ABRT app itself is totally broken (on top of the problems discussed in #130). Nothing works.
Desktop notifications still work.
"totally broken" and "nothing works" aren't useful descriptions. Could you be more specific? My understanding is that the main issue is that manual issue reporting through the app isn't possbile - someone can click the "Report" button and the dialog comes up, but it will show an error and the report can't be completed. Is that right?
Yes.
My impression is that the manual reporting feature in Problem Reporting doesn't get used very much, so while it's bad that it's not working, it's not a disaster.
From a UX perspective, my suggestion would be to flag that it's not functioning early in the manual reporting process, to prevent further annoyance later down the line. Saying sorry would help.
Just out of curiosity how useful/helpful is ABRT?
I've rarely seen end users report those issues. In fact, most of them view the notification as spam (they just click it to make it go away), because they have no understanding of what happened, what the system is trying to tell them, or what they can do to fix it. After a while they simply become immune to it, or it becomes a nuisance if it happens frequently, which also gives end users the impression that Fedora is simply broken: "I'm always getting notified that something on my system crashed, wow, how broken Fedora must be."
In addition, for concepts like ABRT to work you need enough resources to deal with the issues being reported, which I'm pretty sure we are short of. So for whom is ABRT, and is it actually beneficial to end users, or is it just causing a negative desktop experience for them?
Do the benefits of having ABRT outweigh the costs, or should the idea of having it installed by default be revisited?
I agree, the ABRT application requires significant technical knowledge to use and too much work to fix up to meet our quality expectations. I've long wanted to invest in redesigning it, but I don't think it's going to happen. If we remove it, these concerns go away and we can immediately close #130.
Now, there is one problem: the GUI app is actually apparently needed for automatic reporting of truncated backtraces. That's not good. We don't want to lose automatic reports.
That's currently the only feature of Problem Reporting. That's really all it does. I guess you can open it up to see which problems have been previously detected without intent to manually report them, but that's broken too since problems disappear from the list fairly quickly depending on how much disk space they take up.
So you would want to keep the "microreport" part of ABRT, right? That said, isn't ABRT an RH-ecosystem-only solution? That raises the question of what upstream GNOME is using to provide the same or similar functionality (if anything).
Can we keep this ticket focused on Fedora and ABRT, please?
I don't see how we aren't doing that. Besides, it is perfectly natural for a distribution to revisit the idea of having ABRT installed by default if its benefits don't outweigh the costs, and doing so could save a lot of work. Just look at #130: it's a big pile of messy backlog that requires a lot of work, and the last comment indicates it is currently stuck on RH providing new HW and having to redeploy the whole thing on RHEL 8 on a one-way journey, because the currently running setup seems to be held together by prayers, duct tape and wire straps ;)
I would really rather make ABRT more part of collecting valuable crash and bug report data than less. It was highlighted in the Destination Linux podcast today that the bug reporting experience needs to be enhanced to be simple and straightforward because people get frustrated reporting bugs now and that's why they don't. Both Windows and macOS have far more streamlined feedback reporting mechanisms for stuff like this. We have all the tools we need to do the same in Fedora, we just need to make it a priority to do so.
People mostly get frustrated that their reports are not being worked on, probably more so than by any UX issues of the tool itself. Getting people to report is one issue; fixing the issues that get reported is another. It's not enough to have the tools, you also need the resources, which is why Fedora cannot be compared to multi-billion-dollar companies like Microsoft and Apple, who have endless resources at their disposal. That is the fundamental problem here.
Without the necessary resources to work on an issue, increasing an issue's priority will only steal that time from something else that needs to be worked on, which in turn will directly or indirectly increase the size of the overall backlog of issues that have to be worked on.
Add this proposed blocker bug to the pile. (a) systemd-246 has switched to zstd compressed coredumps which abrt-server doesn't recognize, and (b) another example of 'don't file bugs here, file them over there' frustration. https://bugzilla.redhat.com/show_bug.cgi?id=1860616
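For readers wondering what "doesn't recognize" means in point (a): a compressed coredump can usually be identified by the magic bytes at the start of the file. The sketch below is only an illustration of that idea, not ABRT's actual code; the function name `detect_compression` is hypothetical, and the magic numbers are the standard ones for the zstd, LZ4 and xz frame formats.

```python
# Standard magic numbers found at the start of compressed streams.
MAGIC = {
    b"\x28\xb5\x2f\xfd": "zstd",  # zstd frame
    b"\x04\x22\x4d\x18": "lz4",   # LZ4 frame
    b"\xfd7zXZ\x00": "xz",        # xz stream
}


def detect_compression(path):
    """Return the format suggested by the file's first bytes, or None.

    Hypothetical helper for illustration; None means no known magic
    number matched (for example, an uncompressed ELF core).
    """
    with open(path, "rb") as f:
        head = f.read(6)  # the longest magic above is 6 bytes
    for magic, name in MAGIC.items():
        if head.startswith(magic):
            return name
    return None
```

A consumer that only knows the older formats would fall through to the `None` case for a zstd file, which matches the kind of failure described in the bug.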
With my QA hat on, I believe it would be a big mistake to remove ABRT from Workstation. Yes, the application is flawed and the bug reporting process can mostly only be done by a power user, but that still gives us a lot of useful reported crashes we wouldn't otherwise get.
To reply @catanzaro here instead of spamming bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1873029#c9
If ABRT can be made functional again in F33 timeframe, that would be great. Otherwise... well, I don't think we want to delay release on it....
I have the opposite opinion. But even if you removed ABRT from Workstation, it wouldn't affect the release delay, because it's also present in KDE, which is also release blocking. ABRT would need to be removed from KDE as well. Or the release criteria would need to be changed: https://fedoraproject.org/wiki/Fedora_33_Final_Release_Criteria#Default_application_functionality Of course both are possible with enough support, I'm just saying there are these other factors to be aware of.
This situation has turned into a disaster. :( Retrace server should not have been down for even a single day, but it's been over two months now.
I also prefer to fix rather than remove. I'm just getting quite discouraged at this point, since the retrace server has been down for two months. @kparal do you want to propose a beta release blocker for that? We know that once https://bugzilla.redhat.com/show_bug.cgi?id=1873029 is fixed, we still won't be able to report bugs due to the retrace server outage, so if that bug is a beta blocker, then retrace server should be too.
Even once retrace server is back, we need to deal with #130 to bring ABRT up to Workstation quality expectations, but that can be step two.
This situation has turned into a disaster. :(
I agree, I'm unpleasantly surprised how broken ABRT has been in the past 2 months and that the development team hasn't quickly removed the dependency on a running FAF service. At least on Fedora 32. They might have done some changes on their development branch which went to F33, but I couldn't test it yet because of all those other bugs :-/
@kparal do you want to propose a beta release blocker for that? We know that once https://bugzilla.redhat.com/show_bug.cgi?id=1873029 is fixed, we still won't be able to report bugs due to the retrace server outage, so if that bug is a beta blocker, then retrace server should be too.
I don't want to report the bug before I've tested whether it's really broken, because they might have fixed that in F33. @ekulik @msuchy Can you please shed more light on the current ABRT status for us here? Especially regarding FAF status and whether it is still required even for manual reporting, and why.
So we have the report, and then what? If there is supposed to be some form of Fedora/GNOME reliability monitor or error reporting tool, there has to be someone on the receiving end who actually deals with what gets reported. That is rarely the case in a downstream distribution, since those are mostly made up of "package managers". So either those reports need to be forwarded upstream to be fixed, or RH has to seriously commit to providing resources to fix the issues reported by ABRT. But as can be seen, RH can't even commit to reliably keeping the retrace server, which this tool uses, up and running o_O
@johannbg There isn't much difference between ABRT filing a crash and a user manually reporting the crash in RH Bugzilla. So you're questioning the usefulness of both of these at the same time. You are of course entitled to have your own opinion and I might partially agree with it. At the same time, there are many upstreams who hate getting crash and bug reports from distribution packages, and you know very well that the picture is not black and white. If we want to completely re-think how we deal with bug reports, I'm game, but it should be done as a project on its own, with some long-term vision, and not because "ABRT hasn't been working lately, the simplest fix is to remove it". It should be a systematic change.
@kparal yes I'm questioning the usefulness of both ( bz.rh and ABRT ) since they suffer from the same problem and there is no point in shipping something that is broken.
Even if the distribution had an endless amount of resources, the fragmentation of the distribution (from being just generic to editions) and the fragmentation of the Linux ecosystem as a whole are why concepts like a reliability monitor/error reporting tool can never work in open source software (too many components, too many variations, too much fragmentation == too many places to report to).
There is a lot of "we want to keep this because it's useful". Useful for whom, I ask? Fedora/RH QA? The Fedora package maintainer? The upstream developer? The end user/consumer?
What benefits has ABRT brought the distribution, the Linux ecosystem as a whole, and end users in its 10 years of existence?
Are we as a project, or RH as a company, investing time and resources into something that yields little to no benefit, conceptually can never work, and arguably has never worked reliably in those 10 years?
With this statement, I feel that there's little to discuss. You're on a crusade and ABRT is a convenient hostage. If you want to change the overall approach, start a discussion on the test or devel list, win people over, and then propose a system change. This ticket would only turn into a long mess nobody would follow. I'll respond only to the actual topic (ABRT being non-functional) but not to any overarching agenda that you have.
This situation has turned into a disaster. :( I agree, I'm unpleasantly surprised how broken ABRT has been in the past 2 months and that the development team hasn't quickly removed the dependency on a running FAF service. At least on Fedora 32. They might have done some changes on their development branch which went to F33, but I couldn't test it yet because of all those other bugs :-/ @kparal do you want to propose a beta release blocker for that? We know that once https://bugzilla.redhat.com/show_bug.cgi?id=1873029 is fixed, we still won't be able to report bugs due to the retrace server outage, so if that bug is a beta blocker, then retrace server should be too. I don't want to report the bug before I've tested whether it's really broken, because they might have fixed that in F33. @ekulik @msuchy Can you please shed more light on the current ABRT status for us here? Especially regarding FAF status and whether it is still required even for manual reporting, and why.
When I was looking into that, it just worked for me, so I don’t know what’s going on (though we do need that patch with the fix for the memory management added to the package first).
These blockers have just now been reported as fixed https://bugzilla.redhat.com/show_bug.cgi?id=1873029 https://bugzilla.redhat.com/show_bug.cgi?id=1860616
This is still a blocker bug https://bugzilla.redhat.com/show_bug.cgi?id=1878317
Retrace server still doesn't have an exact ETA but could happen in a few days. It seems decently likely the abrt bugs are going to be waved off for beta go/no-go, as it's not realistic to block on things that aren't going to get fixed.
I'm not sure there's much to do here, but it'll be on the agenda for Tuesday.
Metadata Update from @chrismurphy: - Issue tagged with: meeting
Metadata Update from @chrismurphy: - Issue untagged with: meeting
It's still having issues. Neither Fedora 32 nor Fedora 33 can report bugs; since it's a general issue not specific to Fedora 33, the decision was made at blocker review to not block on this. https://bugzilla.redhat.com/show_bug.cgi?id=1885154
In that case, can we put this on top of the agenda for tomorrow so we can drop ABRT from F33? Pushing it off another week makes things quite tight.
BTW I don't think "Neither Fedora 32 nor Fedora 33 can report bugs" is generally true, because I just received my first ABRT report in roughly a year yesterday. But the retrace server is still broken, and all users who attempt to report bugs necessarily go through that step first....
Metadata Update from @catanzaro: - Issue tagged with: meeting-request
I've checked gnome-initial-setup and gnome-control-center and they both display ABRT options only if abrtd is currently running. You can actually kill abrtd with gnome-control-center running to see the entire Diagnostics panel disappear (same should hold for gnome-initial-setup). So we don't need to do anything other than modify comps and make sure it doesn't get pulled in by mistake.
yeah, "can't report bugs" isn't right. "can't use the retrace server" is more accurate. This was a key factor in the decision, as you can report bugs, so long as you generate the backtrace locally. The fallback from retrace server to local generation does work.
Metadata Update from @chrismurphy: - Issue untagged with: meeting-request - Issue tagged with: meeting
In that case, can we put this on top of the agenda for tomorrow so we can drop ABRT from F33?
I don't see the rationale in this. ABRT can be used, it's just the retrace server that doesn't work properly. And the workflow is not that bad: after receiving the error you're immediately offered the option to perform local tracing. Removing ABRT completely would do more harm than good.
I can process locally with Problem Reporting app, and abrt-cli, and it will submit a bug report to RHBZ.
But the retrace server continues to fail: "Retrace failed. Try again later and if the problem persists report this issue please." I did file a bug about it, but it's still an issue. https://github.com/abrt/retrace-server/issues/370
The retrace server is still not online, so that issue is not surprising...
Meeting: still not up to the quality level we want, it's in bad shape. But we decided to leave it in for Fedora 33, and will keep an eye on it for Fedora 34.
Metadata Update from @chrismurphy: - Issue untagged with: meeting - Issue set to the milestone: Fedora 34
Some observations:
At least ABRT is no longer totally broken, even if the retrace server still is. I'm going to go ahead and close this. We can continue in #130.
We should be clear that ABRT is not sacrosanct, though. Just working well enough to report bugs is not good enough to remain in Workstation. It should meet the same quality expectations that we have for other apps. Anyway, we can continue in #130.
Metadata Update from @catanzaro: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)
The retrace server isn't down, it's just apparently not working right. If it was down it would still be the case that no-one could report anything, because report_uReport (the first event in all workflows) would fail. That was the problem we had until the server was brought back up recently.
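The distinction between "server down" and "server up but retracing fails" can be made concrete with a toy model. This is only a sketch of the reasoning in this thread, not ABRT's implementation: `report_uReport` is the only name taken from the discussion, and everything else (`EventFailed`, `run_workflow`, `with_fallback`, `make_events`, the event names `backtrace` and `file_bug`) is hypothetical.

```python
class EventFailed(Exception):
    """Raised by a workflow event that cannot complete."""


def run_workflow(events):
    """Run events in order; an unhandled failure aborts the whole workflow."""
    done = []
    for name, action in events:
        action()  # raises EventFailed on error
        done.append(name)
    return done


def with_fallback(primary, fallback):
    """Build an action that tries `primary`, then `fallback` if it fails."""
    def action():
        try:
            primary()
        except EventFailed:
            fallback()
    return action


def make_events(server_up, retrace_ok):
    """Hypothetical reporting workflow under the assumptions in this thread."""
    def ureport():
        if not server_up:
            raise EventFailed("server unreachable")

    def remote_retrace():
        if not (server_up and retrace_ok):
            raise EventFailed("Retrace failed")

    def local_backtrace():
        pass  # assumed to succeed in this toy model

    def file_bug():
        pass  # assumed to succeed in this toy model

    return [
        ("report_uReport", ureport),  # first event, has no fallback
        ("backtrace", with_fallback(remote_retrace, local_backtrace)),
        ("file_bug", file_bug),
    ]
```

With `make_events(True, False)` the workflow completes through the local fallback, while `make_events(False, False)` raises at the very first event, which is the situation described for the period when the server was actually down.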