ABRT is very important for improving the quality of Fedora Workstation. So let's give it some ❤.
ABRT has two completely separate crash reporting paths currently, so let's talk about them separately.
This is already working more or less well.
The main problem here is that crash reports never seem to make their way from the retrace server to Bugzilla unless a user manually reports the crash to Bugzilla. Ideally, once a crash is reported by, say, 100 users, a Bugzilla report would be automatically created. (Mega bonus points if it could learn to create bug reports directly in GNOME GitLab.) I think we used to have this functionality, but if so, it's broken.
But on the whole, automatic crash reporting is in good shape.
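The threshold idea above (file a bug automatically once enough users hit the same crash) can be sketched roughly as follows. This is only an illustration: `CrashBucket`, `record_report`, the crash signature, and the reporter IDs are all hypothetical names, and nothing here reflects how the retrace server or FAF actually stores or counts reports.

```python
from dataclasses import dataclass, field

REPORT_THRESHOLD = 100  # the "say, 100 users" figure from the comment above


@dataclass
class CrashBucket:
    """Hypothetical per-crash record: who reported it and whether a bug exists."""
    signature: str
    reporters: set = field(default_factory=set)
    bug_filed: bool = False


def record_report(buckets, signature, reporter_id, threshold=REPORT_THRESHOLD):
    """Record one report.

    Returns True exactly once per signature: when the number of distinct
    reporters first reaches the threshold. The caller would file the bug then.
    """
    bucket = buckets.setdefault(signature, CrashBucket(signature))
    bucket.reporters.add(reporter_id)  # a set, so repeat reporters count once
    if not bucket.bug_filed and len(bucket.reporters) >= threshold:
        bucket.bug_filed = True
        return True
    return False
```

Counting distinct reporters rather than raw reports keeps one machine that crashes in a loop from triggering a bug on its own, and the `bug_filed` flag prevents the same crash from being filed twice.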
Automatic crash reports, while important and helpful, don't contain full backtraces (to protect the user's privacy, since full backtraces sometimes contain sensitive data in stack frames). Without a full backtrace, resolving a crash using just the automatic crash report can be quite difficult. Often, the only practical solution is to wait for a manual ABRT bug report to arrive, since ABRT's manual bug reports to Red Hat Bugzilla include detailed backtraces.
Sadly, ABRT's manual bug report process suffers from a very large number of quality issues.
Highest-priority issues:
Other major issues:
Each of these issues is, on its own, a serious quality problem for Fedora Workstation. Combined, it's really quite a lot.
Pet peeve quality issues:
The WG should work with the ABRT developers to understand their priorities, figure out how we can reduce the number of quality issues, and determine whether we have a plausible path forward for ABRT ❤.
Report crashes directly to GitLab. This is required for Flatpak crash reporting, but also it's a pragmatic reaction to the fact that for GNOME packages, Red Hat Bugzilla is basically an unmonitored dumpster fire where bug reports go to be ignored. We're far better at dealing with crash reports upstream than we are on Red Hat Bugzilla. I'd like to start autoclosing downstream bug reports against GNOME components (with exceptions for blocker, freeze exception, or downstream packaging bugs), so to make that successful, ABRT should learn to report directly upstream.
I do not think this will end well at all. I think we're going to see people just ignoring them upstream like they do downstream.
CC @ekulik
In practice, this does happen in some GNOME projects. But on the whole, we are far better at responding to bug reports upstream than we are downstream. Let's continue this thread of discussion in #131.
I believe most issues are pure engineering, but we could probably do something about the overall quality of reports if we were able to define what makes a good report.
FAF could also be involved in reporting to bug trackers for some finer-grained controls of whatever (I don’t know, attachments to include, something), but then there’s a level of indirection and some challenges to overcome to prevent spam (my thinking here is if we don’t require reporters to have a Bugzilla account, but now I see how that could be annoying, not being able to query for information).
I think the reports that ABRT uploads are generally already of good quality. The problems I identified above are mainly user and developer experience issues.
Probably auto-filing bugs upstream and linking them to downstream bugs in RHBZ would work better than straight-up avoiding filing bugs downstream.
ABRT bug reports depend on the retrace server properly processing a stack trace, which means the retrace server needs to have a consistent environment with all the relevant debug/debuginfo packages installed. This sometimes isn't the case, e.g. https://bugzilla.redhat.com/show_bug.cgi?id=1806231 where it had no idea what the g-i-s package even was and questioned whether I really had Fedora installed, haha. So I had to install 6G of debug packages to process the stack trace locally. In a given cycle, this happens to me maybe half a dozen times. The vast majority of users bail and don't file a bug at all.
The branching of Rawhide is always a special time of year and there’s always a bit of lag between that and us adding the version and repository. If that still happens after, say, the Bodhi activation point, it’s a bigger problem.
Please, I created #131 for this, let's discuss it there.
Metadata Update from @catanzaro: - Issue tagged with: meeting-request
Metadata Update from @chrismurphy: - Issue untagged with: meeting-request - Issue tagged with: meeting
Discussed at today's meeting. Ernestas will provide us with occasional progress updates in this issue, and we'll revisit a few months from now.
Metadata Update from @catanzaro: - Issue untagged with: meeting
ABRT deletes core dumps far too aggressively, even while user is trying to report the crash
https://github.com/abrt/abrt/pull/1481 should address this. https://github.com/abrt/abrt/issues/1475, if (properly) implemented, should prevent the issue from happening altogether. Might aim for F33 with the latter.
Retrace server should not fail
In the end, we’ve decided to wait until we get the new hardware to redeploy everything on RHEL 8. With the amount of crud on the current server, we cannot guarantee that we would be able to replicate the deployment and be able to roll back.
I don’t have the dates, but it’s still “soon”.
Bug 1878317 ("doesn't offer option to process stack traces locally") is now a Fedora 33 beta blocker.
The decision to classify this bug as an "AcceptedBlocker" was made on the grounds that it "hinders execution of required Beta test plans or dramatically reduces test coverage"...
Metadata Update from @chrismurphy: - Issue tagged with: meeting
There are two more. Barring some miracle, we will almost surely slip another week.
Metadata Update from @chrismurphy: - Issue untagged with: meeting
Adding the F34 milestone, mostly to ensure that we continue to track this.
Metadata Update from @aday: - Issue set to the milestone: Fedora 34
Metadata Update from @catanzaro: - Issue tagged with: meeting
Discussed again at today's WG meeting.
Neal and Michael will further prioritize the existing list of priority issues in the first comment. It currently lists eight major issues, but we should further prioritize this so we have just a few top-priority issues.
We've split the "major issues" list in the first comment into two halves, "highest-priority" and "other major issues." The highest-priority list includes the three bullet points we found most important. It's actually five issues, but two of those are closely related to other issues.
Ernestas has left Red Hat. Tomas will invite Miroslav Suchy, head of ABRT team, to a future Working Group meeting.
He'll attend next week, so I'll put this topic first on the agenda for next week.
Kevin mentioned the bugzilla dashboard which I didn't know about.
https://bugzilla.redhat.com/page.cgi?id=productdashboard.html&tab=summary&product=Fedora&bug_status=open&assignee_table_length=25
As for QA there isn't a systematic way of using bugzilla for learning about bugs or problem areas. In no particular order problem areas are uncovered by: blocker bug app, openqa testing, and the various compose reports that get sent to the devel@ list.
@adamwill
Well, also significant in that context are human mailing list posts (and IRC discussion, forum discussion, etc). We do follow those forums and jump on any problems that seem to need jumping on. Major issues tend to 'bubble up' from Bugzilla to the mailing lists especially. Crash reports are often useful for figuring out what is going on with a bug at that point, or at least identifying duplicates and getting a sense of the scale of the problem.
We do haphazardly use the Bugzilla reports too. For instance a year or two back I just decided to try and triage all gnome-shell bug reports, and I did a bunch of work digging through abrt crash reports there (which made up a large chunk of all the bug reports). Some of them did turn out to be quite significant bugs we could isolate and fix, IIRC.
There is one kind of systematic GNOME-specific issue here, which is that the backtraces abrt produces often aren't much use in diagnosing crashes in the Shell because the bug really happened "somewhere else", e.g. in the javascript code. But even then, abrt does attach a snippet of the system logs from around the time of the crash, which can help.
Metadata Update from @catanzaro: - Issue untagged with: meeting - Issue tagged with: meeting-request
Discussed at today's WG meeting.
Other high-priority and major issues:
@msuchy, would January 5 work OK for the next WG meeting? (If not, we can use a later date.)
@catanzaro I'd prefer a date a week later. I have PTO planned on the 5th; it may or may not happen, depending on the current local pandemic situation. The 12th is definitely OK with me.
12th is booked too, so let's plan for the 17th.
Hi @msuchy, I guess you still want to attend? Does the 17th sound good?
I understand you are no longer managing the ABRT team. Perhaps we should invite the new manager as well? Who would that be? (Of course, the whole team is welcome to attend.)
I can attend. I will also notify the new tech lead who is @msrb
I will attend as well ;)
Metadata Update from @catanzaro: - Issue untagged with: meeting-request - Issue tagged with: meeting
ACTION: Michael to follow up on "Try to reduce false-positives when flagging sensitive words" and "Possible sensitive data detected is almost always a false positive"
Discussed at today's meeting. ABRT developers accepted some of my suggestions in "Possible sensitive data detected is almost always a false positive".
ACTION: Michael to schedule "ABRT is reporting too many duplicates, even when it detects potential duplicates" for discussion in a January WG meeting
Partially discussed today, though we were short on time. Miroslav has posted a short summary in the issue.
Metadata Update from @catanzaro: - Issue set to the milestone: None (was: Fedora 34)
Current status is retrace jobs seem to always fail. Bug was reported in October last year, but it's still broken.
I can confirm https://github.com/abrt/retrace-server/issues/428
We discussed this issue during today's WG call, and there's a desire to somehow resolve it during the F37 cycle.
Obvious options that are available to us:
Next steps:
We should aim to talk about this again in no less than a month.
Metadata Update from @aday: - Issue set to the milestone: Fedora 37 - Issue tagged with: pending-action
Hi everyone!
Jens reached out to us (abrt) via e-mail and I'd like to shed some light on the progress here.
We tried to address this issue in libreport 2.15.1 (released 04/2021) and it should be available in all f34+ releases. Do people still experience this problem?
Yes, this is a big one. Although Matej G. fixed a ton of problems in retrace server last year, it just randomly keeps breaking. We have monitoring for the service that shows that the failure rate is usually on the lower-ish side (~5%), but from time to time there is a day when the instance is completely broken and everything fails.
Since debuginfod is available by default in f35+, we decided that we will switch to that for backtrace generation instead of retrace server.
There is already a WIP PR with the change here: https://github.com/abrt/abrt/pull/1600
The change is planned for this quarter.
We patched libreport in all f34+ releases so it should now require an API key instead of username+password. However, if we are talking about existing images that bundle an older version of libreport, then there is unfortunately nothing the ABRT team (or anybody) can do -- it's technically not possible to update the library there. New f36 images should already include the patched version and I'd hope that for older releases people already reported the majority of problems that they encountered during installation and thus we shouldn't be missing much (?)
There is one big area where we unfortunately did not make any direct progress last year -- abrt reporting too many duplicates. We started modernizing the bugzilla reporting codebase that will allow us to implement changes around Bugzilla workflow much quicker. This effort should also finish this quarter and then we will try to address the problem with too many duplicates.
Please let me/us know if you have questions or suggestions.
Thanks!
sensitive words like "key" make many reports' data private/useless We tried to address this issue in libreport 2.15.1 (released 04/2021) and it should be available in all f34+ releases. Do people still experience this problem?
I experience it quite often. Some environment variables trigger it, even though they could be easily ignored. I used to report some false positives in the past, but then gave up. Should I continue to file those?
Thanks for the update, @msaju ! It's great to hear that progress is being made on these issues.
The working group discussed this again yesterday and is looking forward to testing any improvements. We also discussed the possibility of updating the gnome-abrt UI. I'd be happy to work on updated designs, if that would help.
@kparal We keep trying to improve the user experience so please do report any false positives you encounter.
@aday I'm glad to hear that. We would be happy to cooperate on improving the UI as well.
Another one that would be good to fix: https://bugzilla.redhat.com/show_bug.cgi?id=1120859
@mgrabovs - I've created an updated set of gnome-abrt mockups. These are just a start and will need plenty of discussion and collaboration, so just let me know if and when you'd like to make any changes in this direction.
Metadata Update from @aday: - Issue untagged with: pending-action
Thank you so much for the mockups, @aday. They look great. We'll get in touch once we're ready for the transition.
Metadata Update from @aday: - Issue tagged with: qa
We briefly discussed this issue during today's WG call.
We agreed that the reports from ABRT can often include useful information. Unfortunately the ongoing issues with private reports and duplicates mean that they're difficult to digest. Since this issue has been going on for so long, we are reluctant to continue to let it slide.
We've therefore agreed to propose a PR to remove the privacy filtering from ABRT. @mclasen has volunteered to try and find someone to do that.
Metadata Update from @aday: - Issue assigned to mclasen - Issue tagged with: pending-action
As pointed out by @msrb in https://pagure.io/fedora-workstation/issue/130#comment-791156 the problematic keywords (as reported in https://github.com/abrt/abrt/issues/1399) were fixed and are already part of all stable Fedora releases.
I just received a private bug report yesterday (here), so the private reports are still a problem. I'd like the functionality to report private bugs removed altogether, or at minimum, hidden so that it's not easy to do via the ABRT UI.
This seems dangerous and I'd be concerned about it. We absolutely have had real cases of people submitting bug reports that contained their actual passwords.
I'm not suggesting that we remove the function to highlight possible sensitive data. But the private bug reports have got to stop.
Oh, okay, yeah, if we provide a viable alternative way to deal with possibly-sensitive data I'd be less concerned.
It seems that we are waiting on finding a volunteer to take this forward.
We discussed this issue at today's WG meeting. The WG feels that the data we get from abrt is valuable, but there are concerns about the maintenance level.
We plan on talking to the relevant individuals about this, and then on possibly opening up a wider discussion on the subject.
The switch to debuginfod will likely affect the issues being discussed here, so we're going to need to try that and then reassess. Is there an ETA on when it will be released, or when we'll be able to test it?
Hello there. I apologize for the delay -- I got swamped with other work. The update is incoming on Monday ;)
Here we go: The update with debuginfod-enabled ABRT has been pushed to Rawhide: https://bodhi.fedoraproject.org/updates/FEDORA-2022-bedb590873
I've also prepared COPR with the same update for f37: https://copr.fedorainfracloud.org/coprs/g/abrt/abrt-debuginfod/
I can try to build it for f36 if that would help with testing.
Please let us know if you encounter any issues. Thanks! ;)
After installing the packages from the copr and rebooting my computer, I open gnome-chess and then run killall -SEGV gnome-chess to trigger a fake crash. Then I try to Report the crash using ABRT. It fails:
--- Running report_uReport --- Server responded with an error: 'List element is invalid: Element 'version' is invalid: String '20201206^1.git0c78c8329' does not match the pattern '^[a-zA-Z0-9_\.\+~]+$'' ('report_uReport' exited with 1)
[Screenshot: Screenshot_from_2022-11-08_10-17-10.png]
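The rejection above can be reproduced locally. This is a minimal sketch that uses only the pattern and the version string quoted in the server's error message; the helper `rejected_characters` is a hypothetical name added for illustration and is not part of ABRT or FAF.

```python
import re

# Pattern and version string copied verbatim from the server error message.
VERSION_PATTERN = re.compile(r"^[a-zA-Z0-9_\.\+~]+$")
version = "20201206^1.git0c78c8329"


def rejected_characters(value):
    """Return the distinct characters in value that the character class excludes."""
    allowed = re.compile(r"[a-zA-Z0-9_\.\+~]")
    return sorted({ch for ch in value if not allowed.fullmatch(ch)})


print(VERSION_PATTERN.match(version) is None)  # True: the version is rejected
print(rejected_characters(version))            # ['^']
```

The only offending character is the caret, which this version string uses before the snapshot suffix; the server-side pattern allows `~`, `+`, `.` and `_` but not `^`.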
So that's not good. I was able to successfully "report" a nautilus crash though, which wound up taking me to https://bugzilla.redhat.com/show_bug.cgi?id=2129705, although it discarded my comment presumably because I didn't write anything, so nothing really happened. That's a strange user experience.
Will continue to test and see how it works.
Experience just now with the debuginfod version of ABRT: the app correctly showed a notification for an Inkscape crash. When I tried to report it, it failed to generate the report with Error: No segments found in coredump './coredump'.
I think I've seen this same error once before with abrt from the copr. I haven't been able to successfully report an issue with it yet.
@msrb fyi: gnome-abrt has been orphaned and will be retired if nobody takes ownership. Should probably be owned by the abrt-team account.
I'm seeing the same error message (which prevents me from reporting any issues through GNOME abrt) and have reported it to the retrace server: https://github.com/abrt/retrace-server/issues/484
Is there a workaround until the bug gets fixed?
So I noticed that my Epiphany crashes are still being detected as kernel failures due to this bug that has been on the list for a long time. Unfortunately ABRT developers have not made significant progress on the list of issues in the first comment. And we don't have manpower to assist ABRT team with improving ABRT at this time.
Proposal: remove ABRT for Fedora 39 and aim to bring it back in the future.
Metadata Update from @catanzaro: - Issue untagged with: qa - Issue assigned to catanzaro (was: mclasen)
Hi @msuchy @msrb , the Workstation WG is concerned about slow progress on the issues identified in the first comment. We have two parallel proposals:
Any thoughts on this?
It might be possible to close some of the existing tickets that we're tracking that relate to the retrace server, if that is no longer used.
I no longer work in the ABRT team. This is at @msrb's discretion now.
Oops, sorry, I knew that. I meant to ping @mgrabovs.
Hi @catanzaro
I like the plan. There is usually at least one bugfix release every quarter, so things should be, hopefully, going in the right direction. Although slowly... :/
It would be amazing if you could find somebody to work on the UI part. I was thinking whether we could try to find a contributor via GSoC to work on that, but that opportunity has passed for this year.
Fedora 38 is the first release which uses debuginfod instead of retrace server by default. The same update has been submitted for F37 as well, so it leaves us with just F36 (EOL'ed soon) that still relies on the retrace server.
Workstation WG or ABRT team to identify 2-3 issues per quarter from the first comment for ABRT team to attempt to resolve, to slowly reduce the count of issues we are tracking. I would include this one in the first round as it's particularly embarrassing.
OK, so sounds like this is a plan. We'll check back in July (Q3).
Metadata Update from @aday: - Issue set to the milestone: Fedora 39 (was: Fedora 37)
OK, so sounds like this is a plan. We'll check back in July (Q3). ABRT team to work on that issue 1386 and 1 or 2 other issues from this list of "highest-priority issues" or "other major issue". Desktop team to figure out how to best help with ABRT UI.
Reminder: we'll check back in early July for a progress update on these issues.
Hi @aday and @msrb, would Tuesday, July 11 be a good date for ABRT follow-up discussion? Workstation WG meetings are Tuesdays at 10:00 AM EDT (14:00 UTC). (We could do a separate meeting at a different time if that time slot doesn't work well for you, but that's the time where we can have the full Working Group present.)
Metadata Update from @catanzaro: - Issue untagged with: pending-action
@catanzaro Tuesday, July 11 works for me ;)
[Edit - I commented in the wrong ticket. Please ignore.]
Metadata Update from @aday: - Issue untagged with: meeting-request - Issue set to the milestone: None (was: Fedora 39)
Apologies for the previous comment - I picked the wrong tab by mistake.
We discussed this issue on 11 July. Some fixes have been made from @catanzaro 's list, and it is hoped that these will improve the situation. The plan is to continue trying to make progress on a quarterly basis.
The ABRT team have told us that they don't have the time or expertise to work on the gnome-abrt UI, so I've created a separate ticket to track what to do about that - that's #386.
The plan is to continue trying to make progress on a quarterly basis.
Let's plan to have another meeting with ABRT developers in October for another progress update.
Small status update: After upgrading to Fedora 39, which is unsurprisingly not as stable as Fedora 38, and using the new ABRT for several days to report many crashes, I'm pretty confident that the debuginfod integration has dramatically improved both the speed and the reliability of ABRT. Nice.
Ironically, for me, debuginfod is what made me mostly stop using abrt - I just backtrace things with coredumpctl or gdb directly now, since gdb has debuginfod integration. The main benefit of abrt for me was the retrace server, heh.
Hi @msrb, seems we all forgot about October progress update. How does November sound? Are you able to attend another Workstation WG meeting?
Hi @msrb, any update?
Hey, I apologize for the radio silence here lately. I did not have much time to look into these things. However, a new release is coming to Fedora this week :blush:
Metadata Update from @ngompa: - Issue tagged with: default-apps, experience
I suggest it's time to take a pragmatic approach here. We keep ABRT and automatic problem reporting, but remove GNOME ABRT from the default install. Users who want the user interface for reviewing problems and reporting to Bugzilla will have to install it manually.
It would be nice to keep the GNOME ABRT tool, but it would require a significant redesign and major functionality enhancements (at minimum support for Flatpaks and for reporting problems directly to upstream GNOME GitLab rather than Red Hat Bugzilla) and we don't have contributors interested in working on these currently. This ticket has been open for four years now and further progress is unlikely. We should add it to our list of "nice to do, if we had more contributors."
I suggest it's time to take a pragmatic approach here. We keep ABRT and automatic problem reporting, but remove GNOME ABRT from the default install.
+1 from me. I was hoping to maybe get somebody to work on this as part of the GSoC, but Fedora project wasn't selected this year.
We do not have consensus to remove the ABRT UI.
Action items:
This issue has been open for 4 years now. I fear there's a strong chance it will still be open 4 years later if Red Hat doesn't fund more work on ABRT and no new volunteers emerge. But we all agree that crash reporting is very important, so the Working Group's reluctance to remove the app is understandable.
Metadata Update from @catanzaro: - Assignee reset - Issue untagged with: meeting
I think the potential middle ground could be to keep the UI enabled pre-GA. Some people (including me) upgrade to the upcoming release once the Beta is out, and sometimes even sooner. The assumption here is that most blatant crashes will be caught and (easily) reported by people who upgrade and test stuff early, before the final release is out.
I think the potential middle ground could be to keep the UI enabled pre-GA.
How would we do this, though? We would have to modify the desktop file to hide the application shortly before final freeze. But then it's gone for everybody and you have to manually unhide it if you want it. Not sure that's a good idea.
I asked @sshil to start looking at this with a view to initially moving gnome-abrt to gtk4/libadwaita. Don't want to promise too much, but let's see how far we can get for F41 and later.
That's great. Note there are mockups here.