#329 Make the test day results more useful
Closed: Deferred to upstream a year ago by aday. Opened 2 years ago by aday.

At today's working group call, we reviewed the results from the last GNOME test week:

https://testdays.fedoraproject.org/events/138

Attendees were of the opinion that the results could be more useful.

It was difficult to look at the results table and get a clear overview of where the issues actually were.

To make sense of the results, you have to go through the comments, which is time consuming, and the comments are often ambiguous about what actually happened (ideally, clear failures would have bug reports with accompanying information - logs, screenshots, etc.).

Personally, I can see how the table could be worthwhile if it got some improvements, such as removing the space for comments, requiring a link to an issue for each failure, and enforcing a fixed row height.

Alternatively, we could investigate alternatives to the existing test results page. A wiki page for people to simply list issue reports might be worth considering.

@sumantrom @adamwill


So, the intent of the system - at least when I was running it - is that you're not supposed to consume the results directly from testdays; once the test day is done, the results are meant to be exported from testdays to the wiki page. There's also a tool I have lying around - https://pagure.io/fedora-qa/testdays - which can automatically pull long comments out of the tables and dump them in a references section at the bottom:

./testdays.py modify -l "Test Day:2022-08-15 Fedora 37 GNOME 43 Beta"

I've run that now, and you can see the results at https://fedoraproject.org/wiki/Test_Day:2022-08-15_Fedora_37_GNOME_43_Beta . Admittedly, it's still not great.

I would generally consider it a specific task to go through the feedback and kinda triage it. This is something I would do when I was running test days. Just quickly eyeballing the comments, they fall into buckets:

  1. confused people (this implies insufficiently detailed tests)
  2. people identifying specific errors/inconsistencies in the tests
  3. vague descriptions of things that might be bugs
  4. detailed descriptions of things that definitely seem like bugs (many with links)

1 and 2 are clearly QA's responsibility: @sumantrom , what we need to do there is fix errors in the tests, and make tests which confused testers more detailed.

3 I'd also kinda see as QA's responsibility: @sumantrom , what we can do there is do what we can to sort out those vaguer reports. Sometimes you'll get multiple reports of what looks like the same thing, in which case you can probably synthesize a useful report out of it, or reproduce it yourself and write a useful report. Sometimes it'll just be so vague it might not be worth bothering with. Sometimes you'll only have one or two reports but they'll look 'interesting' enough to be worth trying to duplicate or going back to the reporters to ask for more detail.

4 is obviously the stuff that's most useful for devs to work on. The testdays tool can also distil bug reports out of test day pages (though because I wrote it primarily for summarizing multiple test days together, the syntax looks a bit odd):

./testdays.py stats -s 2022-08-10 -u 2022-08-20 -f GNOME -l

That's "stats from test days between 2022-08-10 and 2022-08-20 with GNOME in the name, with a one-line summary for each bug", i.e. we use the date range and the title filter to zero in on the single page we want to look at. As you can see from the results, right now it only does RH bugs:

Test Day:2022-08-15 Fedora 37 GNOME 43 Beta

#2117710 NEW        - mclasen@redhat.com - Open in Disks link is not working in root directory properties window.
#2119057 CLOSED     - anaconda-maint-list@redhat.com - Installer is opening at first step, after installation completes.
#2119087 NEW        - gnome-sig@lists.fedoraproject.org - debug option not available for gnome-music
#2119089 MODIFIED   - mcrha@redhat.com - Explore page is lacking content by default, has an odd narrow width
#2119094 NEW        - gnome-sig@lists.fedoraproject.org - progress bar is not showing up in gnome music

Testers: 28, Tests: 376, Bugs: 5, Ratio: 0.17857142857142858
Open: 4, Dupe: 1, Fixed: 0, Unfixed: 0, Fixed %: 0.0

I can look at extending it to maybe also cover gitlab reports. For now it's pretty easy to find those in the results by just searching for gitlab.
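
For example, here's a rough shell sketch (my own, not part of the testdays tool - it assumes the results have been exported to the wiki page and that the standard MediaWiki action=raw endpoint is available) for pulling the GitLab links out of the exported results:

# fetch the raw wikitext of the exported results page, pull out every
# GNOME GitLab URL mentioned in the comments/footnotes, and de-duplicate
curl -s 'https://fedoraproject.org/w/index.php?title=Test_Day:2022-08-15_Fedora_37_GNOME_43_Beta&action=raw' \
  | grep -oE 'https://gitlab\.gnome\.org/[^] |"<>]+' \
  | sort -u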

On the whole, I'd suggest your team focus for now on the five solid RH bug reports and four GitLab reports that are mentioned; ideally, I'd see it as our job to triage the other stuff into things that are useful for you. @sumantrom , does that seem like something you have time to handle, or should we arrange some help with that?

The other thing we can do to try and improve 3 is to improve the instructions on the Test Day page and in the test cases, and the 'live' help during test days; again, this is a task for us (i.e. @sumantrom ). It sounds like we should emphasize harder that we really want folks to submit bug reports for the issues they find, and link to those reports in the descriptions.

Maybe we should have the testdays webapp reject long comments and specifically say "you should file a bug report"? Of course, it shouldn't eat the text - the UI should show the long comment and say "this comment is too long, please file it as a bug report instead", something along those lines. I guess I'll file an issue on testdays for that.

Hmm, looking into it a bit more, the app does warn you on a comment when it reaches 500 chars. I might want to bump that down a bit.

Further thoughts: I think another issue here is that there are a lot of test cases stuffed into the Basic table. That means we get a lot of different comments per user in that table, and the space for comments is unavoidably narrow. I think it would help to split those test cases up into multiple tables in future runs.

A lot of the comments definitely seem to boil down to outdated/inaccurate test cases; fixing those would help for future runs. There are also quite a lot of comments for a few common bugs, like the GPG key problem on Rawhide at the time of the test week which prevented package installs from working. This kind of thing is tricky to deal with, though it can help to post info on it in the IRC channel - ideally we'd fix the bug rapidly and update the test day image links, but getting that done can be hard.

Yep, I am reading through things. Looks like we have a few things that can be done here. I would like to take a stab at this for the GNOME Apps test day (if it happens) and definitely the 43 final test week.

Another thing I mentioned during the meeting (and Neal said he has mentioned it in the past as well) is not to use a dedicated IRC channel for the test days, but to use #fedora-workstation instead. This would avoid confusing people when multiple test days are running at the same time (when I joined, people were talking about the kernel test day there, so I left), and it would also bring more people (potential new contributors) into the #fedora-workstation channel.

It sounds like the plan is to continue with the existing results page, with some improvements to the test cases, and follow-up tasks to turn the observations into a useful report.

The next test week is planned for 7 September. I hope that we can make all that happen by then. I'm happy to help improve some of the test cases.

On the results pages, there are currently too many tests in each category. This will make it hard for volunteers to complete some categories.

I'd prefer us to have smaller categories, and to encourage testers to focus on categories with the least results. That way we can try and ensure broader coverage.

We could potentially do the categories like this:

Core desktop

  • Accessibility
  • Initial setup
  • Lock screen
  • Login screen
  • Desktop update notification

GNOME Shell

  • Activities
  • Dash
  • Overview search
  • Workspaces
  • Classic mode

GNOME Shell extensions

  • Extensions install
  • Extensions gnome org
  • Extensions remove

Core apps

  • Music
  • Web
  • Files
  • Software
  • Evince

Settings and utilities

  • Online accounts
  • Disk management

It looks like this did get improved: compare the current event page to the previous one. The wiki metadata page edit history is weird so I can't see who to credit, but it's definitely better. :D

The test week for GNOME 43rc ran from 7 September until today, 14 September. The timeline is challenging for this one - the GNOME deadline for changes prior to 43 final is next Saturday, the 17th.

Looking at the results, nothing jumps out at me. If there's going to be a report or a summary, having it sooner rather than later would be good.

These narrow windows for testing between development releases make me wonder about the viability of:

a) a full week of testing
b) having to wait on a manual summary task at the end of the test period

Perhaps we should limit ourselves to 3 days of testing, and figure out how to make the results immediately meaningful. Alternatively, we could skip the RC test week and instead test the final GNOME release, though there are obvious disadvantages to that approach.

I don't see any really concerning results. A couple of references to already fixed bugs, and some things that look like moderate importance bugs:

  • Some issues with the light theme in GNOME Classic: "Settings should be in the "System Tools" category instead of "Others"; The shell uses the light theme, but the text for the current day of the week in the top bar's calendar is white instead of black, making it unreadable; The lock screen uses dark text on a transparent background for the login input, making it potentially unreadable depending on the wallpaper."
  • "typing password (to unlock screen) on HW keyboard while on-screen keyboard on, the password is always visible (letters in-stead of dots)", also same issue when logging in
  • "There's no separator between running and non-running apps in the dash. Running apps are mixed together with non-running ones." (this may be an intentional design change since the test case was written?)
  • "The separator between pinned and unpinned apps still bugs out sometimes when moving apps in it, making the last pinned app appear after the separator. It's been an issue since GNOME 40."
  • " Once [a GNOME] account is added [to Online Accounts], it is NOT possible to remove the account from Online Accounts list. The app simply hang"
  • "Toggle buttons for bluetooth disappear from the menu when turned off. It appears again when bluetooth is turned on in the settings" (I can confirm this one - turn Bluetooth off from the top right menu and the entry disappears entirely)

Most of the others look like probable blips, tester errors, or test case problems. @sumantrom , can you read through the results and update test cases where appropriate? There are several items of feedback which clearly indicate that test cases need updating. Thanks!

The 'disappearing bluetooth' thing is reported upstream at https://gitlab.gnome.org/GNOME/gnome-shell/-/issues/5749 ; apparently it works if you have any devices paired, but not if you don't.

Any ideas for next steps here?

The key question is probably whether QA can commit to processing the test day data into a summary in a timely fashion. If it can, then sticking with the current approach is probably OK. If it can't, I'd be interested in modifying the form and table as described in the original comment.

Metadata Update from @catanzaro:
- Issue tagged with: meeting

a year ago

@sumantrom did you get a chance to look through the results and update the test cases as I asked above?

whether QA can commit to processing the test day data into a summary in a timely fashion

I'm not clear what the summary should look like, can you explain?
Please note that sometimes people participate in the test day and submit results even after the event is over, and that's one of the reasons why we e.g. convert the testdays results into a wiki section long after it's over (but it's the same data, so I don't see any benefit in waiting for that).

I'm not clear what the summary should look like, can you explain?

Sorry for being unclear! The "summary" I'm referring to is the section on the wiki page that adamw describes above - an example.

I see. But that is exactly the same content as available in the testdays app, just converted to a wiki markup. Do you find it considerably easier to read? (I find those two equal - not great, not terrible).

I guess we haven't talked about it much internally in our QA team, but I always assumed that the primary reason for the testdays app->wiki conversion was archiving. Our homegrown testdays app is cobbled together from twigs and twine - it only exists because people found wiki syntax difficult and concurrent editing was a frequent problem - and I don't have much confidence that it won't fall apart soon. Exporting to the wiki is a safer bet for keeping the results accessible long-term. And we don't rush the export, because sometimes results still trickle in in the days/weeks following the event, so Sumantro usually exports all events together at the end of the cycle, I think.

That's just a description of how I see it, I might be wrong :-) And of course we can try to make improvements in this area. From what I've read here:
1. fewer test cases per section will improve readability
2. we can try to push people more towards bug reporting and not just writing comments
3. consider using a specific channel like #workstation instead of general #test-day
4. improve test case instructions when feedback shows that they're confused

From the original description:

removing the space for comments, requiring a link to an issue for each failure

I'm afraid this would lead to fewer test results being submitted. People are lazy, and these are volunteers - we can't force them to create bug reports. But we can insist more.

@kparal the way I remember it, the webapp was always designed as a sort of adjunct to the wiki system, which is still supposed to be the Canonical Thing. it's rather like relval for release validation - it's just supposed to be a tool to make submitting information easier for those misguided people who don't enjoy hand editing wikitext ;)

I think things get blurred a bit by the fact that, in doing this, we sort of wound up with a webapp that almost does the whole job on its own, to the point where which venue "is" the Test Day becomes a bit of an angels-on-pinheads thing; in my head the wiki is still The Thing, but that tells you more about me than anything, I guess ;)

there definitely was a concern for a while that the webapp might just go away at any moment, but perhaps it's been enough years at this point that we don't need to worry about it. I guess things are still a bit awkward because we still kinda rely on the wiki for all the surrounding instructional text for each test day, the webapp can't do that. There's also the issue of the......uh......very high security around the web app's admin interface. But perhaps we could think about enhancing the webapp to address those points and then just dropping use of the wiki for the main test day page and result submission bits, leaving it just as a store for the test cases?

I suppose there's also the fact we do have the other thing called testdays - my CLI client that can parse various info out of the wiki pages: https://pagure.io/fedora-qa/testdays . If we decide to ditch using the wiki it'd be nice to make it so that can work with the webapp.

perhaps in either venue, it may help readability and 'glanceability' if we can hide long comments behind an expander, so you can more easily glance at a larger subset of the results, and if you actually need to read a long comment, you can focus on one at a time. I think it should be possible to do this in both systems, it just needs implementing. I'll take a look at the wiki side if I can get the time.

On the topic of when to export to the wiki - it can of course be done more than once, so we don't really need to wait for late results if that's what we're concerned about. We can do an initial export soon after the event ends and redo it later; there's no technical reason why not, it's just a case of whoever's in charge of the test day remembering to do it.
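
For instance (just a sketch, assuming the clean-up tool from earlier is safe to re-run on an already-processed page, and using the event above as the page title), after a later export you would simply repeat:

# re-run the same clean-up step on the refreshed wiki page
./testdays.py modify -l "Test Day:2022-08-15 Fedora 37 GNOME 43 Beta"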

it may help readability and 'glanceability' if we can hide long comments behind an expander

That's already implemented in the webapp, and on the wiki the export tool puts them as footnotes in a separate section below, so I think that's already handled quite well.

On the topic of when to export to the wiki - it can of course be done more than once, so we don't really need to wait for late results if that's what we're concerned about.

That's good to know. But there's a risk of human error both from QA (a QA person looks at the wiki, sees the results are already there, good - but doesn't realize those might not be all the results and doesn't run the export again) and from developers (a developer looks at the wiki to read the bugs, but doesn't know those might be just partial results and doesn't also look into the webapp). So I actually think it's working quite OK right now - submit all results to the webapp, and export them for long-term archival to the wiki long after the event is over...

I see. But that is exactly the same content as available in the testdays app, just converted to a wiki markup. Do you find it considerably easier to read?

The wiki is moderately easier to read. Adam also mentioned a review phase where someone from QA would process the results that go on to the wiki, in order to remove noise - which is also what I was referring to.

Maybe it's worth going back to the original report and looking for alternative solutions. The core of the issue is that we didn't find the results we got back from the test day to be very useful. There were lots of tests where the result was unclear, and wading through all those little comments was both time consuming and not very useful. We'd have liked links to issue reports for the tests that clearly failed.

This was discussed in today's Workstation meeting, 2023-02-28.

Metadata Update from @catanzaro:
- Issue untagged with: meeting

a year ago

Adam also mentioned a review phase where someone from QA would process the results that go on to the wiki, in order to remove noise - which is also what I was referring to.

Yeah, this is basically a service I tried to provide for the more popular test days when I was running them. Really, it's just a case of doing https://pagure.io/fedora-workstation/issue/329#comment-816701 (above).

In the absence of major work on the test day site, some smaller improvements that could help might include:

  • More, clearer statuses for each test: you could have pass/fail/unclear result/issue with test, etc.
  • Instructions to encourage issues to be reported when they are discovered
  • Guidelines for comments: should only include clear, short statements summarising the test result. Detailed descriptions should go in issues. Issues with test cases should be reported. If you have a question about a test, ask in the test day chat.

As discussed during yesterday's working group meeting, this ticket is primarily intended as feedback from the workstation WG to QA. If there are specific tasks to be tracked, it would be better for that to happen on the QA side - on that basis, I think we should close this issue.

Metadata Update from @aday:
- Issue close_status updated to: Deferred to upstream
- Issue status updated to: Closed (was: Open)

a year ago

Thanks Allan. I'm trying to incorporate some of those improvements in the upcoming GNOME test days, see https://pagure.io/fedora-qa/issue/718 . I'll be happy to see your feedback in there :-) I'll also try to file some small improvement tickets for the testdays app once I have a bit more free time.
