Our monitoring has been seeing timeouts on koji for around 8 hours now.
The problem can be seen on this downstream prometheus: https://url.corp.redhat.com/koji-outage-2021-03-04
When we try to reproduce this manually, koji sometimes fails to load in our browsers as well.
Also, the vast majority of the builders stopped responding (or checking in to the hub) at around 6:30 UTC, so new builds can't be processed.
nagios was complaining about koji02. I've restarted apache there, and nagios seems happier now.
Do you see a difference on your side?
Yes, things are looking good now: both the web UI and the hub respond, and the builders are updating their status again.
Issue status updated to: Open (was: Closed)
Cool, we'll keep an eye on it.
Thanks for confirming. I'll close the ticket, feel free to re-open if you see it again :)
Metadata Update from @pingou: - Issue assigned to pingou - Issue close_status updated to: Fixed - Issue priority set to: Waiting on Assignee (was: Needs Review) - Issue status updated to: Closed (was: Open) - Issue tagged with: high-gain, low-trouble, ops
Metadata Update from @pingou: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)