Several times in the last few days I have had to ping folks on #fedora-noc or #fedora-infra to restart pagure.io.
The catch is that we cannot file infra issues when pagure.io is down. :-)
What struck me was that, even though every page was returning 503 errors, no Nagios alerts showed up on #fedora-noc. Perhaps something is misconfigured?
I looked at Nagios, and it seems we are only watching the staging instance of pagure.
Metadata Update from @mohanboddu: - Issue priority set to: Waiting on Assignee (was: Needs Review) - Issue tagged with: low-trouble, medium-gain, ops
We are monitoring the prod one as well; it's just listed under its internal name: pagure02.fedoraproject.org
Can you share which URL(s) you were hitting that returned 503s? I.e., what exactly should we be monitoring here? Perhaps just https://pagure.io/fedora-infrastructure/issues ?
Metadata Update from @zlopez: - Issue priority set to: Waiting on Reporter (was: Waiting on Assignee)
Metadata Update from @aheath1992: - Issue assigned to aheath1992
I have tested this on my lab Nagios, and I can get the status code for https://pagure.io/fedora-infrastructure/issues and monitor it. Which repo or job do I need to open a pull request against to add the changes?
Our Ansible repo: https://pagure.io/fedora-infra/ansible
under roles/nagios_server/
Have a look at roles/nagios_server/templates/nagios/services/websites.cfg.j2 to see how other sites are checked.
The check has landed and is now monitoring https://pagure.io/fedora-infrastructure/issues. The Fedora team should receive alerts if the site is unreachable.
Check location: https://nagios.fedoraproject.org/nagios/cgi-bin//extinfo.cgi?type=2&host=pagure02.fedoraproject.org&service=https%3A%2F%2Fpagure.io%2Ffedora-infrastructure%2Fissues
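For illustration only: the ticket does not say which check command the landed change uses. The sketch below is a hypothetical, minimal Python plugin that shows what "getting the status code" means to Nagios. It follows the standard Nagios plugin exit-code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function names and the choice to report non-5xx errors as WARNING are assumptions, not the Fedora infra configuration.

```python
import urllib.error
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# URL discussed in this ticket; used as a hypothetical default.
DEFAULT_URL = "https://pagure.io/fedora-infrastructure/issues"


def classify_status(status: int) -> tuple[int, str]:
    """Map an HTTP status code to a Nagios state and a one-line message."""
    if 200 <= status < 300:
        return OK, f"HTTP OK - status {status}"
    if 500 <= status < 600:
        # A 503, like the ones reported in this ticket, lands here.
        return CRITICAL, f"HTTP CRITICAL - status {status}"
    # Assumption: other codes (e.g. 404) are worth a warning, not a page.
    return WARNING, f"HTTP WARNING - status {status}"


def check_url(url: str, timeout: float = 10.0) -> tuple[int, str]:
    """Fetch url and classify the response; network failures are CRITICAL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; it carries the code.
        return classify_status(err.code)
    except OSError as err:
        # URLError (DNS, connection refused) and socket timeouts end up here.
        return CRITICAL, f"HTTP CRITICAL - {err}"


def main(argv: list[str]) -> int:
    """Print a one-line status and return the Nagios exit code."""
    url = argv[0] if argv else DEFAULT_URL
    state, message = check_url(url)
    print(message)
    return state
```

As a standalone plugin it would be invoked as `sys.exit(main(sys.argv[1:]))`, so that Nagios reads the exit code and shows the printed line as the status text. `HTTPError` is caught before the broader `OSError` because it is a subclass of `URLError`, which is itself an `OSError`.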
Metadata Update from @aheath1992: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)