Planned Outage - Blockerbugs - 2021-05-18 17:00 UTC
There will be an outage starting at 2021-05-18 17:00 UTC, which will last approximately 2 hours.
To convert UTC to your local time, take a look at http://fedoraproject.org/wiki/Infrastructure/UTCHowto or run:
date -d '2021-05-18 17:00UTC'
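For scripting, the same conversion can be made explicit; this is just a sketch wrapping the command above (the `TZ` value is an illustrative assumption, pick your own zone):

```shell
# Print the outage start time in UTC, using GNU date's -d parsing.
date -u -d '2021-05-18 17:00UTC' '+%Y-%m-%d %H:%M %Z'

# Print the same instant in an example local timezone (Europe/Prague is
# an arbitrary choice for illustration).
TZ='Europe/Prague' date -d '2021-05-18 17:00UTC' '+%Y-%m-%d %H:%M %Z'
```

Note that `-d` with free-form date strings is a GNU coreutils extension; BSD/macOS `date` uses different flags.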
Reason for outage:
Host upgrade from Fedora 32 to Fedora 33.
Affected Services:
https://qa.fedoraproject.org/blockerbugs/
Ticket Link:
https://pagure.io/fedora-infrastructure/issue/9958
Please join #fedora-admin or #fedora-noc on irc.freenode.net, or add comments to the outage ticket above.
I have proposed a PR with the outage notice for status.fp.o:
https://github.com/fedora-infra/statusfpo/pull/20/files
Once this outage has been confirmed, we can merge this notice and push the changes live to status.fp.o.
Metadata Update from @smooge: - Issue priority set to: Waiting on Assignee (was: Needs Review) - Issue tagged with: medium-gain, medium-trouble, ops, outage
The outage notice is now live on https://status.fedoraproject.org/
@frantisekz was this work started / and or completed?
Sorry for the delayed update, the progress is as follows:
The blockerbugs production instance has been upgraded to F33. It is mostly working, but we have hit some problems that have yet to be resolved.
Working: the upgraded instance itself is up (see above).
Not working: there are some issues which appear to be auth-related. The cron job we use to do the regular syncs fails immediately with:
pam_sss(crond:account): Access denied for user blockerbugs: 6 (Permission denied)
(blockerbugs) PAM ERROR (Permission denied)
(blockerbugs) FAILED to authorize user with PAM (Permission denied)
This worked on stg but doesn't work in production, and we're really not sure why. The blockerbugs user is a valid FAS user in LDAP and should work for this use, AFAIK. That said, the stg instance is using a local user and the prod instance is not.
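One quick way to see the stg/prod difference described above is to check whether an account resolves from /etc/passwd or only through NSS (e.g. sssd/LDAP). This is a hedged sketch, not something from the ticket; `root` is used as a stand-in because `blockerbugs` only exists on the Fedora hosts:

```shell
#!/bin/sh
# Report whether a user is a local account or comes from a remote NSS
# source (e.g. LDAP via sssd). 'root' is a stand-in default; on the
# blockerbugs host you would pass 'blockerbugs' as the argument.
user="${1:-root}"
if grep -q "^${user}:" /etc/passwd; then
    echo "${user}: local user (/etc/passwd)"
elif getent passwd "${user}" >/dev/null; then
    echo "${user}: remote user (NSS, e.g. sssd/LDAP)"
else
    echo "${user}: not found"
fi
```

The distinction matters for the PAM error above: access rules for crond can treat local and sssd-provided users differently.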
For the moment, there isn't a whole lot of change going on in blockerbugs. As long as we remember to run a sync a couple of times per day, it's working "well enough"; we just need to get the regular sync working again before much longer.
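If the manual syncs are needed for a while, a temporary cron entry could cover the "couple of times per day" cadence mentioned above. This is only a sketch: `blockerbugs-sync` is a placeholder, since the ticket does not name the actual sync command, and running it as root sidesteps the PAM problem only as a stopgap.

```
# /etc/cron.d/blockerbugs-sync -- temporary workaround sketch.
# 'blockerbugs-sync' is a placeholder for the real sync command.
0 6,18 * * *  root  blockerbugs-sync
```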
I don't think that the ticket is quite complete yet because we're going to have to rebuild the instance again once we get the whole auth thing figured out.
I think at this point it's probably best to make a local user for it, since we know that works.
What's the status here? I went and cleared the outage from status; it didn't seem like it would matter to any users that there was still work to do.
The production instance is syncing now that there's a local user. The issue can now be closed.
Metadata Update from @kevin: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)