#10632 rhel9 adoption
Opened 2 years ago by kevin. Modified 2 months ago

rhel9 isn't out yet, but it will be.

We should do some prep work with centos-stream-9 and make sure things are mostly in order for us, and then when rhel9 is available, we should move all our el instances over to it.

Currently there's a mix of rhel7 and rhel8. We should really try and retire all those in favor of rhel9.


Metadata Update from @aheath1992:
- Issue assigned to aheath1992

2 years ago

I can help spearhead this initiative. I already need to learn RHEL 9, so I can help spot real-world issues and help resolve them.

Metadata Update from @zlopez:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: high-gain, high-trouble, ops

2 years ago

I created a hackmd list where we can keep track of packages that are or are not in EPEL 9. Anyone should be able to update this list as needed.

https://hackmd.io/@aheath1992/Hyiw9ZPrc/edit

I noticed the epel9 list is empty for now, but when stuff starts showing up, here is the process to follow to request them.

https://docs.fedoraproject.org/en-US/epel/epel-package-request/

Any help welcome! The last time I had time for this, we were stuck getting Mailman and HyperKitty in, because some key Python dependencies can't be upgraded while their other dependents are not ready. We should probably just provide a compatibility package that conflicts with the latest version, and upgrade them anyway.

just FYI, mailman3, postorius and hyperkitty are all FTBFS in Fedora and are going to be retired soon. :(

Here are my thoughts on rhel9 upgrades. I think I'll post this to the list too.

We have 188 RHEL7 or RHEL8 instances (counting both VMs and bare hardware).

Some of them I think we can reinstall anytime:

backup01.iad2.fedoraproject.org
batcave13.iad2.fedoraproject.org
cloud-noc-os01.rdu-cc.fedoraproject.org
dl01.iad2.fedoraproject.org
dl02.iad2.fedoraproject.org
dl03.iad2.fedoraproject.org
dl04.iad2.fedoraproject.org
dl05.iad2.fedoraproject.org
download-ib01.fedoraproject.org
download-rdu-cc01.fedoraproject.org
dedicationsolutions01.iad2.fedoraproject.org
ibiblio01.iad2.fedoraproject.org
ibiblio05.iad2.fedoraproject.org
memcached01.iad2.fedoraproject.org
noc01.iad2.fedoraproject.org
noc02.fedoraproject.org
ns01.iad2.fedoraproject.org
ns02.iad2.fedoraproject.org
ns02.fedoraproject.org
ns13.rdu2.fedoraproject.org
osuosl01.fedoraproject.org
secondary01.iad2.fedoraproject.org
smtp-mm-ib01.fedoraproject.org
smtp-mm-osuosl01.fedoraproject.org
smtp-mm-cc-rdu01.fedoraproject.org
storinator01.rdu-cc.fedoraproject.org
sundries01.iad2.fedoraproject.org
sundries02.iad2.fedoraproject.org
tang01.iad2.fedoraproject.org
tang02.iad2.fedoraproject.org
torrent02.fedoraproject.org
unbound-cc-rdu01.fedoraproject.org

Some of them we can do, but we will need an outage for them:

bastion01.iad2.fedoraproject.org
bastion02.iad2.fedoraproject.org
bastion13.iad2.fedoraproject.org
people02.fedoraproject.org
db01.iad2.fedoraproject.org
db03.iad2.fedoraproject.org
db-datanommer.iad2.fedoraproject.org
db-koji.iad2.fedoraproject.org
db-openq.iad2.fedoraproject.org
(we can see how long the upgrade takes in stg on the various db servers)

Some of them need applications/packages built for rhel9 and we can't do them until
that is sorted out:

badges (hopefully now ongoing?)
notifs (ongoing)
mm- (is mirrormanager2 ready to branch/build for rhel9?)
pagure (how about pagure?)
pkgs (also need pagure)
sign (needs the new sigul. I'll ping patrick about it again)
value* - needs limnoria in epel9 and fedmsg-irc replacement somehow. The zodbot part perhaps could be moved to openshift.

Some we need to carefully save local data and restore after reinstall:

batcave01.iad2.fedoraproject.org
log01.iad2.fedoraproject.org

The vmhosts need some investigation to see if we can reinstall and keep the
vm data around. Otherwise we need to migrate VMs off and back on.
I tried this in an early 9.0 install and it didn't seem to work on the host
I was testing with, but we should try again.

qvmhost
bvmhost

vmhost*

Some, we need to talk about:

db-fas01 - this used to have the fas db on it to be more secure (separate from db01),
now all it has is the ipsilon db. I guess we could rename it db-ipsilon01?
or fold it into db01?
ipa - I think, but want to confirm, that we can just reinstall one replica with rhel9, get it
all synced and then do another and another until the entire cluster is rhel9.
rabbitmq - currently we are using rhel8 + openstack 16 repo. I suppose we can just go to
rhel9 + openstack 17 repo.

Some will just not move anytime soon:

busgateway - needs fedmsg, only in epel7
fedimg - needs fedmsg. will be replaced by some other solution
github2fedmsg - needs fedmsg, only in epel7
mailman01 - needs fedmsg
nuancer - needs fedmsg
osbs - needs new container build system
pdc - needs to be retired

Finally, some I am not sure about and would like input:

mbs - is this ready for rhel9? Should we move to Fedora instead?
odcs - how about this?

So, as far as help goes, the best ways to contribute right now seem to be getting mirrormanager2 and pagure into epel9, coming up with a fedmsg-irc replacement, and getting limnoria into epel9.
I can start on the easier reinstalls.

I got our setup ready to install RHEL 9.1 today.

I reinstalled vmhost-x86-08.stg today, without deleting guest disks. It worked, but rhel9.1 install will not allow /boot and /boot/efi to be raid1, which apparently we had in 8.x.

So, we need to see if there's a workaround for that or if there's some other way we can preserve those in case of disk failure.

Going from other chats, this is a change in 9.1 from 9.0 and is not work-aroundable. The EFI raid was always a hack because EFI-FAT isn't really raidable. My limited and probably flawed understanding is that there were multiple issues with various firmware and filesystem flaws.

I think the only way to do this is to break the EFI raid and just use sdaN, then have a 'cron' job which regularly copies its contents to other FAT partitions on sd{b->j}N.
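A minimal sketch of what such a periodic copy job could look like, assuming the spare FAT partitions are already mounted somewhere. The `mirror_tree` helper and any mount points passed to it are hypothetical and not part of our setup; it copies changed files and removes stale ones so each backup partition matches the live EFI partition.

```python
import filecmp
import shutil
from pathlib import Path


def mirror_tree(source: Path, target: Path) -> list[str]:
    """Make target's contents match source; return relative paths copied."""
    copied = []
    target.mkdir(parents=True, exist_ok=True)
    source_names = set()

    # Sorted order visits parent directories before their children.
    for src in sorted(source.rglob("*")):
        rel = src.relative_to(source)
        source_names.add(rel)
        dst = target / rel
        if src.is_dir():
            dst.mkdir(parents=True, exist_ok=True)
        elif not dst.exists() or not filecmp.cmp(src, dst, shallow=False):
            # Copy only when the file is missing or its bytes differ.
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(src, dst)
            copied.append(rel.as_posix())

    # Remove anything in target that no longer exists in source.
    # Reverse sorted order visits children before their parents.
    for dst in sorted(target.rglob("*"), reverse=True):
        if dst.relative_to(target) not in source_names:
            if dst.is_dir():
                dst.rmdir()
            else:
                dst.unlink()
    return copied
```

A cron job could call this once per backup partition, for example `mirror_tree(Path("/boot/efi"), Path("/boot/efi-backup-b"))`, where the second path is an assumed mount point for the copy on the second disk.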

Yeah, understandable... but unfortunate. Will ponder on a workaround.

@dherrera is now working on getting MBS to EPEL9 and I'm helping you with fedmsg.

Regarding running MBS on EPEL9, I created an updated diagram to show what dependencies are missing [0].

Besides getting stuff into EPEL9 (around 30 packages left), some packages depend on python-mock, which is deprecated because mock has been part of the Python standard library as unittest.mock since Python 3.3 [1].

Also, the latest version of MBS still depends on fedmsg. If it gets ported to fedora-messaging [2], there would be fewer things to package, since fedora-messaging is practically already in EPEL9 (it just needs 1 package, which is already being addressed [3]). There is already an open ticket asking for this port [4], but the last update was about a year ago.

[0] MBS on EPEL9 - Required packages breakdown
[1] https://fedoraproject.org/wiki/Changes/DeprecatePythonMock
[2] https://pagure.io/fedora-infrastructure/issue/8213
[3] https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2023-fdfb8a6342
[4] https://pagure.io/fm-orchestrator/issue/1234

A status update:

I plan to do a bunch of hosts this week around outages... vmhosts/bvmhosts and database servers.

Will update the full list after this week.

Update on MBS on EPEL9:
I was refreshing the list this week and noticed that some things I listed before are already packaged in EPEL9 or EL9, besides the things I have pushed myself. The most influential packages I found were python-alembic, python-flask and python-click.

I checked how these packages affect things, so this time I listed the package version that can get packaged. Some of the missing dependencies won't work in their latest rawhide versions, since they are not compatible with the already packaged libraries.

  • python-flask-migrate: rawhide version requires a python-alembic version higher than the one on EL9, but f37 version works fine.
  • python-flask-sqlalchemy: rawhide version requires a python-flask version higher than the one on EPEL9, but f38 version works fine.
  • python-celery: rawhide version requires a python-click version higher than the one on EPEL9, the version on f36 works fine.
  • python-billiard: the version on rawhide is too modern for f36 python-celery, so f36 version is also needed for this package to work.

The python-celery case is a bit more complex, because even the f36 version requires a version of pytz that is more modern than the one packaged in EL9. Patching it (or another version) to work with older libraries might be an option, but I feel we can wait to bring this up until more of its dependencies end up being packaged.
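The branch selection described above can be sketched as a small lookup: walk the Fedora branches from newest to oldest and take the first one whose minimum required dependency version is satisfied by what EL9 or EPEL9 ships. This is only an illustration. The `newest_compatible` helper is hypothetical, and the version numbers in the example call are placeholders, not the real requirements of any package mentioned here.

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn a dotted numeric version such as '1.7.5' into (1, 7, 5)."""
    return tuple(int(part) for part in version.split("."))


def newest_compatible(candidates: list[tuple[str, str]], available: str):
    """Return the first branch whose minimum requirement fits `available`.

    `candidates` is a list of (branch, minimum_required_version) pairs,
    ordered newest branch first. Returns None if no branch fits.
    """
    for branch, min_required in candidates:
        if parse(min_required) <= parse(available):
            return branch
    return None


# Placeholder data: branch names are real, version numbers are made up.
example = [("rawhide", "1.9"), ("f38", "1.7"), ("f37", "1.5")]
chosen = newest_compatible(example, available="1.7.5")
```

With the placeholder data, the rawhide requirement is too new for the available version, so the lookup falls back to the next branch in the list.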

This time I separated this into 2 diagrams, because python-celery has lots of dependencies unrelated to the other missing ones and it's an easy spot to cut the tree. I also marked runtime dependencies and reorganized things a bit to improve readability.

[0] MBS on EPEL9 - Required packages breakdown
[1] python-celery on EPEL9 - Required packages breakdown

pagure (how about pagure?)
pkgs (also need pagure)

@wombelix and I are working on getting Pagure 6.0 ready, which will be the first version that we can ship for RHEL 9. We're doing a lot of cleanup to improve maintainability of the code and make it easier to contribute to.

There's going to be a talk at Flock about it, too! :)

@ngompa Any progress on Pagure for RHEL9?

The work @wombelix and I are doing to get to Pagure 6.0 is a pre-requisite for this. As Pagure instances are currently on RHEL 8, I'm more focused on working with @salimma on the Mailman stack for EPEL 9.

Status update:

We have a total of 448 instances in ansible

f39 - 39
f38 - 210
f37 - 13
f36 - 1
rhel9 - 70
rhel8 - 59
rhel7 - 46
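A quick tally of the per-release counts, as a sketch for tracking how much EL work remains. The `summarize` helper is hypothetical. Note that the counts listed here add up to 438 rather than 448; the thread does not say what accounts for the difference.

```python
counts = {
    "f39": 39,
    "f38": 210,
    "f37": 13,
    "f36": 1,
    "rhel9": 70,
    "rhel8": 59,
    "rhel7": 46,
}


def summarize(counts: dict[str, int]) -> dict[str, int]:
    """Total all instances, all RHEL instances, and RHEL not yet on rhel9."""
    rhel = {name: n for name, n in counts.items() if name.startswith("rhel")}
    return {
        "total": sum(counts.values()),
        "rhel_total": sum(rhel.values()),
        "rhel_pending": sum(n for name, n in rhel.items() if name != "rhel9"),
    }


summary = summarize(counts)
# summary == {"total": 438, "rhel_total": 175, "rhel_pending": 105}
```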

There are still some I can do pretty easily. A lot of the rhel7 ones are waiting on applications: badges, fedimg, github2fedmsg, mailman, mirrormanager.
Some are waiting to be retired: pdc, osbs. On the rhel8 hosts, some are waiting on applications: pagure, sigul. The rest are db and virthosts that are hard to do.

I'm going to try and do the remaining easy ones soon. Then my plan is to migrate things around so I can do the last of the virthosts without needing outages, and finally to schedule an outage to do the database servers. rhel8 still has a lot of life left, so I will focus on the rhel7 list.

Here is the up-to-date list of rhel7 servers:

fedocal01.iad2.fedoraproject.org - We need to do something about it
fedocal02.iad2.fedoraproject.org - We need to do something about it

I thought fedocal was deployed through OpenShift now?

@ngompa Yeah, I found out the host_vars in ansible inventory are not reflecting the real state of things. I need to clean that. And update the list.

I was surprised by Fedocal as well, but got confirmation that those machines are already gone.

Cleaned the host_vars in ansible and here is the final list of RHEL7 machines as of current date:

Metadata
Boards 2
mini-initative Status: Backlog
ops Status: Backlog