#10321 flatpak container builds are failing with name resolution error
Closed: Fixed 3 years ago by kevin. Opened 3 years ago by kalev.

https://koji.fedoraproject.org/koji/taskinfo?taskID=78592049 and https://koji.fedoraproject.org/koji/taskinfo?taskID=78612738 both failed with similar name resolution errors:

2021-11-10 07:46:15,736 - atomic_reactor.util - DEBUG - - Curl error (6): Couldn't resolve host name for https://odcs.fedoraproject.org/composes/odcs-20784/compose/Temporary/x86_64/os/repodata/repomd.xml [getaddrinfo() thread failed to start]

/snip/

2021-11-10 07:46:15,740 - atomic_reactor.util - DEBUG - - Curl error (6): Couldn't resolve host name for https://koji.fedoraproject.org/repos/f35-build/4254954/x86_64/repodata/repomd.xml [getaddrinfo() thread failed to start]

This seems to affect only x86_64; aarch64 builds are fine.


Metadata Update from @kevin:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: medium-gain, medium-trouble, ops

3 years ago

Can you try again now?

Basically I think what happened was that docker got upgraded when I upgraded all the build machines. I downgraded to a version that has the seccomp fix to allow f35+ containers to run correctly on it.

So, I think it's fixed, please test.

Ah, that makes sense. Thanks!

I ran into another issue now, though, so I'm not sure whether it's fixed. After I resubmitted both builds, they both hung in the "open" state without any logs in koji for almost 6 hours.

https://koji.fedoraproject.org/koji/taskinfo?taskID=78680170
https://koji.fedoraproject.org/koji/taskinfo?taskID=78680169

I went ahead and cancelled one of them and resubmitted again, but it looks like it's still stuck:

https://koji.fedoraproject.org/koji/taskinfo?taskID=78691228

Got it. I had forgotten to fix the docker service file. ;(

So, the complete fix is:

yum downgrade /root/docker*rpm on osbs-master01, osbs-node01, osbs-node02
then edit /usr/lib/systemd/system/docker.service
and remove the seccomp line
then 'systemctl daemon-reload'
then 'systemctl restart docker atomic-openshift-node'
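The service file edit in the steps above could be scripted roughly like this. It is only a sketch: the function name remove_seccomp_line is made up for illustration, and it assumes the line to drop can be identified by the literal string "seccomp", which the steps above do not spell out. It keeps a .bak copy next to the original.

```shell
#!/bin/sh
# Sketch only: drop every line mentioning "seccomp" from a unit file.
# Assumption: the line to remove is identifiable by the string "seccomp".
# remove_seccomp_line is a hypothetical helper, not an existing tool.
remove_seccomp_line() {
    unit_file="$1"
    # Keep a backup (same mode and owner) before rewriting the file.
    cp -p "$unit_file" "$unit_file.bak" || return 1
    # Write everything except the seccomp line(s) back over the original.
    sed '/seccomp/d' "$unit_file.bak" > "$unit_file"
}

# Possible use on each host, as root, followed by the reload and restart
# from the steps above:
#   remove_seccomp_line /usr/lib/systemd/system/docker.service
#   systemctl daemon-reload
#   systemctl restart docker atomic-openshift-node
```

If the seccomp option sits on the last line of a multi-line ExecStart, the preceding line would be left with a dangling backslash, so it is worth reviewing the result before reloading.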

Your builds finished, so I think this is now working. If you see any further issues, feel free to re-open this or file a new issue. Thanks.

Metadata Update from @kevin:
- Issue close_status updated to: Fixed with Explanation
- Issue status updated to: Closed (was: Open)

3 years ago

Metadata Update from @kalev:
- Issue status updated to: Open (was: Closed)

3 years ago

Fixed and put something in place to hopefully avoid it happening again.

  • excluded docker updates in yum.conf on osbs nodes.
  • added package_exclude to make sure it's excluded when running the vhost_update playbooks.
  • downgraded and restarted everything.
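
For reference, the yum.conf exclusion could look something like the sketch below. The glob docker* and the helper name exclude_docker_updates are assumptions for illustration, not the exact change that was made. The append branch also assumes the file has only a [main] section, so a line added at the end lands there.

```shell
#!/bin/sh
# Sketch only: make sure a yum-style config file excludes docker packages.
# Assumptions: the glob "docker*", and a config file whose only section
# is [main]. exclude_docker_updates is a hypothetical helper name.
exclude_docker_updates() {
    conf_file="$1"
    if grep -q '^exclude=.*docker' "$conf_file"; then
        # Already excluded; nothing to do.
        return 0
    elif grep -q '^exclude=' "$conf_file"; then
        # Extend the existing space-separated exclude list.
        cp -p "$conf_file" "$conf_file.bak" || return 1
        sed 's/^exclude=.*/& docker*/' "$conf_file.bak" > "$conf_file"
    else
        # No exclude line yet; add one at the end of the file.
        printf 'exclude=docker*\n' >> "$conf_file"
    fi
}

# Possible use on each osbs node, as root:
#   exclude_docker_updates /etc/yum.conf
```

Running it twice leaves a single exclude line, so it should be safe to repeat.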

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

3 years ago

