#8127 mbs-frontend02 not sending fedmsg messages to mbs-backend01
Closed: Will Not/Can Not fix 4 years ago by kevin. Opened 4 years ago by mprahl.

Earlier today, it was reported that module builds stayed in the "init" state for a long time. After looking into it, I found the cause: mbs-backend01 was not receiving any messages from mbs-frontend01 announcing that a build was submitted. Builds submitted against mbs-frontend02 worked just fine. I confirmed this by comparing against the messages in datagrepper.
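A minimal sketch of the datagrepper comparison. It assumes the public `raw` endpoint at https://apps.fedoraproject.org/datagrepper/raw with its `topic`, `delta` and `rows_per_page` query parameters; the topic name `org.fedoraproject.prod.mbs.module.state.change` is an assumption, not something taken from this ticket. It only builds the query URL, which can then be opened in a browser or fetched with any HTTP client.

```python
from urllib.parse import urlencode

# Assumed datagrepper endpoint; verify against the deployment you query.
DATAGREPPER_RAW = "https://apps.fedoraproject.org/datagrepper/raw"


def datagrepper_query(topic, delta_seconds, rows_per_page=100):
    """Build a URL listing messages on `topic` from the last `delta_seconds`."""
    params = {"topic": topic, "delta": delta_seconds, "rows_per_page": rows_per_page}
    return f"{DATAGREPPER_RAW}?{urlencode(params)}"


# Hypothetical topic for MBS module state changes (assumption).
url = datagrepper_query("org.fedoraproject.prod.mbs.module.state.change", 3600)
print(url)
```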

I reported the issue in #fedora-admin and @smooge restarted httpd on mbs-frontend01. That fixed it, but at 13:48 UTC the same issue started occurring on mbs-frontend02, while mbs-frontend01 still works.

Could you please help me troubleshoot the issue?


I looked and could not find any logs explaining why this is happening. This needs someone with more fedmsg experience. I restarted the httpd process in the meantime.

I think this might have been related to the mass update/reboot. If things restart while other things are down, fedmsg just gives up and doesn't send anything until it is restarted...

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

4 years ago

Metadata Update from @mprahl:
- Issue status updated to: Open (was: Closed)

4 years ago

I noticed that backend01 is not always receiving messages from frontend01 or frontend02. I submitted 6 builds just now and the backend should have received a message for each build, but only 3 were received. It doesn't seem to be specific to a particular frontend server either.
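The check described here (6 builds submitted, only 3 messages received) amounts to a set difference. A minimal sketch, assuming each received message is a fedmsg-style dict whose body carries the module build ID under `msg["id"]`; that field layout is hypothetical and not confirmed in this ticket.

```python
def missing_messages(submitted_ids, messages):
    """Return the sorted build IDs that were submitted but never announced.

    `messages` is an iterable of fedmsg-style dicts; the build ID is assumed
    (hypothetically) to live at message["msg"]["id"].
    """
    seen = {m["msg"]["id"] for m in messages}
    return sorted(set(submitted_ids) - seen)


# Illustrative data only: six submissions, three messages observed.
submitted = [101, 102, 103, 104, 105, 106]
received = [{"msg": {"id": i}} for i in (101, 103, 106)]
print(missing_messages(submitted, received))  # [102, 104, 105]
```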

Could you please advise what to do here?

It could just be the nature of fedmsg, which is not 100% delivery, but it seems odd to me that so many would be dropped in a row there...

@mprahl Is it the same issue in cases where all my components finished building but the module build is not finished?

I am waiting hours for this to complete:

https://mbs.fedoraproject.org/module-build-service/2/module-builds/6143

Edit: I'll just try cancelling and resubmitting.

Ah crap I think I just screwed myself. I can't resubmit the build because I already pushed new changes to that branch.

How do I build a module from a specific git hash? For koji and bare RPM builds, I can give a specific commit hash. How do I do the same for MBS?

I need to resubmit this commit: https://src.fedoraproject.org/modules/eclipse/c/bd9c40a5439c125d370511f8eae8369af82b9c39?branch=2019-06 -- because it is a "bootstrap" build.

@mprahl Can you help?

Ah I figured it out.
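For reference, one way to pin an MBS build to a specific commit is to put the commit hash in the SCM URL of the submission. The sketch below only builds the JSON body. It assumes MBS accepts `scmurl` (repository URL followed by `?#<commit>`) and `branch` fields, and that the dist-git URL for this module is `https://src.fedoraproject.org/modules/eclipse.git`; both are assumptions to verify against the MBS documentation before POSTing to the `module-builds/` endpoint.

```python
import json


def module_build_payload(repo_url, commit, branch):
    """Return a JSON body asking MBS to build `branch` at exactly `commit`.

    Assumption: MBS reads the commit from the fragment after "?#" in `scmurl`.
    """
    return json.dumps({"scmurl": f"{repo_url}?#{commit}", "branch": branch})


payload = module_build_payload(
    "https://src.fedoraproject.org/modules/eclipse.git",  # assumed repo URL
    "bd9c40a5439c125d370511f8eae8369af82b9c39",
    "2019-06",
)
print(payload)
```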

Metadata Update from @kevin:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: mbs

4 years ago

The builds at https://mbs.fedoraproject.org/module-build-service/2/module-builds/616[6-9] are stalled. At first they were endlessly waiting on Koji tasks that had already finished. I canceled them and resubmitted, and now they are in the "wait" state. I tried again, but they are still in the "wait" state.
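The `616[6-9]` shorthand stands for builds 6166 through 6169. A small sketch that expands it into API URLs and flags builds still sitting in "init" or "wait"; the `state_name` key is an assumption about the shape of the JSON returned by the v2 API, and the sample responses are illustrative only.

```python
BASE = "https://mbs.fedoraproject.org/module-build-service/2/module-builds/"


def build_urls(first_id, last_id):
    """URLs for each module build ID in the inclusive range."""
    return [f"{BASE}{build_id}" for build_id in range(first_id, last_id + 1)]


def stalled(builds, stuck_states=("init", "wait")):
    """IDs of builds whose (assumed) `state_name` is still an early state."""
    return [b["id"] for b in builds if b.get("state_name") in stuck_states]


urls = build_urls(6166, 6169)
# Illustrative responses only; real ones would come from a GET on each URL.
sample = [{"id": 6166, "state_name": "wait"}, {"id": 6167, "state_name": "ready"}]
print(stalled(sample))  # [6166]
```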

I've restarted fedmsg-hub on backend01. The error was "too many open files", which is not what this particular issue is about.
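"Too many open files" means the process hit its file-descriptor limit. A Linux-only sketch for checking how close a process such as fedmsg-hub is to that limit, by counting entries in `/proc/<pid>/fd` and reading the soft limit from `/proc/<pid>/limits`. Finding the hub's PID is left to the reader, and reading another user's `/proc` entries may require root.

```python
import os


def fd_usage(pid):
    """Return (open_fds, soft_limit) for `pid`; soft_limit is None if unlimited.

    Linux only: relies on /proc/<pid>/fd and /proc/<pid>/limits.
    """
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    soft_limit = None
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                # Columns: name (3 words), soft limit, hard limit, units.
                soft = line.split()[3]
                soft_limit = None if soft == "unlimited" else int(soft)
                break
    return open_fds, soft_limit


used, limit = fd_usage(os.getpid())
print(f"{used} open fds, soft limit {limit}")
```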

Is this still happening?

@lucarval I haven't heard about this happening again since my previous comment.

I saw this occur again.

I noticed the following in the frontend logs during some of the failures and it seems suspicious:

[Wed Sep 11 14:21:15.508908 2019] [:error] [pid 14019] DEBUG:fedmsg.core:Trying to bind to tcp://mbs-frontend01.phx2.fedoraproject.org:3000
[Wed Sep 11 14:21:15.509163 2019] [:error] [pid 14019] DEBUG:fedmsg.core:Trying to bind to tcp://mbs-frontend01.phx2.fedoraproject.org:3001
[Wed Sep 11 14:21:15.509329 2019] [:error] [pid 14019] DEBUG:fedmsg.core:Trying to bind to tcp://mbs-frontend01.phx2.fedoraproject.org:3002
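Those three lines are consistent with a publisher walking an ordered list of endpoints and stopping at the first one it can bind, which would mean ports 3000 and 3001 were presumably already held by other WSGI threads. A simplified model of that loop using plain TCP sockets; fedmsg itself binds ZeroMQ sockets, so this illustrates the idea rather than reproducing its code.

```python
import socket


def bind_first_free(host, ports):
    """Bind to the first port in `ports` that is free; return (socket, port).

    Simplified model (assumption) of the "Trying to bind to ..." loop: try each
    candidate endpoint in order and stop at the first successful bind.
    """
    for port in ports:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError:
            sock.close()  # Port already taken; try the next candidate.
            continue
        return sock, sock.getsockname()[1]
    raise RuntimeError(f"no free endpoint among {list(ports)!r}")
```

If every candidate is already taken, the sketch raises instead of binding, which is the point where a real publisher would be left without an endpoint.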

I noticed this also seems to happen after all the services are restarted (fedmsg-hub on backend01 and httpd on frontend01 and frontend02). Sometimes if I wait a while, it starts working though.

I wonder if this might be due to inventory/group_vars/mbs_frontend:

# Defining these vars has a number of effects
# 1) mod_wsgi is configured to use the vars for its own setup
# 2) iptables opens enough ports for all threads for fedmsg
# 3) roles/fedmsg/base/ declares enough fedmsg endpoints for all threads
wsgi_fedmsg_service: mbs 
wsgi_procs: 2
wsgi_threads: 2

So there are 8 fedmsg ports defined, but only 2x2 WSGI processes/threads, so half of the ports aren't working? But it's failing the other way, right?
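The arithmetic behind that question, as a sketch: if every WSGI thread ends up with its own fedmsg publisher (an assumption here) and publishers claim endpoints in list order, then `wsgi_procs` × `wsgi_threads` = 2 × 2 = 4 of the 8 ports get bound and the other 4 stay idle. The port list 3000 to 3007 is assumed from the log lines above. The third return value covers the opposite case, where there are more publishers than ports.

```python
def endpoint_usage(ports, wsgi_procs, wsgi_threads):
    """Return (bound, idle, unserved) under a one-publisher-per-thread model.

    bound:    ports claimed by publishers, in list order
    idle:     ports no publisher uses
    unserved: number of publishers left without any endpoint
    """
    publishers = wsgi_procs * wsgi_threads
    bound, idle = ports[:publishers], ports[publishers:]
    unserved = max(0, publishers - len(ports))
    return bound, idle, unserved


ports = list(range(3000, 3008))  # 8 endpoints, assumed to start at 3000
print(endpoint_usage(ports, wsgi_procs=2, wsgi_threads=2))
# ([3000, 3001, 3002, 3003], [3004, 3005, 3006, 3007], 0)
```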

But also oddly, on mbs-backend01 I only see port 3000 listening... not the other 8.
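To see which ports actually accept connections, `ss -tlnp` on the host itself is the usual tool. From another machine, a sketch like the following tries a TCP connect to each port; the port range in the usage note is an assumption matching the 8 ports discussed above.

```python
import socket


def listening_ports(host, ports, timeout=0.5):
    """Return the ports on `host` that accept a TCP connection within `timeout` s."""
    found = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising.
            if sock.connect_ex((host, port)) == 0:
                found.append(port)
    return found
```

Usage would be something like `listening_ports(backend_host, range(3000, 3008))`, where `backend_host` is a placeholder for the backend's hostname.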

Hope that helps some. I'll keep pondering what could be going on. Fixing the frontend numbers is likely a good idea, and checking why the backend only listens on one port might also be good.

@kevin I haven't had time to look into this yet. Thank you for the pointers.

Metadata Update from @cverna:
- Issue priority set to: Waiting on Reporter (was: Waiting on Assignee)

4 years ago

@kevin we're planning to migrate to fedora-messaging within the next couple of months, so we decided not to invest time in debugging the issue, since it was fedmsg-hub-specific and I haven't heard any complaints recently.

Any help is appreciated though.

Alright, then I guess let's close this and reopen it if it becomes a problem again...

Metadata Update from @kevin:
- Issue close_status updated to: Will Not/Can Not fix
- Issue status updated to: Closed (was: Open)

4 years ago
