#12024 Mailman fedmsg-archiver error
Closed: Fixed 8 months ago by zlopez. Opened 8 months ago by zlopez.

Describe what you would like us to do:


This error started appearing once the fedmsg plugin was finally started. It looks like the internal message schema used by mailman may have changed, and the plugin needs to be updated to match.

Jul 02 14:23:22 mailman01.iad2.fedoraproject.org mailman3[1199283]: Jul 02 14:23:22 2024 (1199283) Traceback (most recent call last):
Jul 02 14:23:22 mailman01.iad2.fedoraproject.org mailman3[1199283]:   File "/usr/lib/python3.9/site-packages/twisted/internet/defer.py", line 1697, in _inlineCallbacks
Jul 02 14:23:22 mailman01.iad2.fedoraproject.org mailman3[1199283]:     result = context.run(gen.send, result)
Jul 02 14:23:22 mailman01.iad2.fedoraproject.org mailman3[1199283]:   File "/usr/lib/python3.9/site-packages/fedora_messaging/twisted/factory.py", line 240, in publish
Jul 02 14:23:22 mailman01.iad2.fedoraproject.org mailman3[1199283]:     yield protocol.publish(message, exchange)
Jul 02 14:23:22 mailman01.iad2.fedoraproject.org mailman3[1199283]:   File "/usr/lib/python3.9/site-packages/twisted/internet/defer.py", line 1947, in unwindGenerator
Jul 02 14:23:22 mailman01.iad2.fedoraproject.org mailman3[1199283]:     return _cancellableInlineCallbacks(gen)
Jul 02 14:23:22 mailman01.iad2.fedoraproject.org mailman3[1199283]:   File "/usr/lib/python3.9/site-packages/twisted/internet/defer.py", line 1857, in _cancellableInlineCal>
Jul 02 14:23:22 mailman01.iad2.fedoraproject.org mailman3[1199283]:     _inlineCallbacks(None, gen, status, _copy_context())
Jul 02 14:23:22 mailman01.iad2.fedoraproject.org mailman3[1199283]: --- <exception caught here> ---
Jul 02 14:23:22 mailman01.iad2.fedoraproject.org mailman3[1199283]:   File "/usr/lib/python3.9/site-packages/fedora_messaging/api.py", line 259, in _twisted_publish
Jul 02 14:23:22 mailman01.iad2.fedoraproject.org mailman3[1199283]:     yield _twisted_service._service.factory.publish(message, exchange=exchange)
Jul 02 14:23:22 mailman01.iad2.fedoraproject.org mailman3[1199283]:   File "/usr/lib/python3.9/site-packages/fedora_messaging/twisted/factory.py", line 240, in publish
Jul 02 14:23:22 mailman01.iad2.fedoraproject.org mailman3[1199283]:     yield protocol.publish(message, exchange)
Jul 02 14:23:22 mailman01.iad2.fedoraproject.org mailman3[1199283]:   File "/usr/lib/python3.9/site-packages/twisted/internet/defer.py", line 1697, in _inlineCallbacks
Jul 02 14:23:22 mailman01.iad2.fedoraproject.org mailman3[1199283]:     result = context.run(gen.send, result)
Jul 02 14:23:22 mailman01.iad2.fedoraproject.org mailman3[1199283]:   File "/usr/lib/python3.9/site-packages/fedora_messaging/twisted/protocol.py", line 137, in publish
Jul 02 14:23:22 mailman01.iad2.fedoraproject.org mailman3[1199283]:     message.validate()
Jul 02 14:23:22 mailman01.iad2.fedoraproject.org mailman3[1199283]:   File "/usr/lib/python3.9/site-packages/fedora_messaging/message.py", line 508, in validate
Jul 02 14:23:22 mailman01.iad2.fedoraproject.org mailman3[1199283]:     jsonschema.validate(self.body, schema)
Jul 02 14:23:22 mailman01.iad2.fedoraproject.org mailman3[1199283]:   File "/usr/lib/python3.9/site-packages/jsonschema/validators.py", line 934, in validate
Jul 02 14:23:22 mailman01.iad2.fedoraproject.org mailman3[1199283]:     raise error
Jul 02 14:23:22 mailman01.iad2.fedoraproject.org mailman3[1199283]: jsonschema.exceptions.ValidationError: None is not of type 'string'
Jul 02 14:23:22 mailman01.iad2.fedoraproject.org mailman3[1199283]: Failed validating 'type' in schema['properties']['url']:
Jul 02 14:23:22 mailman01.iad2.fedoraproject.org mailman3[1199283]:     {'description': 'Where the message is archived', 'type': 'string'}
Jul 02 14:23:22 mailman01.iad2.fedoraproject.org mailman3[1199283]: On instance['url']:
Jul 02 14:23:22 mailman01.iad2.fedoraproject.org mailman3[1199283]:     None
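
The validation failure in the traceback can be reproduced in isolation. The following is a minimal sketch, assuming the jsonschema package is installed. The schema is reduced to the single url property quoted in the log and is not the full schema the plugin uses; the helper name validation_error and the example.org URL are placeholders for illustration only.

```python
import jsonschema

# Only the 'url' property quoted in the traceback. The real schema has
# more properties that the log does not show.
schema = {
    "type": "object",
    "properties": {
        "url": {"description": "Where the message is archived", "type": "string"},
    },
}


def validation_error(body):
    """Return the jsonschema error message for body, or None if it validates."""
    try:
        jsonschema.validate(body, schema)
    except jsonschema.ValidationError as exc:
        return exc.message
    return None


# A body whose url is None is rejected, as in the log.
print(validation_error({"url": None}))
# A body with a string url (placeholder value) passes and yields None.
print(validation_error({"url": "https://example.org/archived-message"}))
```

The point of the sketch is that the schema itself may be fine: any message body that carries url as None will be rejected, regardless of the rest of its contents.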

When do you need this to be done by? (YYYY/MM/DD)


ASAP, as it's blocking the fedmsg archiver.


After watching mailman for a while today, I found that this happens when the archiver gets a 502 error from http://localhost/archives/api/mailman/urls. Presumably the URL lookup fails, the url field ends up as None, and the message then fails schema validation.

I think this is related to high load, which is probably caused by the mailman cache being rebuilt from scratch.

I will try disabling the timer that does that for now and see whether the issue persists. If that is the case, I will write my own cache rebuild script, which will hopefully be more efficient and less resource-hungry.
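
Independently of the load problem, a failed URL lookup could be caught before it turns into a schema error. This is only a sketch under assumptions: build_body, ArchiveUrlMissing and the msgid key are hypothetical names, not part of mailman or the fedmsg plugin, and the real plugin may be structured quite differently. Only the url key and its string type come from the schema shown in the traceback.

```python
from typing import Optional


class ArchiveUrlMissing(Exception):
    """Hypothetical error: the archive URL lookup returned nothing usable."""


def build_body(msg_id: str, url: Optional[str]) -> dict:
    """Build a message body, refusing a missing archive URL.

    Hypothetical helper. The 'msgid' key is invented for illustration;
    'url' must be a string to satisfy the schema from the traceback.
    """
    if not isinstance(url, str):
        # Fail here, next to the lookup, instead of later in schema validation.
        raise ArchiveUrlMissing(f"no archive URL for message {msg_id!r}")
    return {"msgid": msg_id, "url": url}


# Usage: a lookup that failed (for example with a 502) is modelled as None.
try:
    build_body("<placeholder@example.org>", None)
except ArchiveUrlMissing as exc:
    print(f"skipping publish: {exc}")
```

Failing early like this would not fix the 502 itself, but it would make the log point at the URL lookup rather than at the message schema.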

Even with the cache rebuild job disabled, the errors are still happening, so this needs more investigation.

After some of the changes @kevin made to the machine yesterday, I no longer see any errors. Everything seems to be running as it should.

I have one remaining PR to merge, after which I will monitor mailman for a while, but it seems that all the remaining issues are resolved. Now we just need to wait for the index to finish rebuilding.

Metadata Update from @zlopez:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

8 months ago

Metadata Update from @zlopez:
- Issue untagged with: Needs investigation
- Issue tagged with: medium-trouble

8 months ago

Metadata
Boards 1
ops Status: Backlog