https://discourse2fedmsg.fedoraproject.org/webhook is failing with
error: Net::ReadTimeout with #<TCPSocket:(closed)>
after what seems to be a 20-second timeout every time. Last time this happened, I think the problem was (wait for it!) ... DNS.
https://discussion.fedoraproject.org/admin/api/web_hooks/2
This has been failing for at least a month -- unfortunately it doesn't alert on failure, and it's easy to not notice something not happening....
Metadata Update from @kevin:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: medium-gain, medium-trouble, ops
So, it's not DNS. ;)
(famous last words).
I re-ran the playbook and got it to roll out a new build and now it's returning 200! However, I am not sure it's actually working.
It has:
Unhandled error in Deferred: fedora_messaging.exceptions.ConnectionException
and
fedora_messaging.exceptions.ConnectionException
Traceback (most recent call last):
--- <exception caught here> ---
  File "/opt/app-root/lib64/python3.9/site-packages/fedora_messaging/api.py", line 262, in _twisted_publish
    yield _twisted_service._service.factory.publish(message, exchange=exchange)
  File "/opt/app-root/lib64/python3.9/site-packages/fedora_messaging/twisted/factory.py", line 238, in publish
    protocol = yield self.when_connected()
  File "/opt/app-root/lib64/python3.9/site-packages/fedora_messaging/twisted/factory.py", line 203, in when_connected
    yield self._client_deferred
fedora_messaging.exceptions.ConnectionException:
[2024-04-09 20:56:45,921] WARNING in webhook: Error sending message 7b5b566e-4b26-46ef-b182-82eff9879e44:
[2024-04-09 21:02:05,565] ERROR in app: Exception on /webhook [POST]
Traceback (most recent call last):
  File "/opt/app-root/lib64/python3.9/site-packages/fedora_messaging/api.py", line 316, in publish
    eventual_result.wait(timeout=timeout)
  File "/opt/app-root/lib64/python3.9/site-packages/crochet/_eventloop.py", line 196, in wait
    result = self._result(timeout)
  File "/opt/app-root/lib64/python3.9/site-packages/crochet/_eventloop.py", line 175, in _result
    raise TimeoutError()
crochet._eventloop.TimeoutError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/app-root/lib64/python3.9/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/opt/app-root/lib64/python3.9/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/opt/app-root/lib64/python3.9/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/opt/app-root/lib64/python3.9/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/opt/app-root/lib64/python3.9/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/opt/app-root/lib64/python3.9/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/opt/app-root/src/discourse2fedmsg/views/webhook.py", line 67, in webhook
    publish(msg)
  File "/opt/app-root/lib64/python3.9/site-packages/fedora_messaging/api.py", line 324, in publish
    raise wrapper
fedora_messaging.exceptions.PublishTimeout: Publishing timed out after waiting 30 seconds.
Perhaps @abompard or @zlopez could take a closer look?
The RabbitMQ certificate for this service expired on Feb 13, 2024... :-/ I'll renew it.
Renewed and redeployed. Looking at the logs, some calls to the webhook from Discourse now complete without errors, and the corresponding messages show up in datanommer.
I think the issue is fixed, please reopen if it's not.
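Since the root cause was an expired client certificate that failed silently, here is a minimal sketch, using only Python's standard library, of how one might turn a certificate's `notAfter` date into a simple status. It assumes the date is in OpenSSL's text format (e.g. the value printed by `openssl x509 -noout -enddate`, or the `notAfter` field returned by `ssl.SSLSocket.getpeercert()`). The helper names `days_until_expiry` and `check_expiry` and the 30-day warning threshold are hypothetical; they are not part of discourse2fedmsg or fedora-messaging.

```python
import ssl
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter time (negative once expired).

    Hypothetical helper. `not_after` uses OpenSSL's text format,
    e.g. "Feb 13 00:00:00 2024 GMT".
    """
    # ssl.cert_time_to_seconds parses the OpenSSL date format into a UTC epoch timestamp.
    expires_at = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires_at - now) / SECONDS_PER_DAY


def check_expiry(not_after: str, warn_days: float = 30.0,
                 now: Optional[float] = None) -> str:
    """Classify a certificate as "expired", "expiring soon", or "ok".

    Hypothetical helper; the 30-day default threshold is an arbitrary choice.
    """
    remaining = days_until_expiry(not_after, now)
    if remaining < 0:
        return "expired"
    if remaining < warn_days:
        return "expiring soon"
    return "ok"
```

Run periodically against the Feb 13, 2024 expiry date (the time of day here is illustrative, since only the date is known), `check_expiry` would have reported "expiring soon" through January and "expired" by the time of the April logs above, giving an explicit signal instead of a silently stalled webhook.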
Metadata Update from @abompard:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)