The staging fedora-messaging instance doesn't seem to be working properly. openQA is publishing messages to it, and that is logged as working on the openQA end:
Jun 20 10:15:40 openqa-lab01.iad2.fedoraproject.org openqa-webui-daemon[2488236]: [debug] [pid:2488236] Sending AMQP event: org.fedoraproject.stg.openqa.job.done
Jun 20 10:15:40 openqa-lab01.iad2.fedoraproject.org openqa-webui-daemon[2488236]: [debug] [pid:2488236] AMQP URL: amqps://openqa.stg:@rabbitmq.stg.fedoraproject.org/%2Fpubsub?exchange=amq.topic&cacertfile=%2Fetc%2Ffedora-messaging%2Fstg-cacert.pem&certfile=%2Fetc%2Fpki%2Ffedora-messaging%2Fopenqa.stg-cert.pem&keyfile=%2Fetc%2Fpki%2Ffedora-messaging%2Fopenqa.stg-key.pem
Jun 20 10:15:40 openqa-lab01.iad2.fedoraproject.org openqa-webui-daemon[2488236]: [debug] [pid:2488236] org.fedoraproject.stg.ci.productmd-compose.test.complete published
Jun 20 10:15:40 openqa-lab01.iad2.fedoraproject.org openqa-webui-daemon[2488236]: [debug] [pid:2488236] org.fedoraproject.stg.openqa.job.done published
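The AMQP URL in that log packs all the connection details into one percent-encoded string. As a minimal sketch (describe_amqp_url is a hypothetical helper, not part of openQA or fedora-messaging), Python's urllib.parse can decode it; note that the vhost decodes to /pubsub and the exchange to amq.topic, the same vhost and exchange that show up in the rabbitmqctl output later in the thread.

```python
from urllib.parse import urlsplit, unquote, parse_qs

# URL copied verbatim from the openQA debug log.
AMQP_URL = (
    "amqps://openqa.stg:@rabbitmq.stg.fedoraproject.org/%2Fpubsub"
    "?exchange=amq.topic&cacertfile=%2Fetc%2Ffedora-messaging%2Fstg-cacert.pem"
    "&certfile=%2Fetc%2Fpki%2Ffedora-messaging%2Fopenqa.stg-cert.pem"
    "&keyfile=%2Fetc%2Fpki%2Ffedora-messaging%2Fopenqa.stg-key.pem"
)


def describe_amqp_url(url):
    """Hypothetical helper: split an AMQP URL into its main parts."""
    parts = urlsplit(url)
    # The path is "/" followed by the percent-encoded vhost name.
    vhost = unquote(parts.path[1:])
    # parse_qs percent-decodes the values and maps each key to a list.
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {
        "user": parts.username,
        "host": parts.hostname,
        "vhost": vhost,
        "exchange": query.get("exchange"),
        "certfile": query.get("certfile"),
    }
```

Calling describe_amqp_url(AMQP_URL) shows the client connecting as user openqa.stg to the /pubsub vhost on the staging broker.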
but no message shows up in https://apps.stg.fedoraproject.org/datagrepper/raw?category=ci&delta=172800 or https://apps.stg.fedoraproject.org/datagrepper/raw?category=openqa&delta=172800 (openQA publishes on both topics), and other consumers subscribed to the same topics aren't firing either. For instance, the openQA staging server also runs a consumer that listens for openqa.job.done messages and files reports to ResultsDB, but it is not firing: https://resultsdb.stg.fedoraproject.org/results shows only one result since May 20, and that one I triggered manually. Running journalctl -u fm-consumer@fedora_openqa_resultsdb_reporter.service --since 2023-05-20 | grep "Consuming message" on openqa-lab01.iad2 confirms that it just hasn't received any messages since 2023-05-20.
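Both datagrepper links use the same two query parameters, category and delta, where delta=172800 is a look-back window in seconds (2 × 24 × 60 × 60, i.e. two days). A minimal sketch for building those queries for other categories, assuming only the parameters already used above; datagrepper_query is a hypothetical helper:

```python
from urllib.parse import urlencode

DATAGREPPER_RAW = "https://apps.stg.fedoraproject.org/datagrepper/raw"
TWO_DAYS = 2 * 24 * 60 * 60  # 172800 seconds, the delta used in the links above


def datagrepper_query(category, delta=TWO_DAYS):
    """Hypothetical helper: build a staging datagrepper /raw query URL."""
    return DATAGREPPER_RAW + "?" + urlencode({"category": category, "delta": delta})
```

For example, datagrepper_query("openqa") rebuilds the second link above.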
ASAP, but it's not especially high priority; it just means we can't be sure whether messaging-related features in anything deployed to stg work before we deploy them to prod.
I restarted the resultsdb-ci-listener, but it didn't seem to be broken, and restarting it didn't help. It doesn't appear to log very much at all. ;(
I also found the zmq-to-amqp bridge was down and restarted it, but I don't think that affects this.
datanommer appears to be working (other messages are showing up).
Metadata Update from @kevin: - Issue priority set to: Waiting on Assignee (was: Needs Review) - Issue tagged with: medium-gain, medium-trouble, ops
resultsdb-ci-listener is for Fedora CI stuff (currently, though I keep meaning to look at adopting it for openQA too). openQA currently sends messages directly from perl code in openQA itself, and those are the log messages of that process happening and apparently succeeding. Is the server dropping the submitted messages, or something?
[root@rabbitmq01 ~][STG]# rabbitmqctl list_topic_permissions --vhost /pubsub | grep -E 'ci|openqa'
openqa.stg amq.topic ^org\.fedoraproject\.prod\.(openqa|ci)\..* .*
resultsdb.stg_ci_listener amq.topic ^$ .*
koschei.stg amq.topic ^org\.fedoraproject\.stg\.(koschei|ci)\..* .*
Somehow openqa stg is getting a prod permission?
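To make the mismatch concrete, here is a minimal sketch that approximates RabbitMQ's topic authorization check with Python's re.search, using the write patterns from the rabbitmqctl output (its columns are user, exchange, write pattern, read pattern). The WRITE_PATTERNS table and may_publish helper are illustrative only, not part of RabbitMQ or fedora-messaging. Under this approximation, openqa.stg may only publish org.fedoraproject.prod.* routing keys, so its org.fedoraproject.stg.* messages would be refused, which would fit with nothing arriving downstream.

```python
import re

# Write patterns copied from the rabbitmqctl output (user -> write regex).
WRITE_PATTERNS = {
    "openqa.stg": r"^org\.fedoraproject\.prod\.(openqa|ci)\..*",
    "resultsdb.stg_ci_listener": r"^$",
    "koschei.stg": r"^org\.fedoraproject\.stg\.(koschei|ci)\..*",
}


def may_publish(user, routing_key):
    """Approximate check: does the user's write pattern match the routing key?"""
    pattern = WRITE_PATTERNS.get(user)
    if pattern is None:
        # No topic permission entry for this user in the table above.
        return False
    return re.search(pattern, routing_key) is not None
```

For instance, may_publish("openqa.stg", "org.fedoraproject.stg.openqa.job.done") is False, while the same key with prod in place of stg is allowed.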
Ah, looking at commit 4d36f9ed505, it uses 'env_short', but openqa-lab isn't really in staging, so it gets the default 'prod' here.
So, we need this to use prod in prod and 'stg' in lab...
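The fix amounts to choosing the topic prefix per host rather than relying on env_short alone. The real change lives in the Ansible plays; this is only a Python sketch of the decision logic under that assumption, and topic_env, write_regex, and can_publish are hypothetical names, not anything from the playbooks.

```python
import re


def topic_env(env_short, is_lab_host):
    """Hypothetical: pick the topic environment prefix for a host.

    Lab hosts such as openqa-lab01 are not in staging, so env_short falls
    back to "prod" for them, but they publish to the staging broker and
    therefore need the "stg" prefix.
    """
    if is_lab_host:
        return "stg"
    return env_short or "prod"


def write_regex(env, apps):
    """Build a write pattern in the same shape as the rabbitmqctl output."""
    return r"^org\.fedoraproject\.%s\.(%s)\..*" % (env, "|".join(apps))


def can_publish(pattern, routing_key):
    """Approximate RabbitMQ's write check with an anchored regex search."""
    return re.search(pattern, routing_key) is not None
```

With this, a lab host gets write_regex("stg", ["openqa", "ci"]), which matches org.fedoraproject.stg.openqa.job.done, while a prod host keeps the prod pattern.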
Ah, I see. Well, now that I've looked at that file and the underlying roles and understand what they actually do, I think it's kinda...wrong in a lot of ways. I'm going to do a rewrite and have you and @abompard both look at it to make sure I understood it better this time.
OK, I made a bit of a mess of things for a while there (there may be bogus queues to clean up on prod and/or stg, sorry), but this is fixed now and the plays hopefully make more sense; they do to me, anyway.
Metadata Update from @adamwill: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)