#9579 Getting ACCESS_REFUSED when accessing queue
Closed: Invalid 4 years ago by jsztuka. Opened 4 years ago by jsztuka.


Could you please clarify what is causing the following behaviour?

The following was acquired from the Jenkins logs after the instance started.

2021-01-12 07:44:49.064+0000 [id=80]    SEVERE  c.r.j.p.c.m.RabbitMQMessagingWorker#subscribe: Eexception raised while subscribing job 'cvp-co-metadata-trigger', retrying in 1 minutes.
com.rabbitmq.client.ShutdownSignalException: channel error; protocol method: #method<channel.close>(reply-code=403, reply-text=ACCESS_REFUSED - access to queue '7391ebfe7478da526fe6ac48242d965c' in vhost '/public_pubsub' refused for user 'fedora', class-id=50, method-id=10)
    at com.rabbitmq.client.impl.ChannelN.asyncShutdown(ChannelN.java:522)
    at com.rabbitmq.client.impl.ChannelN.processAsync(ChannelN.java:346)
    at com.rabbitmq.client.impl.AMQChannel.handleCompleteInboundCommand(AMQChannel.java:182)
    at com.rabbitmq.client.impl.AMQChannel.handleFrame(AMQChannel.java:114)
    at com.rabbitmq.client.impl.AMQConnection.readFrame(AMQConnection.java:672)
    at com.rabbitmq.client.impl.AMQConnection.access$300(AMQConnection.java:48)
    at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:599)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused: com.rabbitmq.client.ShutdownSignalException: channel error; protocol method: #method<channel.close>(reply-code=403, reply-text=ACCESS_REFUSED - access to queue '7391ebfe7478da526fe6ac48242d965c' in vhost '/public_pubsub' refused for user 'fedora', class-id=50, method-id=10)
    at com.rabbitmq.utility.ValueOrException.getValue(ValueOrException.java:66)
    at com.rabbitmq.utility.BlockingValueOrException.uninterruptibleGetValue(BlockingValueOrException.java:36)
    at com.rabbitmq.client.impl.AMQChannel$BlockingRpcContinuation.getReply(AMQChannel.java:502)
    at com.rabbitmq.client.impl.AMQChannel.privateRpc(AMQChannel.java:293)
    at com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc(AMQChannel.java:141)
Caused: java.io.IOException
    at com.rabbitmq.client.impl.AMQChannel.wrap(AMQChannel.java:129)
    at com.rabbitmq.client.impl.AMQChannel.wrap(AMQChannel.java:125)
    at com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc(AMQChannel.java:147)
    at com.rabbitmq.client.impl.ChannelN.queueDeclare(ChannelN.java:968)
    at com.rabbitmq.client.impl.recovery.AutorecoveringChannel.queueDeclare(AutorecoveringChannel.java:333)
    at com.redhat.jenkins.plugins.ci.messaging.RabbitMQMessagingWorker.subscribe(RabbitMQMessagingWorker.java:78)
    at com.redhat.jenkins.plugins.ci.messaging.JMSMessagingWorker.subscribe(JMSMessagingWorker.java:49)
    at com.redhat.jenkins.plugins.ci.messaging.RabbitMQMessagingWorker.receive(RabbitMQMessagingWorker.java:158)
    at com.redhat.jenkins.plugins.ci.threads.CITriggerThread.run(CITriggerThread.java:73)
2021-01-12 07:44:49.066+0000 [id=80]    WARNING c.r.j.p.c.m.RabbitMQMessagingWorker#unsubscribe: Exception occurred when closing channel: Unknown consumerTag

The cvp-co-metadata-trigger job should be triggered by events (pull_request created, etc.) from the upstream repository.

The fedora-messaging provider handles those messages from upstream.

Upstream repo : operator-framework/community-operators

Is it possible to get a response before EOD 2021/01/14?
Thanks in advance.



Can you explain more about where this is and what it's trying to do?

Are you just trying to get messages? or also write them?

User 'fedora' sounds like the ro receiving user? But then it cannot make its own queues...

Metadata Update from @kevin:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: low-gain, low-trouble, ops

4 years ago

User 'fedora' sounds like the ro receiving user? But then it cannot make its own queues...

They can, in the public_pubsub vhost: https://fedora-messaging.readthedocs.io/en/stable/quick-start.html#fedora-s-public-broker as long as the queue names are UUIDs, and with some restrictions.

The core of the stacktrace seems to be when shutting down after running into:

channel error; protocol method: #method<channel.close>(reply-code=403, reply-text=ACCESS_REFUSED - access to queue '7391ebfe7478da526fe6ac48242d965c' in vhost '/public_pubsub' refused for user 'fedora', class-id=50, method-id=10)

Which certs do you use to connect?
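For reference, the channel error quoted above can be decoded mechanically. The sketch below is only an illustration: the function name and the small lookup tables are made up for this example, and the tables cover just the AMQP 0-9-1 codes relevant to this ticket (class 50 / method 10 is `queue.declare`, 403 is access-refused, 405 is resource-locked). It shows that the refusal happens while declaring the queue, not while shutting down.

```python
import re

# AMQP 0-9-1 values relevant to this ticket (deliberately not an exhaustive table).
REPLY_CODES = {403: "access-refused", 405: "resource-locked"}
METHODS = {(50, 10): "queue.declare", (50, 20): "queue.bind", (60, 20): "basic.consume"}

PATTERN = re.compile(
    r"reply-code=(?P<code>\d+), reply-text=(?P<text>.*), "
    r"class-id=(?P<cls>\d+), method-id=(?P<meth>\d+)\)"
)


def decode_channel_close(line):
    """Extract reply code, failing method, and queue name from a channel.close message."""
    m = PATTERN.search(line)
    if m is None:
        return None
    queue = re.search(r"queue '([^']+)'", m.group("text"))
    code = int(m.group("code"))
    key = (int(m.group("cls")), int(m.group("meth")))
    return {
        "reply_code": code,
        "meaning": REPLY_CODES.get(code, "unknown"),
        "method": METHODS.get(key, "unknown"),
        "queue": queue.group(1) if queue else None,
    }


LINE = (
    "channel error; protocol method: #method<channel.close>(reply-code=403, "
    "reply-text=ACCESS_REFUSED - access to queue '7391ebfe7478da526fe6ac48242d965c' "
    "in vhost '/public_pubsub' refused for user 'fedora', class-id=50, method-id=10)"
)
print(decode_channel_close(LINE))
```

The `method` field is the useful part here: the broker rejected the `queue.declare` call for that queue name.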

We are using these certs to create the keystore and truststore:
wget https://raw.githubusercontent.com/fedora-infra/fedora-messaging/stable/configs/fedora-key.pem
wget https://raw.githubusercontent.com/fedora-infra/fedora-messaging/stable/configs/fedora-cert.pem
wget https://raw.githubusercontent.com/fedora-infra/fedora-messaging/stable/configs/cacert.pem
Furthermore, testing the connection from Jenkins succeeds:
Successfully connected to rabbitmq.fedoraproject.org:5671

Are those the ones shipped in the fedora-messaging RPM?

No, I'm not sure if they are the same; I am only getting those from GitHub. Is there a way to check that they are the same?

I've just compared the ones we're getting from GitHub with the ones from the Fedora RPM, and they are identical.
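One way to do such a comparison is to hash both copies and compare the digests. This is a minimal sketch, not necessarily how the comparison above was done; the helper names are hypothetical, and the demo uses throwaway files in place of the real PEM files (the downloaded ones and wherever the RPM installs its copies).

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def same_content(path_a, path_b):
    """True when both files have identical contents (compared by digest)."""
    return sha256_of(path_a) == sha256_of(path_b)


if __name__ == "__main__":
    # Demo with dummy files; replace these with the real certificate paths.
    with tempfile.TemporaryDirectory() as tmp:
        from_github = Path(tmp, "from-github.pem")
        from_rpm = Path(tmp, "from-rpm.pem")
        from_github.write_text("dummy certificate body\n")
        from_rpm.write_text("dummy certificate body\n")
        print(same_content(from_github, from_rpm))
```

Run it once per file pair (key, cert, CA cert); any mismatch means the two sources differ.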

Can you explain more about where this is and what it's trying to do?

Are you just trying to get messages? or also write them?

User 'fedora' sounds like the ro receiving user? But then it cannot make its own queues...

cvp-co-metadata-trigger is a job that validates changes made in the upstream GitHub repository, community-operators.
We only want to receive messages from fedora-messaging and trigger this job accordingly.

Is this possibly related to https://pagure.io/fedora-infrastructure/issue/9385? Are you specifying a queue name, or something else?

For the queue name we generate a random UUID, which is then set in the job configuration itself.
The configuration for fedora-messaging is in the global Jenkins config, where no queue is specified.
I don't think that issue is related to the problem we are having, which is:

Caused: com.rabbitmq.client.ShutdownSignalException: channel error; protocol method: #method<channel.close>(reply-code=403, reply-text=ACCESS_REFUSED - access to queue 'f2a6e058cd17395c8a9a6783def50d5b' in vhost '/public_pubsub' refused for user 'fedora', class-id=50, method-id=10)

How can I help speed up resolving this issue? Is there anything I can provide you with?

Perhaps we could get @abompard or @jcline to glance in here? Or, failing that, perhaps we could set up a time to debug this on IRC more interactively?

My 2 minute (read: likely wrong) guess is that the queue name is invalid. The stack trace shows "f2a6e058cd17395c8a9a6783def50d5b" is the queue name, and the docs indicate[0] it should be in the "standard" uuid format as produced from something like uuidgen: "7bacc345-5b6e-4ef1-9489-e58f0e175209". I don't remember, but would guess, that the regex expects the dashes to be there.

[0] https://fedora-messaging.readthedocs.io/en/stable/quick-start.html#getting-connected
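To make the guess above concrete, here is a small sketch that tells the two formats apart. The exact pattern the broker enforces is not stated in this ticket, so treat the "canonical form with dashes" rule as an assumption; the function name is hypothetical.

```python
import uuid


def is_canonical_uuid(name):
    """True if `name` is a UUID written in the dashed 8-4-4-4-12 form."""
    try:
        # uuid.UUID also accepts 32 bare hex digits, so compare against the
        # canonical string form to insist on the dashes.
        return str(uuid.UUID(name)) == name.lower()
    except ValueError:
        return False


print(is_canonical_uuid("f2a6e058cd17395c8a9a6783def50d5b"))      # name from the stack trace
print(is_canonical_uuid("7bacc345-5b6e-4ef1-9489-e58f0e175209"))  # uuidgen-style name
```

The first name parses as a UUID but is not in the dashed form, so it fails the check; the second passes.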

@jsztuka does the comment above help?

If not, perhaps we could schedule a time later this week to debug in a more interactive manner? I'm on the west coast US (PST), but we have some team members in EU that perhaps could help here.

I thought this (wrong UUID format) might be the issue, so I changed it in the job configuration on our Jenkins master, and that caused it to crash. I also tried the wrong UUID format on my local minishift, and the job was successfully triggered. We recently upgraded the Jenkins version on our master, so I will try to reproduce this situation again on my minishift to confirm or rule out that behaviour.

That would speed things up a lot, I'm all for it. Please feel free to reach me anytime; a recommendation for someone in the EU is also welcome.

I'm in Europe. You can ping me on #fedora-admin anytime (if I'm in meetings, I'll get back to you as soon as I can)

I figured out what was causing the problems mentioned above.
I set up a local minishift instance with the same settings as our master.
I modified the job config to match production, with the wrong UUID format, and the job was not triggered by actions from GitHub.
Then I set the queue's UUID in the correct format, with dashes, and the Jenkins master instance crashed with the error mentioned above.
We keep this information (the specific UUID) in manifests, so it is read from there on each restart of the Jenkins instance, and this was causing the issue. When I changed the UUID format there, everything started to work correctly.
So we need to change the UUID format from:
{{ 999999999999999999999 | random | to_uuid | hash('md5') }}
to
{{ 999999999999999999999 | random | to_uuid }}
I will apply those changes to our master instance and let you know how it goes.
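The effect of the extra `hash('md5')` filter can be reproduced outside the template. This is a rough Python equivalent rather than the template engine itself: `uuid.uuid4()` is only a stand-in for whatever `to_uuid` returns, and the helper name is made up. An MD5 hex digest is always 32 hex characters with no dashes, which matches the queue names seen in the ACCESS_REFUSED errors.

```python
import hashlib
import uuid


def md5_hex(text):
    """Hex MD5 digest of a string, i.e. what an md5 hash filter produces."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


canonical = str(uuid.uuid4())   # stand-in for the output of `to_uuid`
hashed = md5_hex(canonical)     # the same value after `hash('md5')`

print(len(canonical), canonical.count("-"))
print(len(hashed), hashed.count("-"))
```

Dropping the hash step keeps the dashed UUID form intact.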

Update: after applying the changes that worked on the local minishift, our master instance is behaving differently. This is the log from the master Jenkins:
2021-02-04 19:37:51 SEVERE com.redhat.jenkins.plugins.ci.messaging.RabbitMQMessagingWorker subscribe Eexception raised while subscribing job 'cvp-co-metadata-trigger', retrying in 1 minutes.
java.io.IOException
Caused by: com.rabbitmq.client.ShutdownSignalException: channel error; protocol method: #method<channel.close>(reply-code=405, reply-text=RESOURCE_LOCKED - cannot obtain exclusive access to locked queue '16748fb4-4e91-4127-acc8-042d1c963b0c' in vhost '/public_pubsub'. It could be originally declared on another connection or the exclusive property value does not match that of the original..., class-id=50, method-id=10)

But this issue is probably in the jms-messaging provider itself.

Queues can be created to be exclusive: https://fedora-messaging.readthedocs.io/en/stable/configuration.html#queues
Could it be that this one was created as such? Especially if there is more than one consumer trying to access it.
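To illustrate the exclusive-queue idea, here is a toy in-memory model. It is not RabbitMQ client code and does not talk to a broker; the class and method names are invented for this sketch. It only mimics the rule that an exclusive queue belongs to the connection that declared it and disappears when that connection closes, so a second consumer reusing the same fixed queue name gets refused.

```python
class ResourceLocked(Exception):
    """Stands in for the broker's 405 RESOURCE_LOCKED channel error."""


class ToyBroker:
    """Tracks which connection owns each exclusive queue."""

    def __init__(self):
        self._owner = {}  # queue name -> id of the declaring connection

    def declare_exclusive(self, connection_id, queue):
        owner = self._owner.setdefault(queue, connection_id)
        if owner != connection_id:
            raise ResourceLocked(f"queue {queue!r} is locked by another connection")

    def close(self, connection_id):
        # Exclusive queues are removed together with their connection.
        self._owner = {q: o for q, o in self._owner.items() if o != connection_id}


broker = ToyBroker()
QUEUE = "16748fb4-4e91-4127-acc8-042d1c963b0c"

broker.declare_exclusive("consumer-a", QUEUE)
try:
    broker.declare_exclusive("consumer-b", QUEUE)  # same name, different connection
except ResourceLocked as exc:
    print("refused:", exc)

broker.close("consumer-a")
broker.declare_exclusive("consumer-b", QUEUE)      # succeeds once the owner is gone
print("declared after owner closed")
```

If the same queue name is used by more than one instance, or by an old connection that has not yet been closed, this is the kind of collision one would expect.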

@jsztuka did this help? Do you still need help? Do you need us to look at how your queue is set up server side?

I think this issue can be closed; we found out that the UUID format was the main issue. We are now working on a solution that fits our requirements. If we run into another issue that is connected to Fedora, I will add the error message and reopen this issue.

Thanks for your cooperation.

Metadata Update from @jsztuka:
- Issue close_status updated to: Invalid
- Issue status updated to: Closed (was: Open)

4 years ago

Thanks for getting back to us on this and glad you could make it work!
