Mailman was upgraded today [1]. I don't know whether this is related.
I went to moderate the freeipa-users mailing list. I discarded 20 or so messages and that worked fine. There is one which I believe is valid. Clicking on that message brings up a small window that reads "This held message has been lost."
I figure this is worth a look-see in case it affects other lists as well.
This message was received at 4:45 EDT, which I believe was within the downtime period.
Not exactly a rush, but with my customer service hat on I'd rather not wait weeks for this.
[1] https://pagure.io/fedora-infrastructure/issue/12004
Metadata Update from @phsmoura: - Issue priority set to: Waiting on Assignee (was: Needs Review) - Issue tagged with: medium-gain, medium-trouble, ops
cc: @zlopez
Metadata Update from @zlopez: - Issue assigned to zlopez
Looking at it, there are three held messages now and two of them have the same issue. According to the postfix log the message was processed without issue, and the mailman log shows only that the message was held. I don't see anything unusual about the message, and I'm not sure why it has no content.
This will need more investigation.
It seems that the messages with lost content appear at the same times as the instabilities I see with mailman. From time to time mailman gets greedy with memory and the oom-killer just kills something. I'm still working on finding the root cause behind that; some of the causes have already been fixed. From what I see, it only happened a few times in the first few days.
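For anyone who wants to check the same correlation, here is a minimal sketch of how OOM kills could be matched against the times messages were held. It assumes the kernel log contains the usual "Out of memory: Killed process" report line; the function names, the five-minute window, and the sample process name are hypothetical and not taken from our setup.

```python
import re
from datetime import datetime, timedelta

# Kernel OOM report line; older kernels print "Kill process",
# newer ones "Killed process", so the pattern accepts both.
OOM_RE = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def find_oom_kills(lines):
    """Return (pid, process_name) for every OOM-killer report line."""
    kills = []
    for line in lines:
        match = OOM_RE.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills


def near_any(event_time, kill_times, window=timedelta(minutes=5)):
    """True if event_time lies within `window` of any OOM kill time.

    The window size is an arbitrary assumption for illustration.
    """
    return any(abs(event_time - t) <= window for t in kill_times)
```

The timestamps would have to be parsed from the kernel log and the mailman log separately, since their formats differ; that part is left out here.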
Mailman is now more stable, and hopefully it will stay that way. I'm closing this ticket as fixed. Feel free to reopen it if this happens again.
Metadata Update from @zlopez: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)
FWIW there are five moderated messages on freeipa-users from yesterday and today that have lost their message in the same way. Fortunately all are spam.
The last messages without content date from before we disabled the search on 12th July. After that I don't see any, so it seems that this really helped. The search will be enabled again once the search index is regenerated.
Today the same thing happened on the devel-announce list. I'm reopening the issue, as it seems that the OOM issue is not the root cause of this.
Metadata Update from @zlopez: - Issue status updated to: Open (was: Closed)
I found this issue on the Mailman GitLab. I need to check whether the patch is included in the version we are running.
That one was fixed in 3.3.6 and we are running 3.3.8.
I checked the code deployed on the machine, and the patch mentioned in the GitLab issue really does seem to be in place.
I will try to reproduce the issue on staging and see if this is the same bug.
I tried to reproduce the issue on staging by sending an e-mail that needs to be moderated to two mailing lists. Even after approving it on one of the lists, it was still OK on the second list.
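A reproduction attempt like this can be scripted. The following is only a sketch using the Python standard library: the addresses are placeholders on example.org, the function names are hypothetical, and it assumes an SMTP server is reachable on the given host.

```python
import smtplib
from email.message import EmailMessage


def build_cross_post(sender, lists, subject="moderation test"):
    """Build one message addressed to several mailing lists at once."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(lists)
    msg["Subject"] = subject
    msg.set_content("Test message that should be held for moderation on both lists.")
    return msg


def send(msg, host="localhost"):
    """Hand the message to an SMTP server (host is an assumption)."""
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)
```

After sending, the message should show up in the moderation queue of each list, where it can be approved on one list and then checked on the other.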
I will try to dig deeper to find what the messages have in common.
I opened a ticket on the Mailman issue tracker and will continue to investigate.
I'm investigating the issue together with a Mailman dev in the GitLab ticket.
Thank you, Michal @zlopez
FWIW I have another lost message for freeipa-users with the title "Re: [Freeipa-users] Re: host does not match the primary host name - installing replica". It was sent today, Aug 30.
Another two moderated posts to freeipa-devel and at least one to freeipa-users were lost today.
Still working on that, but it seems that I finally found out what the issue is in this case. I got a patch to apply from Mark Sapiro on the GitLab issue, but those lines are completely missing from our deployment. I'm waiting for him to confirm that this is really the issue.
I have a fix for this ready for deployment and have already tested it on staging. I will just wait for the end of the freeze.
The fix is now deployed and the issue should be solved. I'm closing this ticket as fixed. Feel free to reopen it if you see this still happening.