#10340 content lost from wiki without audit trail
Closed: Fixed 2 years ago by kevin. Opened 2 years ago by fche.

Describe what you would like us to do:

Need to investigate why some content disappeared from the wiki, and try to restore it please.

https://fedoraproject.org/wiki/Releases/FeatureBuildId

is one that our group happened to find. The page says "This page was last edited on 2021-11-15, at 13:06:34." in the little grey metadata, but it now has no content, and the "History" tab shows no recent changes at all. This page is also mum on the subject:

https://fedoraproject.org/wiki/Special:RecentChanges

Something nuked the content, and it's a shame because this was a very valuable page. Maybe it's not the only one affected.


According to archive.org, this page disappeared sometime between June and August 2021.

https://web.archive.org/web/20210621234120/https://fedoraproject.org/wiki/Releases/FeatureBuildId

I suspect the content actually is there, it's just not showing for some reason. ;(

Likely something in the last upgrade.

Metadata Update from @kevin:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: medium-gain, medium-trouble, ops

2 years ago

[backlog refinement]
Still needs to be investigated. There were some database tweaks after the latest update, so it's possible that something broke.

[backlog refinement]
We didn't have spare cycles to look at this issue, but there will be a new update of the wiki soon, so this could be investigated during that.

BTW, it could be enough for community purposes to simply announce that some content was lost, that we would appreciate people's assistance in restoring content if they come across some, and that archive.org (the Wayback Machine) appears to have some snapshots. Basically crowdsource the fix.

Please note that the current status of the wiki software actively prevents one from replacing the lost documents from snapshots from archive.org. It's as though it's confused whether there is another in-flight update being committed, or something. But now the URLs seem to be unfixable dead ends.

So, let me explain some more about what is going on here, as I wasn't very clear above. ;(

When moving mediawiki to a new version a while back, the upgrade script wouldn't complete. I looked at it, and the problem seemed to be that the 'actor' (i.e., the person who made a commit) entries for pages imported from our old moinmoin wiki were not in a format the upgrade script wanted to handle. So I modified the db to work around this, thinking it would just reattribute those pages and all would be well. What I did let the upgrade complete, but now we have pages it won't show.

I am 99% sure the data is still there, it's just not displaying for some reason.

I have just not had the time to spend to look at the mediawiki schema and php code and mysql db and figure out why it's unhappy with those pages.
If someone else would like to do this, we can grant access in staging to try and figure it out.

OK, sorry it took so long.
I think the following query, executed against the mysql database backing the wiki, should resolve the situation. You were right: the content rows (en_text, en_revision, en_slot ...) are all there, and the problem is with the actors. Specifically, the en_revision_actor_temp rows that associate revisions with actors and pages, i.e., the change history of a page, were missing.

insert into en_revision_actor_temp select rev_id, 23373, rev_timestamp, rev_page from en_revision where not exists (select * from en_revision_actor_temp where rev_timestamp = revactor_timestamp and rev_page = revactor_page);

The 23373 literal is the actor_id of the en>ImportUser thing. It's just a placeholder. We may be able to pull in a more legitimate actor-id to associate with each revision, but who cares. :-)
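
For anyone who wants to see the shape of that query before touching a real database, here is a minimal sketch in Python against an in-memory SQLite database. The toy tables are a cut-down stand-in, not the real wiki schema; the column names revactor_rev and revactor_actor, the sample rows, and the helper names make_toy_db and backfill are assumptions for illustration only. It names the target columns explicitly and shows that running the backfill a second time inserts nothing.

```python
import sqlite3

IMPORT_ACTOR_ID = 23373  # placeholder actor id quoted in the comment above


def make_toy_db():
    """Build a tiny stand-in for the two tables involved (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE en_revision (
            rev_id INTEGER PRIMARY KEY,
            rev_page INTEGER,
            rev_timestamp TEXT);
        CREATE TABLE en_revision_actor_temp (
            revactor_rev INTEGER PRIMARY KEY,
            revactor_actor INTEGER,
            revactor_timestamp TEXT,
            revactor_page INTEGER);
        -- Revisions 1 and 2 have no actor row; revision 3 already has one.
        INSERT INTO en_revision VALUES
            (1, 10, '20090101000000'),
            (2, 10, '20090102000000'),
            (3, 11, '20210101000000');
        INSERT INTO en_revision_actor_temp VALUES
            (3, 7, '20210101000000', 11);
    """)
    return conn


def count_actor_rows(conn):
    return conn.execute(
        "SELECT COUNT(*) FROM en_revision_actor_temp").fetchone()[0]


def backfill(conn, actor_id=IMPORT_ACTOR_ID):
    """Insert an actor row for every revision lacking one; return rows added."""
    before = count_actor_rows(conn)
    conn.execute("""
        INSERT INTO en_revision_actor_temp
            (revactor_rev, revactor_actor, revactor_timestamp, revactor_page)
        SELECT rev_id, ?, rev_timestamp, rev_page
        FROM en_revision
        WHERE NOT EXISTS (
            SELECT 1 FROM en_revision_actor_temp
            WHERE revactor_timestamp = rev_timestamp
              AND revactor_page = rev_page)
    """, (actor_id,))
    conn.commit()
    return count_actor_rows(conn) - before


if __name__ == "__main__":
    conn = make_toy_db()
    print("first run added:", backfill(conn))
    print("second run added:", backfill(conn))
```

Because the NOT EXISTS check skips revisions that already have a matching row, rerunning the statement should be harmless, at least in this toy setup.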

This query fixed 13741ish pages on the staging wiki and ran in about 3 minutes. A way to generate the list of affected revisions is this query:

select * from en_revision where not exists (select * from en_revision_actor_temp where rev_timestamp = revactor_timestamp and rev_page = revactor_page);

and from the rev_page column one can hop to the en_page table to find out the affected page names. Sorry, I think I've already fixed the stg copy so I can't generate that list for reference now.
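
The hop from rev_page to page names could look roughly like the sketch below, again in Python over an in-memory SQLite toy database. The en_page columns used here (page_id, page_namespace, page_title), the sample rows, and the helper names are assumptions for illustration; check them against the real schema before running anything similar on the wiki database.

```python
import sqlite3

# Same anti-join as the select above, joined to the page table so the
# result is a list of page titles instead of raw revision rows.
AFFECTED_PAGES_SQL = """
SELECT DISTINCT p.page_namespace, p.page_title
FROM en_revision AS r
JOIN en_page AS p ON p.page_id = r.rev_page
WHERE NOT EXISTS (
    SELECT 1 FROM en_revision_actor_temp AS t
    WHERE t.revactor_timestamp = r.rev_timestamp
      AND t.revactor_page = r.rev_page)
ORDER BY p.page_namespace, p.page_title
"""


def affected_pages(conn):
    """Return (namespace, title) for pages with revisions lacking actor rows."""
    return conn.execute(AFFECTED_PAGES_SQL).fetchall()


def demo():
    """Run the query on made-up sample data (not real wiki contents)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE en_page (
            page_id INTEGER PRIMARY KEY,
            page_namespace INTEGER,
            page_title TEXT);
        CREATE TABLE en_revision (
            rev_id INTEGER PRIMARY KEY,
            rev_page INTEGER,
            rev_timestamp TEXT);
        CREATE TABLE en_revision_actor_temp (
            revactor_rev INTEGER PRIMARY KEY,
            revactor_actor INTEGER,
            revactor_timestamp TEXT,
            revactor_page INTEGER);
        INSERT INTO en_page VALUES
            (10, 0, 'Releases/FeatureBuildId'),
            (11, 0, 'Main_Page');
        INSERT INTO en_revision VALUES
            (1, 10, '20090101000000'),
            (2, 10, '20090102000000'),
            (3, 11, '20210101000000');
        -- Only page 11 has an actor row, so only page 10 is affected.
        INSERT INTO en_revision_actor_temp VALUES
            (3, 7, '20210101000000', 11);
    """)
    return affected_pages(conn)


if __name__ == "__main__":
    for namespace, title in demo():
        print(namespace, title)
```

DISTINCT keeps a page with many orphaned revisions from showing up more than once in the list.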

OK. Ran it on prod and it looks good. ;)

Many many thanks for tracking this down...

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

2 years ago

