Hello,
we pay for Weblate's hosting. Automated backups are included in this service, and we can download them over SSH. In case of a disagreement with the Weblate company, or a major failure, do we want to have a backup owned by the Fedora infra team?
Here is the documentation: https://docs.weblate.org/en/weblate-3.11.2/admin/backup.html Do we want to automate this?
There is no rush here, we can do this before summer.
Needs figuring out by someone who knows the system, but it's something we should do.
Metadata Update from @smooge: - Issue priority set to: Waiting on Assignee (was: Needs Review)
I have full access to the backups, and I know the Weblate hosting. There is a passphrase to decrypt the backup and a private SSH key available. What do you need?
So, just skimming the docs... we should probably store a copy of the passphrase and the private/public SSH key for the backups in our private repo.
If we want to store a copy of the data, it sounds like we would have to set up a borg repository and ask it to sync the backup there?
Is there some way to just download the existing weblate backups from the cloud? Or manually download one and save it somewhere we control?
Thanks for any info.
FYI: Kevin was made super-admin of Weblate to see the available info
Metadata Update from @cverna: - Issue tagged with: high-gain, medium-trouble
Considering Kevin's busy schedule for the coming weeks/months, I can try to help.
@jibecfed should we try to catch up and discuss what we can do and how?
Thank you for following this. If you want, you can find me on fedora-i18n on IRC. Please log in on https://translate.fedoraproject.org so I can make you an admin, to allow you to see the borg information.
There isn't much to do:
- keep one backup per year for long-term analysis (we only need the database for this)
- keep the last two months of backups for operational safety (we need all files, including the huge git repositories)
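As a rough illustration, the yearly half of that policy can be expressed as a small filter over archive names. This is only a sketch: it assumes archives are named `YYYY-MM-DDTHH:MM:SS` (as in this repo), so names sort chronologically as plain strings; the "last two months" half would be a similar date filter.

```shell
#!/bin/sh
# Sketch: given archive names, one "YYYY-MM-DDTHH:MM:SS" per line,
# print the newest archive of each year.
keep_yearly() {
    # After sorting, the last entry seen for each year prefix is that
    # year's newest archive; the final sort fixes awk's iteration order.
    sort | awk -F- '{ latest[$1] = $0 } END { for (y in latest) print latest[y] }' | sort
}

# Example (hypothetical archive names):
printf '2019-01-01T04:00:00\n2019-12-01T04:00:00\n2020-05-28T04:00:00\n' | keep_yearly
# prints:
#   2019-12-01T04:00:00
#   2020-05-28T04:00:00
```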
@jibecfed I've logged in, I'll try to catch you sometime this week to discuss what we can do :)
Metadata Update from @pingou: - Issue assigned to pingou
Metadata Update from @cverna: - Assignee reset
I gave Pingou admin rights to access security info (passphrase, private key)
------------------------------------------------------------------------------
Archive name: 2020-05-28T04:00:00
Archive fingerprint: b25253817f5aef083af74d49a4363f02f8f236836f0aaf751c2972c1ba976249
Time (start): Thu, 2020-05-28 04:00:10
Time (end):   Thu, 2020-05-28 04:01:39
Duration: 1 minutes 29.48 seconds
Number of files: 220586
Utilization of max. archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:                4.59 GB              2.17 GB            215.13 MB
All archives:               66.36 GB             32.19 GB              5.96 GB

                       Unique chunks         Total chunks
Chunk index:                  638878              3131182
------------------------------------------------------------------------------
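Presumably, output like the above comes from running borg info against a specific archive, with the same repo URL and key as the extract command below (a sketch, not verified here):

```shell
BORG_RSH="ssh -i ~/.ssh/id_rsa_translate_fpo" \
  borg info ssh://u164666-sub8@u164666-sub8.your-storagebox.de:23/./backups::2020-05-28T04:00:00
```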
Here is how I downloaded the latest archive:

BORG_RSH="ssh -i ~/.ssh/id_rsa_translate_fpo" borg extract ssh://u164666-sub8@u164666-sub8.your-storagebox.de:23/./backups::2020-05-28T04:00:00
It creates a home folder with the following size:
[jb@localhost hop]$ du --max-depth=1 -h
5,1G    ./home
5,1G    .
Here is the detail:
[jb@localhost hop]$ du --max-depth=1 -h home/weblate/data/
4,0K    home/weblate/data/cache
2,7G    home/weblate/data/vcs
849M    home/weblate/data/whoosh
332M    home/weblate/data/memory
120K    home/weblate/data/celery
1,3G    home/weblate/data/backups
12K     home/weblate/data/fonts
28K     home/weblate/data/ssh
37M     home/weblate/data/media
32K     home/weblate/data/home
5,1G    home/weblate/data/
I don't think we can cherry-pick what we back up: the biggest folders are vcs and backups, which contain the git repositories and the database. That is the core of the application, and it already represents 4 GB.
You may expect all of this to grow a lot over time (mostly because of the localization of Fedora docs).
@pingou did you do anything here?
Do we want to try and automate this? or just manually grab it once per month? It can be placed on backup01 in /fedora_backups/
@kevin I have the access and some level of understanding of how it works, but I need to sync with you on where to run this and where to place the files.
Metadata Update from @smooge: - Issue tagged with: dev, ops
@smooge you've added the dev tag to this one, what do you think we have to develop for this ticket?
Well, we need a script/cronjob/something if we want to do this in an automated way.
@pingou Do you want to continue on this ticket?
I was looking at the borg docs to understand how things work and to clear up some concerns we had during the last meeting, and found a few things:
The borg repository can be fetched and stored on our side without needing the borg utility or the passphrase. This is not recommended, but it can be done as long as you don't intend to do anything other than extract backups from this copy. The main advantage is that you get the entire backup history, for pretty much the same size as a single extracted backup thanks to compression/deduplication, without having to expose the passphrase or install additional packages somewhere just for this task. I'm not saying it's a better solution, but it's an option.
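A sketch of that option, assuming the storage box accepts rsync over SSH (the host, port, and key match the extract command earlier in this ticket; the destination path is hypothetical):

```shell
#!/bin/sh
# Sketch: mirror the raw borg repository to a host we control, with no
# borg utility or passphrase involved. Assumes rsync-over-SSH access to
# the storage box; the destination directory is hypothetical.
rsync -az --delete \
  -e "ssh -i ~/.ssh/id_rsa_translate_fpo -p 23" \
  u164666-sub8@u164666-sub8.your-storagebox.de:backups/ \
  /fedora_backups/weblate-borg-repo/
```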
borgbackup is available in EPEL, among others, and the extract command can use some environment variables like BORG_PASSPHRASE or, even better, BORG_PASSPHRASE_FD, which can be used to read the passphrase from a file descriptor (fed from a file, like a kube secret...). That would definitely help for a cronjob.
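For example, a cronjob wrapper could feed the passphrase to borg through a file descriptor, so it never appears in the environment or on the command line. A sketch only; the passphrase file location is hypothetical (it could be shipped from ansible-private, or mounted as a kube secret):

```shell
#!/bin/sh
# Sketch of a cronjob-friendly extract using BORG_PASSPHRASE_FD.
# PASSFILE is a hypothetical path to the stored passphrase.
PASSFILE=/etc/weblate-backup/passphrase
REPO="ssh://u164666-sub8@u164666-sub8.your-storagebox.de:23/./backups"
ARCHIVE=$1   # e.g. 2020-05-28T04:00:00

# Open the passphrase file on fd 3; borg reads the passphrase from it.
exec 3< "$PASSFILE"
BORG_PASSPHRASE_FD=3 \
BORG_RSH="ssh -i ~/.ssh/id_rsa_translate_fpo" \
  borg extract "$REPO::$ARCHIVE"
exec 3<&-
```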
I don't think I'll manage to do this in a timely manner, so if someone is interested in taking it over, please go ahead.
Sorry for the inconvenience :(
Answering myself: backup extraction seems to be the best choice if we want to keep bandwidth usage to a minimum. The decryption / decompression / "undeduplication" is done on the client side, so I think we should only pull around 200 MB of data each time.
Question is: how often do we want to download the latest backup, and how many archives do we want to keep?
How about weekly and keep 8 weeks?
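If we go that way, the "keep 8" half is simple to script locally. A sketch, assuming each weekly download is extracted into its own timestamped directory whose names sort chronologically:

```shell
#!/bin/sh
# Sketch: delete all but the newest N local copies. Assumes each weekly
# download lands in its own timestamped directory under $1, with names
# that sort chronologically (e.g. 2020-05-28T04:00:00/).
prune_old() {
    dest=$1 keep=$2
    # head -n -N (print all but the last N lines) is a GNU coreutils
    # extension, so this needs GNU head.
    ls -1 "$dest" | sort | head -n -"$keep" | while IFS= read -r old; do
        rm -rf "$dest/${old:?}"
    done
}
```

Usage would be something like `prune_old /fedora_backups/weblate 8` after each weekly download.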
It's still running, but it's correctly copying the backups. ;)
See commits 577a3856456ca6daf5aecac985f440d8897f8f04 68105a47bb70dfaccaa465589ad018a2afc90198 86f43030f81eb4da29d9994a4db9ff50e1609701 and e0941ac2ded428b549748fd70e159f9f1a83e8eb
I put the needed info into the ansible-private repo, merged and ran the backup playbook to copy it live. Then, I manually ran the backup script with 'sudo -u _backup_weblate /usr/local/bin/weblate-backup'
Metadata Update from @kevin: - Issue close_status updated to: Fixed with Explanation - Issue status updated to: Closed (was: Open)