#8729 Do we want to have a copy of translate.fp.o backups?
Closed: Fixed with Explanation 2 years ago by kevin. Opened 4 years ago by jibecfed.

Hello,

we pay for Weblate's hosting. Automated backups are included in this service, and we can download them using SSH.
In case of a disagreement with the Weblate company or a major failure, do we want to have a backup owned by the Fedora infra team?

Here is the documentation: https://docs.weblate.org/en/weblate-3.11.2/admin/backup.html
Do we want to automate this?

There is no rush here, we can do this before summer.


Needs figuring out by someone who knows the system, but something we should do

Metadata Update from @smooge:
- Issue priority set to: Waiting on Assignee (was: Needs Review)

4 years ago

I have full access to backups, and know Weblate hosting.
There is a passphrase to decrypt the backup and a private ssh key available.
What do you need?
What do you need?

So, just skimming the docs... we should probably store a copy of the passphrase and the private/public ssh key for the backups in our private repo.

If we want to store a copy of the data, it sounds like we would have to set up a borg repository and ask it to sync the backup there?

Is there some way to just download the existing weblate backups from the cloud? Or manually download one and save it somewhere we control?

Thanks for any info.

FYI: Kevin was made super-admin of Weblate to see the available info

Metadata Update from @cverna:
- Issue tagged with: high-gain, medium-trouble

4 years ago

Considering Kevin's busy schedule for the coming weeks/months, I can try to help.

@jibecfed should we try to catch up and discuss what we can do and how?

Thank you for following this; if you want, you can find me in fedora-i18n on IRC.
Please log in on https://translate.fedoraproject.org so I can make you an admin, which will let you see the borg information.

There isn't much to do:
- keep one backup per year for long-term analysis (we only need the database for this)
- keep the last two months of backups for operational safety (we need all files, including huge git repositories)
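That retention policy can be expressed as a small selection routine. A hypothetical sketch (the function name is mine; archive names are assumed to follow the ISO timestamp format shown later in this ticket, e.g. 2020-05-28T04:00:00, and "two months" is approximated as 61 days):

```python
from datetime import datetime, timedelta

def select_archives(names, now):
    """Decide which archives to keep under the policy above:
    the first archive of each year (long-term analysis) plus
    everything from roughly the last two months (operational safety)."""
    dated = sorted((datetime.fromisoformat(n), n) for n in names)
    keep = set()
    seen_years = set()
    for dt, name in dated:
        if dt.year not in seen_years:       # first backup seen that year
            seen_years.add(dt.year)
            keep.add(name)
        if dt >= now - timedelta(days=61):  # "last two months", approximated
            keep.add(name)
    return keep
```

Anything not returned by such a routine would be a candidate for deletion on our side.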

@jibecfed I've logged in, I'll try to catch you sometime this week to discuss what we can do :)

Metadata Update from @pingou:
- Issue assigned to pingou

3 years ago

Metadata Update from @cverna:
- Assignee reset

3 years ago

I gave Pingou admin rights to access security info (passphrase, private key)

------------------------------------------------------------------------------
Archive name: 2020-05-28T04:00:00
Archive fingerprint: b25253817f5aef083af74d49a4363f02f8f236836f0aaf751c2972c1ba976249
Time (start): Thu, 2020-05-28 04:00:10
Time (end):   Thu, 2020-05-28 04:01:39
Duration: 1 minutes 29.48 seconds
Number of files: 220586
Utilization of max. archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:                4.59 GB              2.17 GB            215.13 MB
All archives:               66.36 GB             32.19 GB              5.96 GB

                       Unique chunks         Total chunks
Chunk index:                  638878              3131182
------------------------------------------------------------------------------

Here is how I downloaded the latest archive:

BORG_RSH="ssh -i ~/.ssh/id_rsa_translate_fpo" borg extract ssh://u164666-sub8@u164666-sub8.your-storagebox.de:23/./backups::2020-05-28T04:00:00

It creates a home folder with the following size:

[jb@localhost hop]$ du --max-depth=1 -h
5,1G    ./home
5,1G    .

Here is the detail:

[jb@localhost hop]$ du --max-depth=1 -h home/weblate/data/
4,0K    home/weblate/data/cache
2,7G    home/weblate/data/vcs
849M    home/weblate/data/whoosh
332M    home/weblate/data/memory
120K    home/weblate/data/celery
1,3G    home/weblate/data/backups
12K home/weblate/data/fonts
28K home/weblate/data/ssh
37M home/weblate/data/media
32K home/weblate/data/home
5,1G    home/weblate/data/

I don't think we can cherry-pick what to back up: the biggest folders are vcs and backups, which contain the git repositories and the database. That is the core of the application and already represents 4 GB.

You can expect all of this to grow a lot over time (mostly because of the localization of the Fedora docs).

@pingou did you do anything here?

Do we want to try to automate this, or just manually grab it once per month? It could be placed on backup01 in /fedora_backups/.
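If we do automate it, a cron script would essentially replay the manual command above. A minimal sketch of how such a script might assemble the invocation (the helper name and the /fedora_backups/weblate destination are my assumptions; the repo URL and ssh key path come from the earlier comment):

```python
import os

# Repo URL taken from the manual extraction shown earlier in this ticket.
REPO = "ssh://u164666-sub8@u164666-sub8.your-storagebox.de:23/./backups"

def borg_extract_invocation(archive, ssh_key="~/.ssh/id_rsa_translate_fpo"):
    """Build the environment and argv for one borg extract run,
    mirroring the manual command, so a cron job could execute it."""
    env = dict(os.environ)
    env["BORG_RSH"] = f"ssh -i {ssh_key}"  # same key as the manual run
    argv = ["borg", "extract", f"{REPO}::{archive}"]
    return env, argv

env, argv = borg_extract_invocation("2020-05-28T04:00:00")
# A cron wrapper would then call something like:
# subprocess.run(argv, env=env, cwd="/fedora_backups/weblate", check=True)
```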

@kevin I have the access and some level of understanding of how it works, but I need to sync with you on where to run it and where to place the files.

Metadata Update from @smooge:
- Issue tagged with: dev, ops

3 years ago

@smooge you've added the dev tag to this one, what do you think we have to develop for this ticket?

Well, we need a script/cronjob/something if we want to do this in an automated way.

@pingou Do you want to continue on this ticket?

I was looking at the borg docs to understand how things work and clear up some concerns we had during the last meeting, and found a few things:

The borg repository can be fetched and stored on our side without the borg utility or the passphrase. This is not recommended, but it can be done as long as you don't intend to do anything other than extract backups from this copy.
The main advantage is that you get the entire backup history, for pretty much the same size as a single extracted backup thanks to compression/deduplication, without needing to expose the passphrase or install additional packages somewhere only for this task.
I'm not saying it's a better solution, but it's an option.

borgbackup is available in EPEL, among other places, and the extract command can use environment variables like BORG_PASSPHRASE, or even better BORG_PASSPHRASE_FD, which reads the passphrase from a file descriptor (which can point at a file such as a kube secret...).
That would definitely help for a cronjob.
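The file-descriptor mechanism can be exercised with any child process, not just borg. A sketch of the idea, using a small Python child standing in for borg (the helper name is mine; the point is that the secret never appears in argv or on the command line):

```python
import os
import subprocess
import sys

def run_with_passphrase_fd(argv, passphrase):
    """Hand a secret to a child process through an inherited file
    descriptor, the same mechanism BORG_PASSPHRASE_FD relies on."""
    r, w = os.pipe()
    os.write(w, passphrase.encode())
    os.close(w)                       # child sees EOF after the secret
    env = dict(os.environ, BORG_PASSPHRASE_FD=str(r))
    proc = subprocess.run(argv, env=env, pass_fds=(r,),
                          capture_output=True, text=True)
    os.close(r)
    return proc.stdout

# Demonstrate with a Python child in place of borg: it reads the fd
# named by BORG_PASSPHRASE_FD and prints what it received.
child = [sys.executable, "-c",
         "import os; fd = int(os.environ['BORG_PASSPHRASE_FD']); "
         "print(os.read(fd, 1024).decode())"]
out = run_with_passphrase_fd(child, "s3cret")
```

In a real cronjob the pipe would be fed from a root-only file or a mounted secret rather than a hard-coded string.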

@pingou Do you want to continue on this ticket?

I don't think I'll manage this in a timely manner, so if someone is interested in taking it over, please go ahead.

Sorry for the inconvenience :(

Answering myself:
Backup extraction seems to be the best choice if we want to keep bandwidth usage to a minimum. The decryption / decompression / "undeduplication" is done on the client side, so I think we should only pull around 200 MB of data each time.

The question is: how often do we want to download the latest backup, and how many archives do we want to keep?

How about weekly and keep 8 weeks?
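With a weekly download and 8-week retention, local cleanup is trivial, since the ISO-timestamp archive names sort chronologically as plain strings. A hypothetical sketch (the function name is mine):

```python
def prune_local_copies(archive_names, keep=8):
    """Return (kept, to_delete), keeping only the newest `keep` local
    copies. ISO-8601 timestamp names sort chronologically as strings."""
    ordered = sorted(archive_names, reverse=True)  # newest first
    return ordered[:keep], ordered[keep:]
```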

It's still running, but it's correctly copying the backups. ;)

See commits 577a3856456ca6daf5aecac985f440d8897f8f04 68105a47bb70dfaccaa465589ad018a2afc90198 86f43030f81eb4da29d9994a4db9ff50e1609701 and e0941ac2ded428b549748fd70e159f9f1a83e8eb

I put the needed info into the ansible-private repo, merged and ran the backup playbook to copy it live. Then, I manually ran the backup script with 'sudo -u _backup_weblate /usr/local/bin/weblate-backup'

Metadata Update from @kevin:
- Issue close_status updated to: Fixed with Explanation
- Issue status updated to: Closed (was: Open)

2 years ago


Metadata
Boards 2
ops Status: Backlog
dev Status: Triaged
Related Pull Requests
  • #773 Merged 2 years ago