#8691 Automation of doc stg internationalization scripts
Closed: Fixed 3 years ago by jibecfed. Opened 4 years ago by jibecfed.

Objective: give translators the latest docs in the translation files, and generate the translated sources from the latest translation files.

Call once a day (eventually, just before the publishing build) the script "build.py" from: https://pagure.io/fedora-docs/translations-scripts

I launch it from my computer using:

time ./build.py --clone_sources true --commit_l10n true --commit_tsources true

==> this pulls all documentation, converts it to POT, commits everything, then generates the new localized content and commits it

then, call the stats/compute.sh

==> this creates the localization progress statistics, as seen at https://docs.stg.fedoraproject.org/en-US/localization/

Both scripts require an SSH key to pull from and push to the Pagure repositories.

The technical user that commits to the repositories must be in this group for it to work:

https://pagure.io/group/fedora-docs-l10n

To reduce build time, you can make the folders that build.py creates (l10n/, sources/ and translated-sources/) persistent. When all repositories already exist on disk, a run takes between 15 and 25 minutes.
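The benefit of persistent folders comes from updating an existing checkout instead of cloning it again. The following is a minimal sketch of that decision, not build.py's actual logic; the function name `sync_command` and the example URL are hypothetical.

```python
from pathlib import Path


def sync_command(repo_url, dest):
    """Return the git command that brings `dest` up to date.

    Hypothetical helper: clone when the folder is missing,
    otherwise update the existing checkout in place.
    """
    dest = Path(dest)
    if (dest / ".git").is_dir():
        # The repository is already on disk: a pull is much cheaper.
        return ["git", "-C", str(dest), "pull", "--ff-only"]
    return ["git", "clone", repo_url, str(dest)]


if __name__ == "__main__":
    # Placeholder URL, for illustration only.
    print(sync_command("https://example.org/repo.git", "sources/repo"))
```

The returned list can be passed directly to `subprocess.run`.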

The automation requires the po4a tool, which is packaged for Fedora.

Ideally, I would receive an email whenever the error.txt file contains something. That lets me detect missing localization repositories.
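The "email only when error.txt is non-empty" behaviour could look roughly like the sketch below. It only builds the message; the sender address, the subject line and the function name are assumptions, and actual delivery (for example through a local SMTP relay) is left to the caller.

```python
from email.message import EmailMessage
from pathlib import Path


def build_error_report(error_file, recipient, sender="docs-l10n@example.org"):
    """Return an EmailMessage if error_file has content, else None.

    The sender address and subject are placeholders, not real settings.
    """
    path = Path(error_file)
    if not path.exists():
        return None
    content = path.read_text(encoding="utf-8", errors="replace").strip()
    if not content:
        # Nothing was reported: stay silent.
        return None
    msg = EmailMessage()
    msg["Subject"] = "docs translation build: errors reported"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(content)
    return msg


if __name__ == "__main__":
    report = build_error_report("error.txt", "translator@example.org")
    # A caller could send it with smtplib.SMTP("localhost").send_message(report)
    print("errors found" if report is not None else "nothing to send")
```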

Thanks a lot for your help!
CC @asamalik @misc


So, my main concern is the SSH keys. While we can surely avoid them for cloning, we can't for pushing. In turn, if we need to push to the repo, we need some kind of non-human account, and I am not sure how that's done for Fedora right now.

As for sending an email, if this is done outside of OpenShift, that shouldn't be too hard. Does the script work on CentOS 8 or 7, or does it specifically require Fedora?

Pagure has deploy keys available in the project's settings. They are tied to a person but are meant to give a bot/program access to a project.

well, I am not utterly fond of the "tied to a person", but I guess that's better than nothing. I will take a look at that.

Does the script work on CentOS 8 or 7, or does it specifically require Fedora?

I created it on Fedora and never tested it on CentOS.
The only issue we will have is the version of po4a itself, which could probably be worked around by using a local copy of po4a: https://github.com/mquinson/po4a/#use-without-installation

Well, I am not utterly fond of the "tied to a person", but I guess that's better than nothing. I will take a look at that.

The weblatebot user was created specifically for automation: https://pagure.io/user/weblatebot
I'm fine with providing its credentials to the infrastructure team if that helps.

Metadata Update from @cverna:
- Issue priority set to: Waiting on Assignee (was: Needs Review)

4 years ago

@misc should we assign this to you, or does this need to be picked up by someone?

Additional question: How time critical is this? I see it's mentioned on the release readiness blocker page... do we have to have this before the beta release?

@cverna I can take it, but I do not see how to assign it to me or anything

Metadata Update from @cverna:
- Issue assigned to misc

4 years ago

@misc it is now yours, please reach out if you need anything :-)

Additional question: How time critical is this? I see it's mentioned on the release readiness blocker page... do we have to have this before the beta release?

I proposed it to Ben as a final release blocker, not a beta release blocker.

OK, I will do a test on RHEL 7, since that's what's on sundries. This seems easier than setting up a job in OpenShift for now.

To be clear, this is not a blocker because it does not fall under the criteria for blocking a release, but it would be very good to have this ready for F32 Final.

I'm fine with that.

Using Weblate, plus the great work done on the internationalization of our documentation, makes this a reachable objective for the F32 release, but not a real blocker.
Putting the subject on the blocker page helps to share the news and coordinate the work. After all, we have a system-wide change regarding localization, and it's been a while since that last happened, so let's make good use of it :)

So after a morning of yak shaving (/home full, system not registered on RHN, so lots of fun), it turns out po4a on RHEL 7 does not support asciidoc:

*************************
* convert .adoc to .pot
*************************
Type de format inconnu : asciidoc.

So we would need to run that on some Fedora host; ergo, I am going to do that on OpenShift as a test, or use mock.

OK, so with a local po4a checkout (patch incoming) and a few missing packages (with a misleading error message), I do seem to be able to run it on RHEL 7. I am going to write an Ansible role for sundries for that.

Now it fails, because the script requires Python 3.7 or later and this host runs 3.6:

* commit_l10n: docs
Traceback (most recent call last):
  File "./build.py", line 259, in <module>
    main()
  File "./build.py", line 49, in main
    commit_l10n_repos()
  File "./build.py", line 134, in commit_l10n_repos
    [commit_l10n(r) for r in next(os.walk(repo_dir))[1]]
  File "./build.py", line 134, in <listcomp>
    [commit_l10n(r) for r in next(os.walk(repo_dir))[1]]
  File "./build.py", line 147, in commit_l10n
    out = subprocess.run(['git', 'status', '--porcelain'], check=True, cwd=repo_dir, capture_output=True)
  File "/usr/lib64/python3.6/subprocess.py", line 423, in run
    with Popen(*popenargs, **kwargs) as process:
TypeError: __init__() got an unexpected keyword argument 'capture_output'

I guess I have to patch that too.
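The `capture_output` argument of `subprocess.run` was added in Python 3.7, which explains the TypeError on 3.6. A sketch of an equivalent call that also works on 3.6 follows; it is not necessarily the patch that was merged, and the helper name `run_captured` is hypothetical.

```python
import subprocess


def run_captured(cmd, cwd=None):
    """Run cmd and capture stdout/stderr without using capture_output.

    Passing both PIPE arguments is what capture_output=True does
    internally, so this behaves the same on Python 3.6 and later.
    """
    return subprocess.run(
        cmd,
        check=True,
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )


if __name__ == "__main__":
    # Same git call as in the traceback, run in the current directory.
    out = run_captured(["git", "status", "--porcelain"])
    print(out.stdout.decode())
```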

@misc : to simplify this work, I gave you commit rights on the repository.

So, I have a commit, but I haven't pushed or run it yet: https://pagure.io/fork/misc/fedora-infrastructure/c/4736971f46c5c0097a9a50733feda51ca40d8ada?branch=add_trans

I am waiting for feedback on the last commit I sent: is that the right level of options you wanted or not?

Thanks a lot misc, I merged https://pagure.io/fedora-docs/translations-scripts/pull-request/11

Could you please use this cron schedule? 0 2 * * *

The shortest delay between new English content and its translations is two runs of this script.
The first run updates the POT files and syncs Weblate. The next run takes the translators' work and generates the localized content.
Running the script once a day means that, in the best scenario, it takes two days until we get translations.
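The two-run delay can be made concrete with a small sketch. It assumes the `0 2 * * *` schedule and that translators finish their work between the two runs; the function names are illustrative only.

```python
from datetime import datetime, timedelta


def next_run(after, hour=2):
    """Return the first daily run (at `hour`:00) strictly after `after`."""
    candidate = after.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= after:
        candidate += timedelta(days=1)
    return candidate


def translation_timeline(change_time):
    """Return (POT update run, earliest run that can publish a translation)."""
    pot_run = next_run(change_time)   # updates POT files, syncs Weblate
    publish_run = next_run(pot_run)   # generates the localized content
    return pot_run, publish_run


if __name__ == "__main__":
    pot_run, publish_run = translation_timeline(datetime(2020, 3, 1, 10, 0))
    print("POT updated:", pot_run)
    print("earliest localized content:", publish_run)
```

For an English change landing on 2020-03-01 at 10:00, the POT files are refreshed on 2020-03-02 at 02:00 and localized content can first appear on 2020-03-03 at 02:00.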

I'm unsure what I will get by email with the cron task you created. Is it the full output of the script?

Because of po4a (and because we have thousands of pages), the output contains kilometers of log.
It's not a huge issue for me, but we may reach the limits of the email system ;)

I can change the cron schedule, yes.

However, unless the script is changed, I can either send nothing or send everything :/

Sending everything is fine. I'll learn how to catch po4a's output and send it to /dev/null, and I'll clean up the error output.
Receiving huge emails will just motivate me to get it done faster ;)
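One way to drop a noisy tool's regular output while keeping its error stream is sketched below. It assumes the noise goes to stdout; if po4a writes its progress to stderr instead, that stream would need filtering rather than plain capture. The helper name `run_quiet` is hypothetical.

```python
import subprocess


def run_quiet(cmd, cwd=None):
    """Run cmd, discard stdout, and return (exit code, stderr text)."""
    result = subprocess.run(
        cmd,
        cwd=cwd,
        stdout=subprocess.DEVNULL,   # equivalent of "> /dev/null"
        stderr=subprocess.PIPE,      # keep errors for the report
        universal_newlines=True,     # text mode, also valid on Python 3.6
    )
    return result.returncode, result.stderr


if __name__ == "__main__":
    import sys

    demo = "import sys; print('noise'); sys.stderr.write('real problem')"
    code, errors = run_quiet([sys.executable, "-c", demo])
    print(code, repr(errors))
```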

So I pushed the ansible playbook and deployed it. However, I didn't anticipate that I couldn't run ansible with -vvvv due to rbac-playbook, so the last step missing is getting someone with enough access to look at the generated ssh key to add the pub key.

On sundries, there is a file /home/_update_docs_trans/.ssh/id_rsa_docs_trans.pub, and we need its content to be placed on Pagure so it can be added as a deploy key.

Subject: Cron <_update_docs_trans@sundries01> /usr/local/bin/lock-wrapper cron-docs-translation-update "/usr/local/bin/docs-translation-update"
From: (Cron Daemon)
To: jibecfed@fedoraproject.org
Date: Today 04:00
Cloning into 'translations-scripts'...
Traceback (most recent call last):
  File "./build.py", line 11, in <module>
    import yaml
ModuleNotFoundError: No module named 'yaml'
Runnning po Count
./stats/compute.sh: line 7: stats/results.csv: No such file or directory
Cloning
Cloning into 'localization'...
Host key verification failed.
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
Installing python dep in pip
./stats/compute.sh: line 14: virtualenv: command not found
./stats/compute.sh: line 15: venv/bin/activate: No such file or directory
./stats/compute.sh: line 16: pip: command not found
Generating stats
Traceback (most recent call last):
  File "./build.py", line 11, in <module>
    import yaml
ModuleNotFoundError: No module named 'yaml'
Commit changes
./stats/compute.sh: line 23: pushd: localization: No such file or directory
rm -Rf /tmp/fedora_docs_trans_s3QZ

So, two minor issues blocked the correct execution:

  1. python yaml is required and should be added
  2. you need to be in the stats folder to call compute.sh

That reminds me: the second script also needs pocount, from translate-toolkit.
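Failures like the missing yaml module, virtualenv and pocount could be reported up front by a preflight check. The sketch below takes its default names from this thread; the function name is hypothetical, and the exact executables the scripts call are an assumption.

```python
import importlib.util
import shutil


def missing_dependencies(modules=("yaml",),
                         commands=("git", "po4a", "pocount", "virtualenv")):
    """Return a list describing Python modules and commands not found."""
    missing = [
        "python module: " + name
        for name in modules
        if importlib.util.find_spec(name) is None
    ]
    missing += [
        "command: " + name
        for name in commands
        if shutil.which(name) is None
    ]
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    if problems:
        raise SystemExit("missing dependencies:\n" + "\n".join(problems))
    print("all dependencies found")
```

Running this before build.py and stats/compute.sh would turn the scattered errors in the cron email into a single clear message.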

@misc I sent you the public key on your @fp.o email.

OK, so I did add the missing deps, and I am going to do a test run. Then we need to accept the SSH host keys (I am going to do that in the ssh config) and wait for the next run.

So, one issue is that we need to add the SSH public key to every repo, all 40 of them.
@jibecfed I can help, but I think that giving me access to 40 repos is about as annoying as adding the keys yourself. And since I do not have access, I can't automate much.

Thanks to misc's work and help, we are close to done on this request.

The key feature works as expected (AsciiDoc source conversion to POT, and localized content generated from PO files). It runs daily, and I'm tuning the script to handle exceptions, reduce log output, etc.

The side feature (generating translation progress statistics) is broken; we need to make sure translate-toolkit is installed by Ansible. I think a freeze break request is needed if we want to solve that now.

translate-toolkit-1.11.0-2.el7.noarch does seem to be installed?

Oh, can you all also get your script/crons to clean up tmp files? I removed a bunch of old ones, but it would be nice to do automatically:

cd /tmp
du -sh fedora_docs_trans*
2.0G fedora_docs_trans_2CGp
2.0G fedora_docs_trans_2T8j
2.1G fedora_docs_trans_fAyc
496K fedora_docs_trans_L6P7
2.1G fedora_docs_trans_OLUE
2.1G fedora_docs_trans_v2Cx

That's a fair bit of space.

It does remove them, afaik, but I was waiting for the freeze to end before debugging it, so as not to distract from the release with FBRs and such.
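A periodic cleanup of leftover work directories could be sketched as follows. The `fedora_docs_trans_` prefix comes from the listing above; the 24-hour threshold and the function name are assumptions, and this is not the wrapper script's actual cleanup code.

```python
import shutil
import time
from pathlib import Path


def remove_stale_workdirs(base="/tmp", prefix="fedora_docs_trans_",
                          max_age_hours=24, now=None):
    """Delete work directories under `base` older than max_age_hours.

    Returns the names of the directories that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    for path in sorted(Path(base).glob(prefix + "*")):
        age = now - path.stat().st_mtime
        if path.is_dir() and age > max_age_hours * 3600:
            # Old enough that no daily run should still be using it.
            shutil.rmtree(str(path), ignore_errors=True)
            removed.append(path.name)
    return removed


if __name__ == "__main__":
    print(remove_stale_workdirs())
```

Keeping recent directories untouched avoids deleting the work area of a run that is still in progress.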

Freeze is now over. :)

So what's left to do on this request? Just make it live in prod?

@misc are you following up on this ticket? Anything we can help with?

@misc anything left to do here ? or can we close that ticket ?

I confirm it was fixed by Michaël, thank you all

Metadata Update from @jibecfed:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

3 years ago
