#2331 doc: document raid on copr-be
Merged 2 years ago by praiskup. Opened 2 years ago by praiskup.
Branch raid-on-backend into main

@@ -21,6 +21,7 @@

  

     aws_tips_and_tricks

     enlarge_disk_copr_be

+    raid_on_backend

  

  

  .. toctree::

@@ -0,0 +1,61 @@

+ .. _raid_on_backend:

+ 

+ How to manage RAID 10 on copr-backend

+ =====================================

+ 

+ There are currently four AWS EBS sc1 volumes (4x12T, 144MB/s per volume)
+ forming a RAID 10 array.  On top of it sits an LVM volume group named
+ ``copr-backend-data`` (24T usable, and we can add more space in the future).

+ 
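+ A quick way to inspect the whole stack (the array and volume group names are
+ those used throughout this document)::
+ 
+     lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT
+     vgs copr-backend-data
+ 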

+ Everything is configured so that the machine starts and mounts everything
+ correctly on its own.  We just need to keep an eye on ``/proc/mdstat`` from
+ time to time.

+ 
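+ For a more detailed view of the array state than ``/proc/mdstat`` provides::
+ 
+     mdadm --detail /dev/md127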

+ 

+ Manually starting/stopping RAID checks
+ ----------------------------------------

+ 

+ Commands needed::

+ 

+     # start a consistency check
+     echo check > /sys/block/md127/md/sync_action
+ 
+     # interrupt a running check
+     echo idle > /sys/block/md127/md/sync_action

+ 
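+ The progress of a running check is visible in ``/proc/mdstat``, and the
+ number of inconsistencies found so far is in ``mismatch_cnt``::
+ 
+     cat /proc/mdstat
+     cat /sys/block/md127/md/mismatch_cnt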

+ 

+ Detaching volume

+ ----------------

+ 

+ It's not safe to just force-detach the volumes in AWS EC2; it could cause
+ data corruption.  Since there are several layers (volumes -> RAID -> LVM ->
+ ext4), we need to tear them down in the reverse order when detaching (the
+ whole sequence is also sketched below the list):

+ 

+ 1. unmount: ``umount /var/lib/copr/public_html``

+ 2. disable volume group: ``vgchange -a n copr-backend-data``

+ 3. stop the raid: ``mdadm --stop /dev/md127``

+ 4. now you can safely detach the volumes in AWS EC2

+ 
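+ The whole sequence as a shell sketch, using the device and path names from
+ the steps above::
+ 
+     umount /var/lib/copr/public_html
+     vgchange -a n copr-backend-data
+     mdadm --stop /dev/md127
+     cat /proc/mdstat  # md127 should no longer be listed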

+ 

+ Attaching volume

+ ----------------

+ 

+ 1. attach the volumes in AWS EC2

+ 2. start the RAID array and the volume group: ``mdadm --assemble --scan``

+ 3. mount the ``/dev/disk/by-label/copr-data`` volume

+ 
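+ As a shell sketch; the mount point comes from the detach section above, and
+ the explicit ``vgchange`` is likely redundant (the volume group normally
+ auto-activates together with the array), but harmless::
+ 
+     mdadm --assemble --scan
+     vgchange -a y copr-backend-data
+     mount /dev/disk/by-label/copr-data /var/lib/copr/public_html
+ 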

+ There's an `ansible configuration`_ for this, and a `list of volumes`_.

+ 

+ 

+ Other tips

+ ----------

+ 

+ Note the **sysctl** ``dev.raid.speed_limit_max`` (in KB/s); it may limit the
+ initial sync speed, the periodic RAID checks, and potentially a RAID
+ rebuild.

+ 
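+ For example, to inspect the current limit and temporarily raise it (the
+ value below is just an illustration)::
+ 
+     sysctl dev.raid.speed_limit_max
+     sysctl -w dev.raid.speed_limit_max=500000
+ 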

+ While trying to do a fast rsync, we experimented with a very large instance
+ type (c5d.18xlarge, 144GB RAM) and with ``vm.vfs_cache_pressure=2`` to keep
+ as many inodes and dentries in the kernel caches as possible (per
+ ``slabtop`` we eventually had 60M inodes cached); 28M inodes and 15T of data
+ were synced in 6.5 hours.  We also decreased ``dirty_ratio`` and
+ ``dirty_background_ratio`` to get more frequent write-back, considering the
+ large RAM.

+ 
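+ A sketch of that tuning; ``vm.vfs_cache_pressure=2`` is the value from the
+ experiment above, while the two dirty-ratio values are only illustrative::
+ 
+     sysctl -w vm.vfs_cache_pressure=2
+     sysctl -w vm.dirty_ratio=5
+     sysctl -w vm.dirty_background_ratio=2
+ 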

+ .. _`ansible configuration`: https://pagure.io/fedora-infra/ansible/blob/main/f/roles/copr/backend/tasks/mount_fs.yml

+ .. _`list of volumes`: https://pagure.io/fedora-infra/ansible/blob/main/f/inventory/group_vars/copr_all_instances_aws
