Our current setup of rsyncing data from all the proxies to sundries01 every 10 to 15 minutes is overloading sundries01, causing it to break rsync connections or time out. This generates a large number of cron emails about broken rsyncs (several thousand per day).
To fix this we need to rescope how the rsyncs are staged. It is currently a galloping herd: 20+ proxies all hit sundries01 at once, each making 28 different rsyncs at the same time. Looking at the timestamps in /var/log/messages, we seem to get around 100-200 connections per second, each firing up its own rsyncd.
There are currently 28 servers and 28 products, so the xinetd limits need to be raised somewhat from the current 250-connection settings (cps = 250 5, instances = 250).
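For reference, a rough way to eyeball that per-second rate from /var/log/messages (just a sketch; the exact syslog line format and the grep pattern are assumptions about what xinetd/rsyncd actually logs on sundries01):

```
# Count rsync-related log lines per second in /var/log/messages.
# Assumes the usual "Mon DD HH:MM:SS host ..." syslog prefix; adjust the
# grep pattern to match the real xinetd/rsyncd lines on this host.
grep 'rsync' /var/log/messages | awk '{print $1, $2, $3}' | sort | uniq -c | sort -rn | head
```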
```
diff --git a/roles/rsyncd/files/rsync.sundries b/roles/rsyncd/files/rsync.sundries
index 5b81e790d..a873787a9 100644
--- a/roles/rsyncd/files/rsync.sundries
+++ b/roles/rsyncd/files/rsync.sundries
@@ -10,8 +10,8 @@ service rsync
         server          = /usr/bin/rsync
         server_args     = --daemon
         log_on_failure  += USERID
-        cps             = 250 5
-        instances       = 250
+        cps             = 900 5
+        instances       = 900
         per_source      = 100
 }
```
Looks like a sledgehammer
hulk smash! +1
Metadata Update from @mohanboddu: - Issue priority set to: Waiting on Assignee (was: Needs Review) - Issue tagged with: low-trouble, medium-gain, ops
+1 as well
This helped but did not seem to fully fix the problem, as my morning email bucket is still full. I am going to look at spacing out the rsyncs a bit since most of them complete in a minute.
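A minimal sketch of one way to space them out (hypothetical; the product names, module paths, and 15-minute window below are made up, not what ended up in the playbooks): give each per-product cron job a stable offset before it starts so the 28 rsyncs stop landing on sundries01 in the same second.

```
#!/bin/bash
# Hypothetical wrapper for a per-product rsync cron job.
# Derive a stable 0-14 minute offset from the product name so each product
# lands in a different slot of the 15 minute window instead of all at once.
PRODUCT="$1"                                   # e.g. "docs" -- example value only
OFFSET=$(( $(echo -n "$PRODUCT" | cksum | cut -d' ' -f1) % 15 ))
sleep $(( OFFSET * 60 ))
# Module and destination paths are placeholders, not the real ones.
rsync -a --delete --timeout=120 "rsync://sundries01/${PRODUCT}/" "/srv/web/${PRODUCT}/"
```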
Actually this seems to have cut the failures down to only the 0400 run, so I am going to consider this closed and fixed.
Metadata Update from @smooge: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)