#12078 New site with rsync-only targets cannot be added to the mirror network
Closed: Fixed 8 months ago by cern. Opened 8 months ago by cern.

Hello,

We added a new host (3117) to our site (2277) for the purposes of separating rsync traffic.
As such, the URLs for the new site contain only rsync targets.

So far we’ve not been able to have this host added to the mirror network. The log simply states "INFO - All categories failed".

The new site has the same host configuration as the previous site (2648), and the same rsync targets. We don't see any hits from 38.145.60.3 in our logs at all, even though rsync (tcp/873) is open to the internet on these hosts. Any idea of what's wrong?


CC: @adrian @abompard

I am not fully sure this is a supported setup... we use rsync to crawl hosts, but I am not sure why it wouldn't work to have a host with only rsync...

Metadata Update from @kevin:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: low-gain, low-trouble, ops

8 months ago

Yeah, that's not going to work. In MirrorManager, rsync is used to get the list of files and their sizes, but when we need to get the checksums of the actual file contents, we use HTTP, so that needs to be available.
Can DNF even use rsync to get packages?
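
To illustrate the constraint described above, here is a minimal sketch of a pre-check a mirror admin could run against a host's URL list before adding it. This is not MirrorManager code: the function name `check_host_urls` and the example URLs are hypothetical, and the rule it encodes (an http/https URL is required, rsync is optional but helpful) is only a reading of this thread.

```python
from urllib.parse import urlparse


def check_host_urls(urls):
    """Return (ok, reason) for a host's list of category URLs.

    Assumption taken from this thread: the crawler needs HTTP(S) to
    fetch file contents for checksums; rsync alone is not enough.
    """
    schemes = {urlparse(u).scheme.lower() for u in urls}
    has_http = bool(schemes & {"http", "https"})
    has_rsync = "rsync" in schemes

    if not has_http:
        return False, "no http(s) URL: checksums cannot be fetched"
    if not has_rsync:
        return True, "http(s) only: crawlable, but no rsync for fast listing"
    return True, "http(s) and rsync present"


# Hypothetical example: an rsync-only host fails the check.
ok, reason = check_host_urls(["rsync://mirror.example.org/centos-stream/"])
print(ok, reason)
```

A host that lists both an `https://` and an `rsync://` URL for the same category would pass.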

So how should we go about separating rsync from http traffic on different hosts? Do we need to host a webserver on the rsync nodes that redirects to the real http servers just to trick the crawler?

I do not really understand what the goal is. From my point of view this doesn't make much sense.

DNF does not use rsync so it doesn't make much sense to have a separate entry in MirrorManager. This would be only useful for the public mirror listing which I doubt many people look at.

From MirrorManager's point of view, having a separate host for rsync makes our life harder. We cannot scan the HTTP mirror efficiently via rsync, and scanning the rsync mirror makes no sense, because there is no value in having the state of an rsync-only mirror in the database (DNF cannot use it, so it is not relevant for metalinks).

I assume you want to advertise that rsync is running on a separate host and that people using rsync should use that host.

For us it would be helpful to have an rsync endpoint for efficient scanning. So there are two different goals. You want direct rsync users to use another host, which is not relevant for DNF. We want an rsync endpoint on the host we redirect DNF users to, because it is much more efficient for MirrorManager scanning.

It feels like you are trying to use MirrorManager for something we did not think about because it is not an important use case for us.

Please do not use any redirects because that will lead to broken mirror states. We have seen many times before that redirecting causes problems: at some point one mirror will be out of date, and because of the redirect we will have the wrong state in the database and will then send people to broken mirrors.

From my point of view an rsync-only host does not make much sense, mainly because anything that is not used by DNF (mirrorlist/metalink) is not used that much.

If you want to highlight that you have a separate host for rsync just add a comment to your mirror and it will appear at: https://mirrormanager.fedoraproject.org/mirrors/CentOS/9-stream/x86_64

Please also add rsync to your primary mirror because mirror crawling/scanning works best if you offer rsync.
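
As a rough illustration of why rsync makes scanning cheap: one recursive `rsync --list-only` call returns names and sizes for a whole tree. The sketch below parses such a listing into a name-to-size map. It is not how MirrorManager itself is implemented; the function name `parse_rsync_listing` and the sample listing are made up, and the parser assumes the usual permissions / size / date / time / name columns.

```python
import re

# Assumed column layout of `rsync --list-only` output:
# permissions, size (possibly with thousands separators), date, time, name.
LINE_RE = re.compile(
    r"^(?P<perms>\S+)\s+(?P<size>[\d,]+)\s+"
    r"(?P<date>\d{4}/\d{2}/\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+(?P<name>.+)$"
)


def parse_rsync_listing(text):
    """Map regular-file paths to sizes in bytes; skip dirs and symlinks."""
    sizes = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        # Regular files have permissions starting with "-".
        if not m or not m.group("perms").startswith("-"):
            continue
        sizes[m.group("name")] = int(m.group("size").replace(",", ""))
    return sizes


# Made-up sample listing, for illustration only.
sample = (
    "drwxr-xr-x          4,096 2024/01/01 00:00:00 repodata\n"
    "-rw-r--r--          3,141 2024/01/01 00:00:00 repodata/repomd.xml\n"
)
print(parse_rsync_listing(sample))
```

Checksums are not part of such a listing, which fits the earlier point that file contents still have to be fetched over HTTP.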

OK, we understand now how to make this work. Sorry for the noise!

Metadata Update from @cern:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

8 months ago

