How to set up rsync for anonymous mirroring

17 Nov 2003
Posted by loner

Copyright: Karsten Thygesen (karthy@sunsite.auc.dk), SunSITE Denmark, Aalborg University, Denmark : http://sunsite.dk/info/guides/rsync/rsync-mirroring.html

Compilation

Get and compile the latest rsync from either the author's site, ftp://samba.anu.edu.au/pub/rsync, or one of the mirror sites:

  • ftp://sunsite.auc.dk/pub/unix/rsync
  • ftp://ftp.sunet.se/pub/unix/admin/rsync
  • ftp://ftp.fu-berlin.de/pub/unix/network/rsync
  • Or, naturally, rsync://samba.anu.edu.au/rsyncftp/

Configuration

You should read the rsyncd.conf manual page for more information. In daemon mode, rsync uses a single configuration file, /etc/rsyncd.conf by default; a different file can be given on the command line with the --config=FILE option. The file uses the same format as the Samba configuration file. An example follows:

    motd file = /etc/motd
    max connections = 25
    syslog facility = local3
    
    [ftp]
            comment = ftp area
            path = /pack/ftp
            read only = yes
            list = yes
            uid = nobody
            gid = nobody
    [tmp]
            comment = temporary file area
            path = /tmp
            read only = no
            list = yes
            hosts allow = 192.168.2.0/24 127.0.0.0/8 *.anu.edu.au
            auth users = tridge, susan
            secrets file = /etc/rsyncd.secrets           
    
    

The example shows a global section at the top, followed by a definition for each exported "module". A module option set in the global section changes the default for that option in every module. Each option is explained below:

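Global options:

motd file:
Names a text file, the "message of the day", which is displayed to clients when they connect, before the module list or file transfer. It should be set in the global section.
max connections:
The maximum number of concurrent clients allowed. Clients connecting when the limit has been reached are told to try again later.
lock file:
The file used to support the max connections option; the daemon uses record locking on this file to count concurrent connections.
syslog facility:
The name of the syslog facility to log connections and statistics to, for example local3 as in the example above. The available names are listed in the syslog manual page.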

Module options:

comment:
A short description of the module. If list is set to yes, it is shown next to the module name when a client requests a list of exported modules.
path:
The root of the file tree to be exported. Rsync will chroot to this directory.
read only:
Whether the module is read only. For modules open to anonymous clients, it is recommended to set this to yes.
list:
Whether the module shows up when a client requests a list of modules.
uid:
The user id rsync changes to after performing the chroot. For anonymous access, this is usually the user id of nobody.
gid:
The group id rsync changes to after performing the chroot. For anonymous access, this is usually the group id of nobody.
hosts allow:
List of hosts allowed to connect to this module, given as hostnames (with * wildcards) or IP address/mask.
hosts deny:
List of hosts denied access to this module.
auth users:
List of usernames allowed to connect to this module. If set, clients must supply a username and password that match an entry in the secrets file.
secrets file:
The name of a file containing username:password pairs, one per line. The passwords are stored in cleartext.
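
Because the secrets file holds passwords in cleartext, its permissions matter: with the default "strict modes" setting, rsync refuses to use a secrets file that is accessible to other users. The following is a minimal sketch of a helper that writes one username:password line per argument and restricts the file to its owner. The function name make_secrets_file is hypothetical, and the users and passwords in the example call are placeholders.

```sh
#!/bin/sh
# Minimal sketch (hypothetical helper): write an rsyncd secrets file with one
# "username:password" entry per argument, readable by its owner only.
# Passwords are stored in cleartext, which is the format rsync expects.

make_secrets_file() {
    file=$1
    shift
    (
        umask 077            # a newly created file gets mode 600
        : > "$file"          # create the file, or truncate an existing one
        for pair in "$@"; do
            printf '%s\n' "$pair" >> "$file"
        done
    )
    chmod 600 "$file"        # also tighten the mode of a pre-existing file
}

# Example (run as root; users and passwords are placeholders):
#   make_secrets_file /etc/rsyncd.secrets tridge:changeme susan:changeme2
```

The file name must match the module's secrets file option, and each user must also be listed in that module's auth users option.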

Note that the rsync daemon performs a chroot() system call to the module's path, so all files served must be contained within that path; symlinks pointing outside it will not work. For anonymous rsync access, a typical choice is to use the anonymous FTP root directory as the path.

Launching

The rsync daemon listens on a privileged TCP port (873), for security reasons. For convenience, you may want to add this port to /etc/services or its equivalent. Add a line like:

    rsync 873/tcp

The daemon can be launched either via inetd or stand-alone. At startup it checks whether stdin is a socket; if it is, it assumes it was launched via inetd. The rsync daemon is robust, so it is safe to run it as a stand-alone server: the code that waits for requests is only a few lines long, and it forks a new copy for each request. If a forked process dies, the main daemon is not harmed. The big advantage of running as a stand-alone daemon will come when the planned directory cache system is implemented, since caching will probably only be enabled in that mode. For this reason, busy sites are recommended to run rsync as a stand-alone daemon. Stand-alone mode also makes it easy to limit the number of concurrent connections.

You will have to make a startup script that runs during system boot, or you may even consider launching the daemon from init, if your system allows that. The details are left as an exercise, as they are very OS dependent.

Launching as a stand-alone daemon

Make sure rsync is started during the boot process, either by editing /etc/rc.local or by creating a startup script on SysV platforms. The script should contain a line like:

    rsync --daemon

The rsync daemon forks and continues to run in the background. Each new connection makes rsync fork again to handle that request.
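
Since the boot mechanism differs between systems, the following is only a sketch of the start and stop logic such a script might contain. It assumes rsync is installed as /usr/local/bin/rsync and that the global section of rsyncd.conf contains a "pid file = /var/run/rsyncd.pid" line, so that the daemon writes its process id there. The names rsyncd_start and rsyncd_stop are hypothetical.

```sh
#!/bin/sh
# Sketch of the start/stop logic for a boot script (hypothetical helpers).
# Assumptions: the rsync and config paths below, and a line
#   pid file = /var/run/rsyncd.pid
# in the global section of rsyncd.conf so the daemon records its PID.
# Each variable can be overridden from the environment.
RSYNC=${RSYNC:-/usr/local/bin/rsync}
CONFIG=${CONFIG:-/etc/rsyncd.conf}
PIDFILE=${PIDFILE:-/var/run/rsyncd.pid}

rsyncd_start() {
    # rsync detaches into the background by itself, so this returns at once
    "$RSYNC" --daemon --config="$CONFIG"
}

rsyncd_stop() {
    if [ ! -f "$PIDFILE" ]; then
        echo "rsyncd: no pid file at $PIDFILE" >&2
        return 1
    fi
    kill "$(cat "$PIDFILE")" && rm -f "$PIDFILE"
}

# A boot script would then dispatch on its first argument, e.g.:
#   case "$1" in
#       start) rsyncd_start ;;
#       stop)  rsyncd_stop ;;
#       *)     echo "usage: $0 start|stop" >&2; exit 1 ;;
#   esac
```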

Launching from inetd

If you choose to launch the rsync daemon from inetd, insert a line similar to this into /etc/inetd.conf:

    rsync stream tcp nowait root /usr/bin/rsync rsyncd --daemon

Note that the syntax varies between vendors; the line above is valid for Solaris. Some inetd implementations do not require a named port. In that case, you can skip editing /etc/services and list the port number (default 873) directly in inetd.conf. After editing /etc/inetd.conf, remember to send a HUP signal to the inetd process. Watch for errors in /var/adm/messages or a similar log file.
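
If you maintain several servers, a small script helps avoid adding the line twice. This sketch appends the Solaris-style line shown above to an inetd.conf file unless an rsync entry is already present. The function name add_inetd_rsync and the default paths are assumptions, and the script does not signal inetd for you.

```sh
#!/bin/sh
# Sketch (hypothetical helper): add the rsync service line to inetd.conf
# unless an rsync entry already exists. The format follows the Solaris-style
# example above; check your vendor's inetd.conf manual page first.

add_inetd_rsync() {
    conf=${1:-/etc/inetd.conf}
    rsync_bin=${2:-/usr/bin/rsync}
    if grep -q '^rsync[[:space:]]' "$conf" 2>/dev/null; then
        echo "an rsync entry already exists in $conf" >&2
        return 0
    fi
    # service  socket  proto  wait    user  server-path  args
    printf 'rsync\tstream\ttcp\tnowait\troot\t%s\trsyncd --daemon\n' \
        "$rsync_bin" >> "$conf"
}
```

Afterwards, send inetd the HUP signal, for example with pkill -HUP inetd where pkill is available, or with kill -HUP and the process id reported by ps.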

Testing

Now that the installation is complete, you should test it. On the client side, the syntax is nearly the same as for normal rsync, except that you use :: instead of : after the host name. Start by requesting a list of exported modules like this:

    
    bash$ ./rsync localhost::
    Welcome to Fjall.
    
    ftp             ftp area
    tmp             temporary file area
    
    

The motd is shown, followed by a list of available modules and their comments. Now try to sync a directory within one of the modules. If you get an error like "failed to connect to localhost - Connection refused", then something probably went wrong in the inetd configuration. Look for errors in /var/adm/messages or a similar file. Did you remember to send inetd a HUP signal?
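
The module listing can also be checked from a script, for example after changing rsyncd.conf. This sketch assumes the listing format shown above (the motd, then one line per module starting with the module name). module_listed is a hypothetical helper, and the RSYNC variable exists only so that a different rsync binary can be substituted.

```sh
#!/bin/sh
# Sketch (hypothetical helper): succeed if MODULE appears in the module list
# printed by "rsync HOST::". The listing is the motd followed by one
# "name  comment" line per listed module, so the module name is compared
# with the first field of each line. A motd line starting with the same
# word would also match.
RSYNC=${RSYNC:-rsync}

module_listed() {
    host=$1
    module=$2
    "$RSYNC" "$host::" |
        awk -v m="$module" '$1 == m { found = 1 } END { exit !found }'
}

# Example: check the "ftp" module from the sample config, then do a dry run
# (-n) of a transfer from it into a scratch directory:
#   module_listed localhost ftp && rsync -avn localhost::ftp/ /tmp/ftp-test/
```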

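Logging

By default, rsync logs via syslog (on Solaris, to the /var/adm/messages file), but it can be configured to use any syslog facility. A typical setup is to log to an unused facility, such as local3 as in the example configuration, and then add a line like this to syslog.conf:

    local3.info          /var/adm/rsync.log

Traditional syslogd implementations, including Solaris, require the selector and the file name to be separated by tabs rather than spaces.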

Mirroring examples

When mirroring from an rsync-enabled site, mirroring is as simple as creating a cron job that launches rsync daily (or at whatever interval suits the material being mirrored), like this:

    rsync -a samba.anu.edu.au::sambaftp/ /disk1/mirrors/samba/

The mirroring will be far more efficient than FTP-based mirroring, both in time and in bytes transferred. Add -v or --stats to see the transfer statistics at the end of a mirror run.
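
A daily cron job can overlap with the previous run if a transfer is slow. The sketch below wraps the command above in a simple lock, relying on mkdir failing when the directory already exists. It also adds --delete so that files removed upstream disappear from the mirror; leave that flag out if you want to keep them. The function name, the lock path and the crontab schedule are assumptions.

```sh
#!/bin/sh
# Sketch of a cron-driven mirror run (hypothetical names and paths).
# SRC and DEST are taken from the example above; all four variables can be
# overridden from the environment.
RSYNC=${RSYNC:-rsync}
SRC=${SRC:-samba.anu.edu.au::sambaftp/}
DEST=${DEST:-/disk1/mirrors/samba/}
LOCK=${LOCK:-/tmp/mirror-samba.lock}

mirror_once() {
    # mkdir either creates the lock directory or fails, so it works as a lock
    if ! mkdir "$LOCK" 2>/dev/null; then
        echo "mirror already running, skipping" >&2
        return 1
    fi
    # --delete removes local files that no longer exist upstream
    "$RSYNC" -a --delete "$SRC" "$DEST"
    status=$?
    rmdir "$LOCK"
    return $status
}

# Example crontab entry (daily at 03:15), assuming this file is saved as
# /usr/local/sbin/mirror-samba with a final line that calls mirror_once:
#   15 3 * * * /usr/local/sbin/mirror-samba
```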

Advanced installation

If you are running a busy site, consider the impact of the rsync daemon. It puts a high load on the server in terms of both CPU and disk I/O. The disk I/O is similar to that of an FTP server, but the CPU usage is higher, because rsync has to open files and calculate checksums. In most cases, however, it only does so for files it decides need to be synced, so a mirror run uses very little CPU if few or no files have changed. There are two ways to control the impact: limit the number of concurrent clients, and launch the daemon with a lower priority. A straightforward way to do the latter is to create a wrapper around the real binary and launch the wrapper instead. Such a wrapper can be as simple as this:

    
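    #!/bin/sh
    # Run the real rsync binary at a much lower priority (niceness 19).
    # "$@" passes every argument through unchanged, even ones containing spaces.
    exec nice -n 19 /path/to/real/rsync "$@"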
    
    

Save the file, make it executable, and use it as your rsync daemon. The wrapper works both when launched from inetd and when run stand-alone. Any busy site should run the daemon in stand-alone mode to be able to control the maximum number of concurrent clients.
