How to set up rsync for anonymous mirroring
Copyright: Karsten Thygesen (karthy@sunsite.auc.dk), SunSITE Denmark, Aalborg University, Denmark : http://sunsite.dk/info/guides/rsync/rsync-mirroring.html
Compilation
Get and compile the latest rsync from either the author's site, ftp://samba.anu.edu.au/pub/rsync, or one of the mirror sites:
ftp://sunsite.auc.dk/pub/unix/rsync
ftp://ftp.sunet.se/pub/unix/admin/rsync
ftp://ftp.fu-berlin.de/pub/unix/network/rsync
rsync://samba.anu.edu.au/rsyncftp/
Configuration
You should read the rsyncd.conf manual page for more information. In daemon mode, rsync uses a single configuration file, by default /etc/rsyncd.conf; this can be changed on the command line with the option --config=FILE. The configuration file uses the same format as the Samba configuration file. An example follows:
motd file = /etc/motd
max connections = 25
syslog facility = local3
[ftp]
comment = ftp area
path = /pack/ftp
read only = yes
list = yes
uid = nobody
gid = nobody
[tmp]
comment = temporary file area
path = /tmp
read only = no
list = yes
hosts allow = 192.168.2.0/24 127.0.0.0/8 *.anu.edu.au
auth users = tridge, susan
secrets file = /etc/rsyncd.secrets
The example shows that there is a global section at the top, followed by a definition for each exported "module". Any module option set in the global section changes the default for that option. Each option is explained below:
Global options:
- motd file:
- Names a text file containing the message of the day, which is displayed to clients just before the file transfer begins. It should be set in the global section.
- max connections:
- Maximum allowed concurrent clients.
- lock file:
- Location of the lock file, which rsync uses to enforce the max connections limit.
- syslog facility:
- The syslog facility to log connections and statistics to, for example local3. The available facilities are normally listed in the syslog manual page.
Local/Module options:
- comment:
- Describes the module. If list is set to yes, this description is shown when the client requests a list of exported modules.
- path:
- Defines the root of the file tree to be exported. Rsync will chroot to this directory.
- read only:
- Defines whether the module is read only. Until authentication is implemented, it is recommended to set this to yes.
- list:
- Defines whether the module shows up when a client requests a list of modules.
- uid:
- The user id rsync will change to after having performed a chroot. For anonymous access, this is often the user id of nobody.
- gid:
- The group id rsync will change to after having performed a chroot. For anonymous access, this is often the group id of nobody.
- hosts allow:
- List of hosts allowed to connect to this module, given as either hostnames (with * wildcard) or IP/mask.
- hosts deny:
- List of hosts that are denied access to this module.
- auth users:
- List of users who can authenticate
- secrets file:
- The name of a file containing username:password (the password is in cleartext).
Note that the rsync daemon performs a chroot() system call to the path defined in the config file, so all files served must be contained within the path (no symlinks pointing outside the path). For anonymous rsync access, a typical choice is to let the path be the anonymous ftp root directory.
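As a rough illustration of the options above, the following sketch writes a minimal read-only configuration for anonymous mirroring. The function name write_rsyncd_conf and its two arguments are assumptions made for this example; the option values are taken from the example configuration earlier in this guide.

```shell
#!/bin/sh
# Sketch: write a minimal rsyncd.conf that exports one read-only module.
# write_rsyncd_conf is a hypothetical helper; adjust the values to your site.
write_rsyncd_conf() {
    conf_file=$1      # where to write the configuration
    export_path=$2    # directory to export (rsync chroots to it)

    cat > "$conf_file" <<EOF
motd file = /etc/motd
max connections = 25
syslog facility = local3

[ftp]
comment = ftp area
path = $export_path
read only = yes
list = yes
uid = nobody
gid = nobody
EOF
}
```

It could be called as write_rsyncd_conf /etc/rsyncd.conf /pack/ftp. No tmp module, auth users or secrets file are written, since anonymous mirroring needs none of them.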
Launching
The rsync daemon uses a privileged TCP port (for security reasons). For your own convenience, it might be an advantage to add this port to your /etc/services or equivalent file. Add a line like:
rsync 873/tcp
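If the services file is maintained by a script, the entry can be added only when it is missing. The sketch below shows one way to do that; ensure_rsync_service is a hypothetical helper name, and the default path /etc/services is an assumption that may not hold on every system.

```shell
#!/bin/sh
# Sketch: add "rsync 873/tcp" to a services file unless port 873/tcp
# is already listed. ensure_rsync_service is a hypothetical helper.
ensure_rsync_service() {
    services_file=${1:-/etc/services}
    # Look for an uncommented entry for 873/tcp, whatever its service name.
    if grep -Eq '^[^#]*[[:space:]]873/tcp([[:space:]]|$)' "$services_file"; then
        return 0
    fi
    printf 'rsync\t\t873/tcp\n' >> "$services_file"
}
```

Running the function twice leaves a single entry, so it is safe to call from an installation script.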
The daemon can be launched either via inetd or stand-alone. It checks whether stdin is a socket and, if it is, assumes it was launched via inetd. The rsync daemon is robust, so it is safe to launch it as a stand-alone server. The code that loops waiting for requests is only a few lines long; for each request it forks a new copy. If the forked process dies, it does not harm the main daemon. The big advantage of running as a daemon will come when the planned directory cache system is implemented. The caching system will probably only be enabled when running as a daemon. For this reason, busy sites are recommended to run rsync as a daemon. Also, the daemon mode makes it easy to limit the number of concurrent connections.
You will have to make a startup script that runs during system boot, or you might even consider launching it from init, if you have that possibility. The details of how this is done are left as an exercise, as they are very OS dependent.
Launching as stand-alone daemon
Make sure rsync gets started in the boot process by editing /etc/rc.local or making a startup script on SYSV platforms. The script should contain a line like:
rsync --daemon
The rsync daemon will fork and continue to run in the background. Each new connection will make rsync fork to handle that request.
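A start function for such a script might look like the sketch below. The variables RSYNC and RSYNCD_CONF and the function name start_rsyncd are assumptions introduced for this example, so that the paths can be overridden; only the --daemon and --config options come from rsync itself.

```shell
#!/bin/sh
# Sketch of a boot-time start function for the stand-alone daemon.
# RSYNC and RSYNCD_CONF are hypothetical variables; override as needed.
RSYNC=${RSYNC:-/usr/bin/rsync}
RSYNCD_CONF=${RSYNCD_CONF:-/etc/rsyncd.conf}

start_rsyncd() {
    # Refuse to start without a readable configuration file.
    if [ ! -r "$RSYNCD_CONF" ]; then
        echo "start_rsyncd: cannot read $RSYNCD_CONF" >&2
        return 1
    fi
    # In daemon mode rsync forks into the background by itself.
    "$RSYNC" --daemon --config="$RSYNCD_CONF"
}
```

The function can be called from /etc/rc.local or from the start branch of a SYSV startup script.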
Launching from inetd
If you choose to launch the rsync daemon from inetd then insert a line similar to this:
rsync stream tcp nowait root /usr/bin/rsync rsyncd --daemon
Please note that the syntax varies between vendors. The above line is valid for Solaris. Some inetd implementations do not require a named port. If this is the case, you can skip the step of editing /etc/services and just list the port number (default 873) directly in inetd.conf. After editing /etc/inetd.conf, remember to send a HUP signal to the inetd process. Watch for errors in /var/adm/messages or a similar file.
Testing
Now that the installation is complete, you should test it. At the client end the syntax is nearly the same as normal rsync except that you use :: instead of :. Start by requesting a list of exported modules like this:
bash$ ./rsync localhost::
Welcome to Fjall.
ftp             ftp area
tmp             temporary file area
The motd is shown, followed by a list of available modules and their comments. Now try to sync a directory within one of the modules. If you get an error like failed to connect to localhost - Connection refused, then something probably went wrong in the inetd configuration. Watch for errors in /var/adm/messages or a similar file. Did you remember to reinitialize inetd?
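The check of the module listing can also be scripted. The sketch below reads a listing on standard input and reports whether a given module appears in it. The helper name module_listed is hypothetical, and the sketch assumes that each listing line starts with the module name followed by whitespace, as in the output shown above.

```shell
#!/bin/sh
# Sketch: check that a module appears in a daemon's module listing.
# module_listed is a hypothetical helper; it reads the listing on stdin.
module_listed() {
    module=$1
    # Assumption: listing lines start with the module name, then whitespace.
    grep -q "^${module}[[:space:]]"
}
```

A typical call would be rsync localhost:: | module_listed ftp && echo "ftp is exported". A motd line that happens to begin with the module name would also match, so treat the result as a quick sanity check only.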
Logging
rsync logs to syslog, ending up in the /var/adm/messages file, but it can be configured to use any syslog facility. A typical use is to let it log to an unused facility, like local3, and then append a line like
local3.info /var/adm/rsync.log
to syslog.conf
Examples on mirroring
When mirroring from an rsync-enabled site, the mirroring is as simple as creating a cron job which launches rsync daily (or at whatever interval suits the material being mirrored) like this:
rsync -a samba.anu.edu.au::sambaftp/ /disk1/mirrors/samba/
The mirroring will be far more efficient in both time and bytes transferred. Watch the statistics at the end of a mirroring run.
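When the job runs from cron, a slow run can overlap with the next one. The sketch below wraps the rsync call in a simple lock to avoid that. The names run_mirror and RSYNC and the lock directory argument are assumptions chosen for this example; only the rsync -a invocation comes from the text above.

```shell
#!/bin/sh
# Sketch of a mirror job suitable for cron. A lock directory keeps a slow
# run from overlapping with the next one. RSYNC and run_mirror are
# hypothetical names chosen for this example.
RSYNC=${RSYNC:-rsync}

run_mirror() {
    source_url=$1     # e.g. samba.anu.edu.au::sambaftp/
    dest_dir=$2       # e.g. /disk1/mirrors/samba/
    lock_dir=$3       # any path on a local disk, used only as a lock

    # mkdir is atomic: it fails if another run already holds the lock.
    if ! mkdir "$lock_dir" 2>/dev/null; then
        echo "run_mirror: previous run still active, skipping" >&2
        return 1
    fi

    "$RSYNC" -a "$source_url" "$dest_dir"
    status=$?

    rmdir "$lock_dir"
    return $status
}
```

If a run is killed before it reaches rmdir, the lock directory is left behind and has to be removed by hand before the next run will start.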
Advanced installation
If you are running a busy site, it is worth considering the impact of the rsync daemon. The daemon has a high impact on the server in terms of both CPU and disk I/O. The disk I/O is similar to that of an ftp server, but the CPU usage is higher. The CPU usage is due to the nature of rsync, which may have to open every file and calculate checksums. In most cases, however, it only has to open a file and calculate checksums if it decides that a sync is necessary, so it will in fact use very little CPU for a mirror run in which no files (or not many files) have changed. You can take two approaches to control the impact: limit the number of concurrent clients, and launch the daemon with a lower priority. To launch the daemon with a lower priority, a straightforward method is to create a wrapper around the real binary and launch this wrapper instead. Such a wrapper could be as simple as this:
#!/bin/sh
# Run the real rsync binary at the lowest priority, passing all arguments on.
exec nice -n 19 /path/to/real/rsync "$@"
and then save the file and use this as your rsync daemon. This wrapper will work when launched from inetd and stand-alone. Any busy site should launch the daemon in stand-alone mode to be able to control the maximum number of concurrent clients.
