We currently copy data between Linux file systems by hand every day, and I want to automate this.
The automation has several requirements:
1. The job must exit when an error occurs
2. The copy must happen over ssh
3. The whole data set should not be copied every time; only the differences should be transferred
4. When the source and destination are out of sync, stale data should be cleaned up on the destination before the copy from the source, to use less bandwidth
5. We do not have the luxury of fancy tools for this; we want to stick to standard OS tools and best practices
The solution is to use rsync. It works well for file system backups and for synchronization between servers. It can also be scheduled with cron jobs or a batch execution engine.
This tool has several important command-line options:
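-a - archive mode (recursive copy that preserves permissions, timestamps, owner, group and symlinks)
-e - selects the remote shell, used here for ssh
--partial - retains partially transferred files
--delete / --delete-after - removes files from the destination that no longer exist on the source, avoiding file accumulation on the destination server
--timeout - sets the I/O timeout in seconds
--progress - shows transfer progress and speed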
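The options above can be combined into a single command. The following is a minimal sketch, not a finished script: the paths, the host name backup.example.com, the user backupuser, the 300 second timeout and the helper names build_rsync_cmd and run_sync are all assumptions made for illustration. It also covers the first requirement by passing rsync's exit status back to the caller.

```shell
#!/usr/bin/env bash
# Sketch only: host name, user, paths and the timeout value are hypothetical.

build_rsync_cmd() {
    local src_dir=$1 dest=$2
    RSYNC_CMD=(
        rsync
        -a               # archive mode: recursion, symlinks, permissions, times, group, owner
        -e ssh           # run the transfer over ssh
        --partial        # keep partially transferred files so a rerun can resume them
        --delete-after   # delete extraneous destination files after the transfer
        --timeout=300    # give up after 300 s without I/O (assumed value)
        --progress       # print per-file progress and transfer speed
        "${src_dir%/}/"  # trailing slash: sync the directory's contents
        "$dest"
    )
}

# Run a command and report its exit status. rsync documents 0 as success,
# 24 as "source files vanished" and 30 as a data send/receive timeout.
run_sync() {
    local status=0
    "$@" || status=$?
    case $status in
        0)  echo "sync finished" ;;
        24) echo "warning: some source files vanished during the transfer" >&2 ;;
        30) echo "error: rsync I/O timeout" >&2 ;;
        *)  echo "error: command exited with status $status" >&2 ;;
    esac
    return "$status"
}

build_rsync_cmd /data/export backupuser@backup.example.com:/data/export/
# Print the assembled command instead of running it.
printf '%s\n' "${RSYNC_CMD[*]}"
# Real run, stopping the script on failure:
#   run_sync "${RSYNC_CMD[@]}" || exit $?
```

Building the command as an array keeps each option on its own commented line and avoids quoting problems when a path contains spaces.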
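Timeout : --timeout is an I/O timeout in seconds: rsync exits if no data is transferred for that long (the default, 0, means no timeout). Estimate the file system size and the link speed when tuning this value, because scanning or cleaning up a large file system can leave the connection idle for a while.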
Delete / Delete after : --delete and --delete-after are the essential options for keeping the source and destination in sync. They remove files from the destination that no longer exist on the source, which avoids file accumulation on the destination server. However, manual changes to the destination are lost: any file added by hand under the synchronized file system is deleted when these options are used.
-e : rsync assumes ssh is already set up. When it runs from a cron job, the user running rsync needs key-based (passwordless) authentication to the remote host, because nobody is present to type a password.
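As an illustration of the ssh setup, the sketch below passes an explicit key and non-interactive ssh options through -e. The key path, user and host are hypothetical placeholders; BatchMode and ConnectTimeout are standard OpenSSH client options.

```shell
#!/usr/bin/env bash
# Sketch: key path, user and host are hypothetical placeholders.
SSH_KEY=/home/backupuser/.ssh/id_ed25519_rsync

# BatchMode=yes makes ssh fail immediately instead of prompting for a
# password or passphrase, so an unattended cron run cannot hang on input.
REMOTE_SHELL="ssh -i $SSH_KEY -o BatchMode=yes -o ConnectTimeout=30"

# Print the command instead of running it; drop "echo" for a real transfer.
echo rsync -a -e "$REMOTE_SHELL" /data/export/ backupuser@backup.example.com:/data/export/
```

With BatchMode enabled, a missing or rejected key shows up as a failed job rather than a cron process that waits forever for a password.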
As with any copy, host resources and network bandwidth will be the main bottlenecks.
Rsync scripts must be tested thoroughly.
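One low-risk way to test is a dry run on throwaway directories. The sketch below assumes rsync is installed locally and uses temporary directories created just for the rehearsal; the file names are made up. The -n (--dry-run) option makes rsync report what it would do without copying or deleting anything.

```shell
#!/usr/bin/env bash
# Sketch: rehearse a --delete sync on throwaway directories before touching real data.
demo_root=$(mktemp -d)
trap 'rm -rf "$demo_root"' EXIT   # clean up the rehearsal directories on exit

mkdir -p "$demo_root/src" "$demo_root/dest"
echo "from source"   > "$demo_root/src/report.txt"
echo "added by hand" > "$demo_root/dest/manual.txt"

if command -v rsync >/dev/null 2>&1; then
    # -n only lists the planned actions; -v makes the listing visible.
    rsync -a -n -v --delete "$demo_root/src/" "$demo_root/dest/"
else
    echo "rsync is not installed; skipping the rehearsal" >&2
fi
```

The verbose listing is expected to show manual.txt as a deletion candidate, which is exactly the kind of manual change that the delete options would wipe out in a real run. After the dry run both directories are unchanged.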
Some best practices:
Check the running processes on the source to see whether an rsync for the same file system is already in progress, and start a new one only if it is not.
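Check the disk utilization on the source and destination to decide which delete option to use. Deleting before the transfer frees space on a nearly full destination first, while --delete-after needs room for the new files and the stale ones at the same time.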
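Keep the owner and group of the rsync user consistent on both systems so that permissions on the destination match the source. Note that rsync can only preserve file ownership when the receiving side runs as the super-user.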
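The first two checks above could be sketched as follows. This is an assumption-laden illustration: the function names, the 90 percent threshold and the /data/export path are invented for the example, and the disk check is run against the local root file system as a stand-in (for a remote destination the same df command would have to be run over ssh). It picks --delete-before for a nearly full destination, which is the rsync option that explicitly deletes before the transfer starts.

```shell
#!/usr/bin/env bash
# Sketch of two pre-flight checks; the 90 % threshold and paths are assumptions.

# Succeeds if a process whose command name is rsync mentions the directory.
rsync_running_for() {
    local dir=$1
    ps -eo args= 2>/dev/null | awk -v dir="$dir" '
        $1 ~ /(^|\/)rsync$/ && index($0, dir) { found = 1 }
        END { exit !found }'
}

# Used space of the file system holding a path, as a bare number (e.g. 42).
disk_usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Nearly full destination: free space first. Otherwise delete last.
choose_delete_mode() {
    local used=$1 threshold=${2:-90}
    if [ "$used" -ge "$threshold" ]; then
        echo "--delete-before"
    else
        echo "--delete-after"
    fi
}

SRC_DIR=/data/export
if rsync_running_for "$SRC_DIR"; then
    echo "rsync for $SRC_DIR is already running; skipping this run" >&2
else
    echo "delete mode: $(choose_delete_mode "$(disk_usage_percent /)")"
fi
```

Checking the process list leaves a small window between the check and the start of the new rsync, so a lock file may be a safer choice if runs can be triggered close together.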