An interesting alternative to rsync is zsync (http://zsync.moria.org.uk/). A very brief summary of the differences:
* Instead of the sender generating checksums on demand, that work is performed once when the file is "published" and saved in a zsync metadata file.
* This zsync metadata file is fetched (simple copy) and the receiver uses it to decide which portions of the file it needs to request. It then requests only those portions.
* Because of this simplification, the protocol can be reduced to plain, stateless HTTP. Any HTTPD that supports range requests can act as a zsync server, and remote zsync files are represented by HTTP URLs.
* Note that this all but removes the CPU cost on the sender/server; a rough sketch of the receiver side follows this list.
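To make the flow concrete, here is a minimal Python sketch of the receiver side of that idea. The metadata layout, field names, and checksum choice are invented for illustration; real zsync has its own file format and uses rolling checksums so it can match blocks at arbitrary offsets. The point is just that the server needs nothing beyond static files and HTTP range requests:

```python
import hashlib
import json
import os
import urllib.request

# Hypothetical metadata layout (the real .zsync format is different):
#   {"url": "http://host/file", "blocksize": 4096, "blocks": ["<sha1 of block 0>", ...]}

def fetch_range(url, start, end):
    """Fetch bytes start..end (inclusive) from any range-capable HTTP server."""
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

def sync_file(metadata_url, local_path):
    # The metadata file is fetched with a plain GET; all checksum work was
    # already done once, when the file was published.
    with urllib.request.urlopen(metadata_url) as resp:
        meta = json.load(resp)

    blocksize = meta["blocksize"]
    local = open(local_path, "rb").read() if os.path.exists(local_path) else b""

    out = bytearray()
    for i, published_sum in enumerate(meta["blocks"]):
        chunk = local[i * blocksize:(i + 1) * blocksize]
        if hashlib.sha1(chunk).hexdigest() == published_sum:
            out += chunk   # block already present locally, nothing transferred
        else:
            out += fetch_range(meta["url"], i * blocksize, (i + 1) * blocksize - 1)

    with open(local_path, "wb") as f:
        f.write(out)
```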
I've used zsync in some very large systems to efficiently distribute write-few, read-often files with only partial changes to many endpoints. It is much more scalable than rsync because there is essentially no CPU cost on the server/sender.
I also maintain a fork of zsync which uses libcurl rather than the original author's custom HTTP client code. The fork exists primarily to support SSL: https://github.com/eam/zsync
All of this is true, but do note that zsync is, at least for now, a single-file tool. If you are syncing thousands of files over a slow connection (because only a little has changed), rsync can often do it with just a handful of bytes beyond the actual changes, while zsync needs hundreds of bytes per file just to determine that nothing has changed.
Use zsync to distribute a small number of large files that have small changes. If you need to sync hierarchies with lots of files, rsync is still king.
Absolutely true: the zsync client operates on a single file and doesn't manipulate file metadata. But this is a solvable problem, and I have written wrappers which deal with file hierarchies approximately as efficiently as rsync. Here is one I developed to drive a CM system composed of many small files, most of which are unchanging: https://github.com/yahoo/cm3/tree/master/azsync
The additional step is to generate and send a list of filenames and metadata attributes (which rsync must do as well), and to invoke zsync per file only when an update is necessary. For large trees of files that are largely unchanged this is very efficient, much more so than fetching a zsync manifest for every file.
The file path is generally the largest amount of data sent per file, prior to the zsync manifest itself; this is similar to rsync.
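As a rough illustration of that wrapper approach (this is not the actual azsync code; the manifest name, layout, and URL scheme here are assumptions made up for the example), a small Python driver might fetch one tree-level manifest, skip every file whose metadata already matches, and shell out to the zsync client only for the files that actually changed:

```python
import hashlib
import json
import os
import subprocess
import urllib.request

# Hypothetical tree manifest published next to the files, e.g.
#   {"etc/app.conf": {"size": 1234, "sha1": "..."}, ...}
# (azsync's real format differs; this layout is invented for illustration.)

def sha1_of(path):
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def sync_tree(base_url, dest):
    # One round trip fetches metadata for the whole tree.
    with urllib.request.urlopen(f"{base_url}/manifest.json") as resp:
        manifest = json.load(resp)

    for relpath, attrs in manifest.items():
        target = os.path.join(dest, relpath)
        # Unchanged files cost nothing beyond their manifest entry: skip the
        # per-file zsync round trip when size and checksum already match.
        if (os.path.exists(target)
                and os.path.getsize(target) == attrs["size"]
                and sha1_of(target) == attrs["sha1"]):
            continue
        os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
        cmd = ["zsync", "-o", target, f"{base_url}/{relpath}.zsync"]
        if os.path.exists(target):
            cmd[1:1] = ["-i", target]   # reuse the existing local copy as the seed
        subprocess.run(cmd, check=True)
```

Unchanged files thus cost one manifest entry rather than one HTTP round trip each, which is where the savings over fetching a zsync manifest per file come from.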
It's a cool project, check it out!