An interesting alternative to rsync is zsync (http://zsync.moria.org.uk/). A very brief summary of the differences:
* Instead of the sender generating checksums on demand, that work is performed once when the file is "published" and saved in a zsync metadata file.
* This zsync metadata file is fetched (simple copy) and the receiver uses it to decide which portions of the file it needs to request. It then requests only those portions.
* Because of this simplification, the protocol can be reduced to plain, stateless HTTP. Any HTTPD that supports range requests can act as a zsync server, and remote zsync files are represented by HTTP URLs.
* Note that this all but removes the CPU cost on the sender/server; a rough sketch of the receiver side follows this list.
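To make the flow concrete, here is a minimal Python sketch of the receiver side of that idea. The metadata layout, field names, and checksum choice are invented for illustration; real zsync has its own file format and uses rolling checksums so it can match blocks at arbitrary offsets. The point is just that the server needs nothing beyond static files and HTTP range requests:

```python
import hashlib
import json
import os
import urllib.request

# Hypothetical metadata layout (the real .zsync format is different):
#   {"url": "http://host/file", "blocksize": 4096, "blocks": ["<sha1 of block 0>", ...]}

def fetch_range(url, start, end):
    """Fetch bytes start..end (inclusive) from any range-capable HTTP server."""
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

def sync_file(metadata_url, local_path):
    # The metadata file is fetched with a plain GET; all checksum work was
    # already done once, when the file was published.
    with urllib.request.urlopen(metadata_url) as resp:
        meta = json.load(resp)

    blocksize = meta["blocksize"]
    local = open(local_path, "rb").read() if os.path.exists(local_path) else b""

    out = bytearray()
    for i, published_sum in enumerate(meta["blocks"]):
        chunk = local[i * blocksize:(i + 1) * blocksize]
        if hashlib.sha1(chunk).hexdigest() == published_sum:
            out += chunk   # block already present locally, nothing transferred
        else:
            out += fetch_range(meta["url"], i * blocksize, (i + 1) * blocksize - 1)

    with open(local_path, "wb") as f:
        f.write(out)
```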
I've used zsync in some very large systems to efficiently distribute write-few, read-often files with only partial changes to many endpoints. It is much more scalable than rsync because there is essentially no CPU cost on the server/sender.
I also maintain a fork of zsync which uses libcurl rather than the original author's custom HTTP client code. The fork exists primarily to support SSL: https://github.com/eam/zsync
All of this is true, but do note that zsync is, at least for now, a single-file tool. If you are syncing thousands of files over a slow connection (because only a little has changed), rsync can often do it with just a handful of bytes beyond the actual changes, while zsync needs hundreds of bytes per file just to determine that nothing has changed.
Use zsync to distribute a small number of large files that have small changes. If you need to sync hierarchies with lots of files, rsync is still king.
Absolutely true: the zsync client operates on a single file and doesn't manipulate file metadata. But this is a solvable problem, and I have written wrappers which deal with file hierarchies approximately as efficiently as rsync. Here is one I developed to drive a CM system composed of many small files, most of which are unchanging: https://github.com/yahoo/cm3/tree/master/azsync
The additional step is to generate and send a list of filenames and metadata attributes (which rsync must do as well), and to invoke zsync per file only when an update is necessary. For large trees of files that are largely unchanged this is very efficient, much more so than fetching a zsync manifest for every file.
The file path is generally the largest amount of data sent per file, prior to the zsync manifest itself; this is similar to rsync.
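As a rough illustration of that wrapper approach (this is not the actual azsync code; the manifest name, layout, and URL scheme here are assumptions made up for the example), a small Python driver might fetch one tree-level manifest, skip every file whose metadata already matches, and shell out to the zsync client only for the files that actually changed:

```python
import hashlib
import json
import os
import subprocess
import urllib.request

# Hypothetical tree manifest published next to the files, e.g.
#   {"etc/app.conf": {"size": 1234, "sha1": "..."}, ...}
# (azsync's real format differs; this layout is invented for illustration.)

def sha1_of(path):
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def sync_tree(base_url, dest):
    # One round trip fetches metadata for the whole tree.
    with urllib.request.urlopen(f"{base_url}/manifest.json") as resp:
        manifest = json.load(resp)

    for relpath, attrs in manifest.items():
        target = os.path.join(dest, relpath)
        # Unchanged files cost nothing beyond their manifest entry: skip the
        # per-file zsync round trip when size and checksum already match.
        if (os.path.exists(target)
                and os.path.getsize(target) == attrs["size"]
                and sha1_of(target) == attrs["sha1"]):
            continue
        os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
        cmd = ["zsync", "-o", target, f"{base_url}/{relpath}.zsync"]
        if os.path.exists(target):
            cmd[1:1] = ["-i", target]   # reuse the existing local copy as the seed
        subprocess.run(cmd, check=True)
```

Unchanged files thus cost one manifest entry rather than one HTTP round trip each, which is where the savings over fetching a zsync manifest per file come from.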
It's a cool project, check it out!