The sparse documentation of openrsync does not give me any confidence that it can be an acceptable substitute for rsync.
In my opinion, any program that is supposed to copy files, but which is not able to make perfect copies, i.e. copies that do not lose any bit of data or metadata that was present in the original file, is just unusable garbage.
Unfortunately, most copying programs available in UNIX-like operating systems (and also many archiving programs) do not make perfect file copies with their default options, and many of them can never make perfect copies, regardless of which options are used.
I have not looked at the scp command of ssh recently, but at least until a few years ago it was not possible to make perfect file copies with scp, especially when copying between different operating systems and file systems. That is why I never use scp, only rsync over ssh.
Rsync is the only program I have seen that is able (with the right options) to make perfect file copies even between different operating systems and file systems (for instance, between FreeBSD with UFS and Linux with XFS), also preserving metadata like extended file attributes, access control lists, and high-precision file timestamps (some copying and archiving programs truncate high-precision timestamps).
The current documentation of openrsync does not make any guarantee that it can make complete file copies, so by default I assume that it cannot, so for now it is a program that I consider useless.
Besides rsync for copying, one of the few Linux archiving programs that can archive perfect file copies is bsdtar (when using the pax file format; the ancient tar and cpio file formats cannot store all modern file metadata).
(FYI: I always alias rsync to '/usr/bin/rsync --archive --xattrs --acls --hard-links --progress --rsh="ssh -p XXX -l YYYYYYY"')
(With the right CLI options, "cp" from coreutils can make perfect file copies, but only if it has been compiled with appropriate options; some Linux distributions compile coreutils with wrong options, e.g. without extended file attributes support, in which case "cp" makes only partial file copies, without giving any warnings or errors.)
In contrast to your take: I work for a backup company, and I was really surprised to discover that most of our customers (big enterprises) do not care whether 99% of the metadata is restored correctly and are fine with just getting the data back.
(We restore everything super carefully but sometimes I feel like we're the only ones who care)
I'm willing to bet a decent number "don't care" until they do, because their permissions don't work, or their time-based script screws up, or whatever else nobody thinks about when they're in panic mode over "I lost my data".
In case of a complete disaster recovery, the fact that a script or two might fail is super OK. That's why after recovery there's always the cleanup phase where you fix stuff that broke during recovery.
Nah. Not really. A lot of the useful data out there doesn't need ACLs, precise dates (or any dates at all), etc.
Also, a lot of application-specific data formats already don't care about the "extra" attributes available in various filesystems, because those aren't universally supported; instead, they implement the equivalent themselves in the file format they operate on. For example: DICOM files, password-protected PDFs, Zip archives, etc.
Extended attributes (and resource forks) are mostly a liability and an anti-pattern because of their non-portability. It would be a huge red flag to find something important in there, except when backing up entire OS images.
I too have worked at a backup company, and I can’t recall any of the customers caring or even knowing about the metadata.
We would only care if the software our customers were running did. Big enterprise software suites were designed to run in hostile environments; as such, they mostly rely on their own data formats and don’t care about filesystem attributes beyond having access.
I'm with you on this. I think the data is 99% of what is important, and the rest can be recreated or improvised; if your system relies too much on file metadata, it needs more engineering.
That's the kind of nonsense thinking that leads companies like Apple to remove critical features that "no one uses".
Reminds me of that Yogi Berra quote: "Nobody goes there anymore, it's too crowded."
For example, many people don't even understand target disk mode on Apple hardware, but it has saved me countless hours over the years and made administering Apple systems a breeze. Ask people who've used target display mode if they can imagine going without it.
On another subject: it's worth mentioning that Time Machine is based on rsync.
> The current documentation of openrsync does not make any guarantee that it can make complete file copies, so by default I assume that it cannot, so for now it is a program that I consider useless.
Is it possible this is just a mismatch in documentation style or tone? My default assumption would be that openrsync is simply a less restrictively licensed rsync, and I wouldn’t assume it works any differently. Have you verified your strong hypothesis, or are you just expressing skepticism? It’s hard to tell exactly.
Edit: I read the openrsync README. It says it’s compatible with rsync and points the reader to rsync’s docs. Unless extended file attributes, ACLs, and high-resolution timestamps are optional at the protocol level, it must support everything modern rsync supports to be considered compatible, right? Or are you suggesting it lies and accepts the full protocol but just, e.g., drops ACLs on the floor?
> The openrsync command line tool is compatible with rsync, but as noted in the documentation openrsync accepts only a subset of rsync’s command line arguments.
Yes but that doesn't necessarily mean it is lacking the functionality to fully copy metadata. It could mean that openrsync has removed archaic and vestigial options to simplify implementation.
Hmm those archaic and vestigial options are probably still a pillar in many usecases :)
For example, I'm still using the multi-volume support in tar, which dates from the time when tar was used for tape archives (hence the name) on actual tapes. Without it I'd be really screwed, because I use a box full of hard drives as backup "tapes" (which works surprisingly well, I must say; I needed a small restore only a week or two ago and it really saved my bacon). But I bet 99.9% of tar users have no idea it can even do that.
Rsync is another one of those swiss army knives that people use for a lot more stuff than you might expect. Especially the remote capabilities are amazing.
The problem is that when you clone something but don't provide full compatibility, you put your users through a lot of head-scratching and frustration. It would be better not to name it after the original, so it's clear that it's something different.
It says it’s fully compatible, which I take at face value. I guess I’m curious if there’s a real problem here and openrsync is missing support for 0.1% of use cases, or if it’s just pessimistic speculation.
Well, yes. I do really hate opinionated software (e.g. Apple, or GNOME). I do tend to find the weird niches that work for me. I'm currently using KDE and I've totally worked it over. Which is great because I can be much more productive if I'm not constantly fighting against the UI. But yeah such tools with millions of niche features are great for me.
The software packages I value most are the ones where, when a situation requires something really weird, I search the documentation for some workaround and discover that the package already has exactly the feature I need hidden in there somewhere. It's like the developer read my mind :) There have been very few packages that I truly cherished (and very few in this day and age; software in the early PC days was often more powerful IMO).
One of them was SP (SK Packet Radio), where this happened several times. That was truly amazing software: there was so much it could do, and it all worked on an 8088 together with a TSR-based softmodem (connected to a radio, not a phone line). Wow. Even the insanest stuff that popped into my head I could make happen with just some settings.
That's a really decent discussion though, from both sides. The option was seriously considered.
It's a world of difference from GNOME devs, who will just shut down everything.
PS: I do think Dolphin is the weakest link in the KDE experience, though. But they have made some really good improvements with KDE 6, like the typable breadcrumb trail.
It's a bit of a contrast, though, with macOS, where the Finder is one of the least opinionated parts of the OS (and thus, for me, one of the best). I think that's more of a historical thing, though; Apple's vision seems to be centered more on moving file management into individual apps, like on iOS. That's another thing I don't like, but I think iOS has loosened it somewhat, as they had to concede that it was necessary to make the iPad more of a productivity device (it still really isn't one, though).
OpenRsync is from the OpenBSD project. This is typically an indicator of good quality and a good focus on security. However, in this case, even the official website indicates:
OpenBSD often takes the approach of removing rarely used or archaic functionality to achieve simpler code or configuration, or improved security. They gutted a lot of OpenSSL when they made LibreSSL. Their OpenSMTPD is vastly simpler than something like Postfix or Sendmail.
openrsync is very likely good code, but that doesn't mean it replicates every feature of the rsync utility.
You are right, but I wrote my comment precisely to make those users aware of this problem.
I consider this a very serious problem, because most naive users will automatically assume that when they give a file copy command, they obtain a perfect duplicate of the original file.
It is surprising for them to discover that this is frequently not true.
The risk is that Apple code-signs all the executables it ships, and that someone could try to use GPLv3 to force Apple to give them its signing keys so they can run their own version (the anti-TiVo clauses), or that GPLv3 would restrict Apple from suing someone for patent infringement because it has shipped GPLv3 software.
Valid or not in anyone else's opinion, it doesn't really matter: the risk that someone will attempt to use a court to enforce one of these clauses tends to mean companies don't want to go anywhere near it.
Working at a bank, we won't touch anything GPLv3, even to build our software/services or mobile app, because we don't even want to open that Pandora's box.
We don't have to find out whether a court would try to force us to release our signing keys if we don't use or ship any code containing language that could in some way be read as requiring that.
For the same reason, we spent £1.8m "licensing" iText PDF for Java... and then removed it with extreme prejudice immediately afterwards.
We had a very keen developer upgrade all the libraries in our codebase as a "reducing technical debt" task that they decided to undertake themselves.
They couldn't get something working and posted a stack trace asking for help... An enterprising salesperson at iText saw it, emailed them offering to help, and asked what they were running. The developer effectively told them they were running version 5, without checking (or possibly understanding) that it had been relicensed under the AGPL or a commercial license.
The legal threats from iText and the resulting fallout mean we no longer allow developers access to the internet from their machines, even via a proxy; they have a separate RDP machine for that.
And they can only pull in libraries that have been scanned by JFrog Xray to ensure the library's license is "acceptable".
On the plus side, it means we're doing something about supply-chain vulnerabilities.
There's a risk that someone uses such a library the wrong way. A big part of the goal of legal compliance and security at large enterprises is to protect staff from doing dumb things that could have bad consequences, and one of the easiest ways to do that is to ban things that are particularly prone to that. It's a blunt weapon, but a more targeted one requires much more work and care.
Nothing prevents it. All they would have to do is loosen the DRM just enough to make just the GPLv3 stuff modifiable. It would be incredibly trivial for Apple to start shipping GPLv3 software, but they are stubborn.
GPLv3, I believe, includes language that could be construed to cover your entire software distribution. IOW, shipping a GPLv3 component with the OS puts Apple at a very minor risk that a court could decide that everything distributed with rsync must also be buildable by the end user.
What do you mean by perfect copies here? Do you mean the file content itself or are you also including the filesystem attributes related to the file in your definition?
A file consists of data and various metadata, e.g. file name, timestamps, access rights, user-defined file attributes.
By default, a file copy should include everything that is contained in the original file. Sometimes the destination file system cannot store all the original metadata, but in such cases a file copying utility must give a warning that some file metadata has been lost, e.g. when copying to a FAT file system or to a tmpfs file system as implemented by older Linux kernels. (Many file copy or archiving utilities fail to warn the user when metadata cannot be preserved.)
Sometimes you may no longer need some of the file metadata, but the user should be the one who chooses to lose that information; it should not be the default behavior, especially when this unexpected behavior is not advertised anywhere in the documentation.
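A minimal Python sketch of the "warn when metadata is lost" rule above, assuming Python's standard library as a stand-in for a real copy utility. The hypothetical helper `copy_and_report` copies with `shutil.copy2`, then compares permission bits, the nanosecond modification time and, where `os.listxattr` exists (Linux), the extended attribute names, warning about anything that did not survive. Owner, ACL contents and creation time are deliberately not checked here.

```python
from __future__ import annotations

import os
import shutil
import warnings


def _xattr_names(path: str) -> set[str]:
    """Extended attribute names of path, or an empty set if unavailable."""
    if not hasattr(os, "listxattr"):  # the stdlib exposes xattrs on Linux only
        return set()
    try:
        return set(os.listxattr(path))
    except OSError:  # e.g. the filesystem does not support xattrs
        return set()


def copy_and_report(src: str, dst: str) -> list[str]:
    """Copy src to dst, then warn about (and return) metadata that was lost.

    Checks only permission bits, mtime in nanoseconds and xattr names;
    owner, ACL contents and creation time are out of scope for this sketch.
    """
    shutil.copy2(src, dst)  # data + mode + timestamps (+ xattrs on Linux)
    s, d = os.stat(src), os.stat(dst)
    lost: list[str] = []
    if s.st_mode != d.st_mode:
        lost.append("mode")
    if s.st_mtime_ns != d.st_mtime_ns:
        lost.append("mtime_ns")
    missing = _xattr_names(src) - _xattr_names(dst)
    if missing:
        lost.append("xattrs: " + ", ".join(sorted(missing)))
    for item in lost:
        warnings.warn(f"{src} -> {dst}: {item} not preserved")
    return lost
```

By contrast, plain `shutil.copy` copies only the data and permission bits, so the copy silently gets a fresh modification time, which is the kind of quiet loss described above.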
The origin of the problem is that the old UNIX file systems did not support many kinds of modern file metadata, i.e. they did not have access control lists or extended file attributes and the file timestamps had a very low resolution.
When the file systems were modernized (XFS was the first Linux file system supporting such features; the other file systems were slowly modernized afterwards), most UNIX utilities were not updated until many years later, and even then the additional features remained disabled by default.
Copying between different computers, as rsync does, creates additional problems: even if, e.g., both Windows and Linux have extended file attributes, access control lists, and high-resolution file timestamps, the APIs used for accessing file metadata differ between operating systems, so a utility like rsync must contain code able to handle all such APIs; otherwise it will not be able to preserve all file metadata.
But what you're referring to here are the attributes that the file system stores about the file, not the file itself. By default I wouldn't expect a copy of a file to have identical file system attributes, just an identical content for the file. I would expect some of the file system attributes to be copied, but not all of them.
Take the file owner, for example: if I take a copy of a file, then by default I should be the owner of that file, as it's my copy and not the original file owner's.
Another way of looking at it: if I have created a file on my local machine that's owned by root and has the setuid bit set in its file permissions, there's no way I should be able to copy that file up to a server with my normal user account and have those attributes still set on the copy.
> But what you're referring to here are the attributes that the file system stores about the file, not the file itself.
Yes. Sometimes you need that additional information too. And if you do, then rsync is your tool. If you only need the data stored in the file, then drag & drop suffices.
"File" means an entry in the file system, and so includes the metadata. It is not only the data.
When you copy a file, you will be the owner, because the new copy is your copy. Other attributes, however, like the modification date, will remain the same. It's not as if you wrote the contents of the file anew, especially not on copy-on-write architectures like Apple's APFS.
I expect all of them to be copied except for specifically the owner and group. Created date, modified date, ACLs, extended attributes, eeeverything else.
My expectations are more specific than "not all of them", so please don't misrepresent them.
Out of interest, why wouldn't you expect the created timestamp for a file that you've created by copying another file to be the point in time which the copy was made? After all, before that moment the file didn't exist, and after that moment it did.
In some contexts you may want the new file creation time, but if I copy a folder of backups, for example, I don't want every file's date set to today. I would lose the ability to filter files by creation date, which is very useful in that use case. I can't remember ever needing a copy to have its creation date reset.
Most tools that sync files (as opposed to merely copying them) need a way to know which files need to be copied and which can be skipped. The expensive way is to compute checksums, but most sync tools rely on the creation or modification date unless told otherwise.
Now say Alice and Bob have the same copy of file F, Bob modifies it first which gets stored at timestamp T, then Alice modifies her copy at time T+1.
Bob syncs his files to a filer, and the file's timestamp gets reset to now, say T+2. Then Alice does the same, but her file does not get copied, since the remote timestamp T+2 is newer than her local timestamp T+1.
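A toy Python model of the Alice/Bob scenario, assuming a naive "newer wins" rule (upload only if the local file is newer than the remote one); the `Entry`, `push` and `scenario` names and the integer clock are hypothetical, not any real tool's behavior. It shows Alice's later edit being skipped when uploads reset the timestamp, and kept when the original modification time is preserved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    content: str
    mtime: int  # abstract clock ticks standing in for T, T+1, T+2, ...


def push(local: Entry, remote: Optional[Entry], now: int, preserve_mtime: bool) -> Entry:
    """Naive "newer wins" sync of one file from a client to the filer.

    The upload is skipped when the remote copy looks at least as new as the
    local one. On upload, the remote copy either keeps the local mtime or is
    stamped with the current time, depending on preserve_mtime.
    """
    if remote is not None and remote.mtime >= local.mtime:
        return remote  # skipped: the filer's copy looks up to date
    stamp = local.mtime if preserve_mtime else now
    return Entry(local.content, stamp)


def scenario(preserve_mtime: bool) -> str:
    """Replay the Alice/Bob example and return what ends up on the filer."""
    T = 100
    remote: Optional[Entry] = Entry("original", T - 50)
    bob = Entry("Bob's edit", T)          # Bob modifies first, at T
    alice = Entry("Alice's edit", T + 1)  # Alice modifies later, at T+1
    remote = push(bob, remote, now=T + 2, preserve_mtime=preserve_mtime)
    remote = push(alice, remote, now=T + 3, preserve_mtime=preserve_mtime)
    return remote.content


print(scenario(preserve_mtime=False))  # Bob's edit (Alice's newer change is skipped)
print(scenario(preserve_mtime=True))   # Alice's edit
```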
macOS has "date added" for this, which is the date the file was added to its containing folder. It's not the exact same as the date created that you're talking about, though.
I honestly don't have a strong preference either way on this. I don't use date created except for misbehaving media downloaders that think the file modified date is a good place to put the video publication date. I'm sure there's a flag somewhere that I don't care enough to find.
As a counterpoint, many daemons or programs (e.g. sshd, ssh, slurm, munge, to name a few) expect their files to have specific users, groups, and modes for security and behavioral guarantees, and flat-out refuse to run if these requirements are not met.
When installing these things from archives or moving/distributing relevant files to large fleets, I expect the file contents and all metadata incl. datestamps to be carried the way I want, because all of that data is useful for me and the application which uses the file.
If the user doing the copying has no right to copy the file exactly, I either expect a loud warning or an error depending on the situation.
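As a simplified, hypothetical illustration of such a requirement (not the actual logic of sshd or of any program listed above), here is a Python check in the spirit of OpenSSH refusing private key files that other users can read: the file must be a regular file owned by the current user with no group or other permission bits. A copy that resets the owner or mode would fail it.

```python
import os
import stat


def strict_mode_ok(path: str) -> bool:
    """Hypothetical, simplified ownership/permission check.

    Accepts only a regular file owned by the current user with no group
    or other permission bits set (e.g. mode 0o600).
    POSIX only: os.getuid() does not exist on Windows.
    """
    st = os.stat(path)
    if not stat.S_ISREG(st.st_mode):
        return False
    if st.st_uid != os.getuid():
        return False
    return (st.st_mode & 0o077) == 0
```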
Should the SELinux context of a file always be copied from the source when moving or copying it? Or should it typically inherit the context defined by policy for the destination directory structure?
For example, copying a file from a user's home directory (perhaps user_home_t) into /var/www/html/ usually requires it to get the httpd_sys_content_t context (or similar) to be served by the webserver correctly and securely. Blindly copying the original user_home_t context would likely prevent the webserver from accessing the file.
Doesn't this suggest that some metadata, specifically the SELinux context, often shouldn't be copied verbatim from the source but rather be determined by the destination and the system's security policy?
What if the tool accessing the file is malicious and can copy the file, but can't change its context? SELinux shall be strict in its behavior, even if that is a detriment to user convenience.
SELinux contexts shall be sticky and need to be manually (re)set after copying.
This is the default behavior, BTW. SELinux contexts are not (re)set during copy operations in most cases, from my experience. You need to change/fix the context manually.
I think when I cp a file it takes on the context of the directory or whatever the default context for that path is supposed to be, and when I mv, it retains the original context.
The cp command does copy the file data but not the metadata. There is a reason we have come up with 2 words to distinguish them.
Rsync only copies the metadata when you specifically ask it to, anyway. I haven't looked at the openrsync man page, but I would assume the same is true of the latter.
Openrsync lacks the options of rsync for making exact copies.
Moreover, the OpenBSD file systems are unable to store all metadata that can accompany files in Linux filesystems or Windows filesystems, so that is the likely reason for removing the rsync options.
I also doubt that the developers of a utility for OpenBSD are interested in taking care to preserve file metadata when copying to/from Windows, because the metadata access API is not portable, so a complete "rsync" utility must include specific code paths at least for Windows, Linux, and FreeBSD. I do not know whether the macOS API is also specific to it or compatible with anything else.
I actively do not want this in a file copy utility; relying on extended file attributes is a massive anti-pattern. If you care about timestamps, they go in the file format itself. If you care about permissions, those belong in the provisioning and access systems in front of the file, i.e. the web application or other API that is providing the access.
I expect file attributes of the target to be what I say they should be, not copied over from wherever the content happened to live before.
I think Apple's choice here is less about functionality and more about the allergy that companies have to AGPL and GPLv3. To be clear this is an allergy that the GPL authors intended to create for this very reason.
I personally dislike the GPL because I think my modifications should belong to me. I spent the effort on them, and I don't think any license that requires me to forfeit my effort is worth spending time with. Corporations agree, and this is why code licensed under GPLv2 is accepted sparingly and code licensed under GPLv3 is outright rejected by most large companies.
What effort is "forfeited" (if you are talking about open source code)? If you use any GPL software, any modifications you make belong to you until you choose to distribute the software with your modifications. Modifying GPL software for personal use doesn't oblige you to make your modifications open source. Moreover, the GPL also means that you can never be denied access to the source code of GPL software that is publicly distributed. This is because the GPL protects a user's "right to repair".
For example, consider software distributed under a permissive license like MIT or BSD. If you modify and redistribute it, anyone else can further modify the software that you patched and improved, but they are not obliged to release the new source code to you. In such a scenario, you are willingly "forfeiting" your effort. With the GPL, access to the source code of future versions (of publicly distributed GPL software) cannot be denied, not just to you but to any user of the software.
Is this why bsdtar is popular, even on Linux systems that otherwise use GNU utils? I have often wondered why bsdtar is chosen. You see it in the AUR for example.
Bsdtar certainly has additional features over GNU tar.
I switched to bsdtar many years ago precisely because I discovered that, at least at that time, it was the only Linux utility that could make exact archives for backing up my filesystems.
I make extensive use of extended file attributes. For instance any file on my filesystems has a hash stored in an extended attribute for detecting errors/modifications in its content and for deduplication (the hash is updated whenever the file is modified intentionally).
When I make backups, I always store at least two copies on different media and it is essential that the file hashes in extended file attributes are preserved by the archiving/backup program, so that I will be able to detect corrupted files if I try to restore them. If some file is corrupted, I can retrieve it from the other backup copy. This has saved me several times with archives stored for many years, because neither HDDs nor any other available archival media are currently reliable enough to trust them for long-term storage without errors.
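A minimal Python sketch of the hash-in-an-extended-attribute scheme described above, assuming Linux (`os.setxattr` and `os.getxattr` are Linux-only in the standard library), SHA-256 as the hash, and a hypothetical attribute name `user.sha256`; the comment does not say which hash or attribute name is actually used.

```python
import hashlib
import os

# Hypothetical attribute name; the comment does not say which name or hash is used.
HASH_ATTR = "user.sha256"


def file_sha256(path: str) -> str:
    """Hex SHA-256 of the file content, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def tag(path: str) -> None:
    """Record the current content hash in an extended attribute (Linux only)."""
    os.setxattr(path, HASH_ATTR, file_sha256(path).encode("ascii"))


def verify(path: str) -> bool:
    """True if the content still matches the hash stored by tag()."""
    stored = os.getxattr(path, HASH_ATTR).decode("ascii")
    return stored == file_sha256(path)
```

For this to help with restores, the backup path has to carry the attribute along, e.g. rsync with `--xattrs` or an archive format that can store extended attributes.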
As I have said, for modern backups one must use the pax file format. GNU tar and other "tar" programs have made some custom, non-standard extensions to the standard "tar" file format in order to store some things not allowed in standard tar files, but those workarounds are inferior to what can be done in the more recent "pax" file format.
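To illustrate the timestamp point with something easy to run, here is a sketch using Python's `tarfile` module as a stand-in for bsdtar (an assumption; the comment is about bsdtar, not Python). It writes the same member, with a sub-second mtime and one extra pax record named after the `SCHILY.xattr.*` convention some tar implementations use for extended attributes (the key and value here are illustrative), in both PAX and GNU formats, then reads the headers back.

```python
import io
import tarfile

MTIME = 1_000_000_000.5  # a timestamp with a sub-second part


def roundtrip(fmt: int) -> tarfile.TarInfo:
    """Write one member in the given tar format and read its header back."""
    data = b"payload"
    info = tarfile.TarInfo("file.txt")
    info.size = len(data)
    info.mtime = MTIME
    # Extra pax records are only written in PAX_FORMAT; the GNU writer ignores them.
    info.pax_headers = {"SCHILY.xattr.user.sha256": "abc123"}
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        return tar.getmember("file.txt")


pax = roundtrip(tarfile.PAX_FORMAT)
gnu = roundtrip(tarfile.GNU_FORMAT)
print(pax.mtime, pax.pax_headers.get("SCHILY.xattr.user.sha256"))
print(gnu.mtime, gnu.pax_headers.get("SCHILY.xattr.user.sha256"))
```

In this sketch the PAX member comes back with the fractional mtime and the extra record intact, while the GNU member keeps only whole seconds and no pax records.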