> Make a sparse image with dd. Then you have a 1:1 copy while skipping all the unused sectors to save space.
Filesystems don't zero out deleted data and dd isn't aware of the filesystem mapping, so unless it's a completely fresh drive, you'll still pull off garbage data.
ddrescue is one tool that combines dd with the ability to read the filesystem metadata to only extract out the allocated filesystem blocks.
Sorry, you're right, I definitely meant ddrescue. That's what I always use, no idea why I wrote dd. Especially since `ddrescue -d -n -S /dev/sdb sdb.raw sdb.logfile` is permanently burnt into my brain. I guess I just took it from the previous comment and article and didn't give it a second thought before posting my comment.
> Filesystems don't zero out deleted data and dd isn't aware of the filesystem mapping, so unless it's a completely fresh drive, you'll still pull off garbage data.
Sure, but if there's unused space left zeroed, such as partitions that you didn't use, then you get to skip it. It's not a huge priority but it doesn't hurt.
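To make the "skip the zeroed space" idea concrete, here is a minimal Python sketch of a sparse copy. It is not how dd or ddrescue are implemented; the function name `sparse_copy` and the 4096-byte block size are assumptions chosen for illustration. The idea is simply to seek over all-zero blocks instead of writing them, so a filesystem that supports holes never has to allocate them.

```python
import os

BLOCK = 4096  # assumed block size; real tools choose their own


def sparse_copy(src_path, dst_path, block_size=BLOCK):
    """Copy src to dst, seeking over all-zero blocks instead of writing them.

    Returns the number of bytes that were skipped rather than written.
    """
    skipped = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(block_size)
            if not chunk:
                break
            if chunk.count(0) == len(chunk):
                # All zeros: advance the write position, leaving a hole.
                dst.seek(len(chunk), os.SEEK_CUR)
                skipped += len(chunk)
            else:
                dst.write(chunk)
        # If the input ends in zeros, extend the file to its full length.
        dst.truncate(dst.tell())
    return skipped
```

The copy reads back byte-for-byte identical to the source; whether the skipped ranges actually save disk space depends on the destination filesystem supporting sparse files.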
> Filesystems don't zero out deleted data and dd isn't aware of the filesystem mapping, so unless it's a completely fresh drive, you'll still pull off garbage data.
That may have been true a very long time ago.
Today, some filesystems such as ZFS can effectively, and automatically, zero out deleted data using trim on devices that support that. (I say "effectively" here because we don't actually know if the data is literally zero'd by the underlying device, and I say this even though that distinction doesn't actually matter in this context. Once trimmed, those logical sectors will always read as zeros until something new is written to them, and this is the functionality that is important in this context.)
This function is useful for SSDs, and is also useful for SMR spinny-disks.
Tons of other combinations of filesystems and operating systems also deal quite well with trimming of unused space, though this more often happens as a scheduled task instead of something that is taken care of by the filesystem itself.
Trim (in various implementations) has been broadly used for well over a decade, and on a trimmed device dd can produce sparse output files (e.g. GNU dd with `conv=sparse`).
---
Now, that said: It probably doesn't matter much if a particular dd-esque tool is set to create sparse output files or not. Sure, some space may be saved, and sparse files sure are cute and cuddly.
But it's probably a fool's errand to even plan such an operation on a machine that has less free space than the total maximum capacity of the thing being rescued: Either there's enough room to write a non-sparse image, or there isn't enough room to even think about starting the process since it might not be able to complete.
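As a rough illustration of that pre-flight check, the sketch below compares the source's total capacity against the free space at the destination. The helper name `has_room_for_image` and the `margin_bytes` parameter are made up for this example, and the device size is assumed to be known already (it is passed in rather than probed).

```python
import shutil


def has_room_for_image(device_bytes, dest_dir, margin_bytes=0):
    """Return True if dest_dir has room for a full, non-sparse image.

    device_bytes is the total capacity of the disk being rescued, not the
    amount of data the filesystem claims is in use.
    """
    free = shutil.disk_usage(dest_dir).free
    return free >= device_bytes + margin_bytes
```

For example, `has_room_for_image(500 * 10**9, "/mnt/rescue")` would ask whether a 500 GB disk fits at a hypothetical `/mnt/rescue` destination.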
(If space becomes an issue later on down the road, the output file can be "sparsified" in-place using "fallocate --dig-holes" in instances where that makes sense.)
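Whether digging holes is worth it can be estimated by comparing a file's apparent size with the space actually allocated to it. A small sketch, assuming a POSIX system where `st_blocks` is reported in 512-byte units (as on Linux); `size_report` is a hypothetical helper name:

```python
import os


def size_report(path):
    """Return (apparent_bytes, allocated_bytes) for a file.

    Assumes st_blocks counts 512-byte units, which holds on Linux;
    st_blocks is not available on every platform.
    """
    st = os.stat(path)
    apparent = st.st_size
    allocated = st.st_blocks * 512
    return apparent, allocated
```

If the allocated figure is already well below the apparent size, the image is already sparse and there is little left to reclaim.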
And I definitely want the whole disk imaged, which means that I definitely do not want ddrescue's interpretation of metadata to determine filesystem allocation and limit the scope of that image: This is the first step of a data rescue operation, and that makes it the worst place for data to be intentionally thrown away or disregarded.
If things are failing hard enough that any of this work is on the table, then obviously the combination of the source disk and filesystem is untrustworthy -- along with the metadata.
Getting all of the bits backed up -- regardless of their apparent lack of importance -- should always be the prime directive here. Any extra bits can always be tossed later if they're eventually deemed to be actually-unimportant.
There is no point in sending TRIM after each delete: at worst (with a naive implementation) you would get substantial write amplification, and at the very least your drive would be busy doing GC instead of serving data.
Most of the time it's just sent every once in a while, often triggered on a schedule and/or by the amount of writes[0].
[0] My current machine says it has been 9 days since 'the last retrim', and a T440 that 'works' as a glorified dashboard (i.e. minuscule writes overall) says it's 24 days.
I'm pretty sure that OSes usually send TRIM right away and let the drive figure out optimization. Wikipedia says "some distributions" of Linux turn it off but my various ubuntu-ish systems have `discard` or `discard=async` in the output of `mount`.
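For anyone who wants to check this without eyeballing `mount` output, here is a small Python sketch that scans mount-table text in the `/proc/mounts` layout (device, mount point, type, options, ...) for `discard` options. The function name `discard_mounts` and the sample entries are made up for illustration.

```python
def discard_mounts(mounts_text):
    """Return {mount_point: option} for entries mounted with a discard option."""
    found = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # not a well-formed mount entry
        mount_point, options = fields[1], fields[3].split(",")
        for opt in options:
            if opt == "discard" or opt.startswith("discard="):
                found[mount_point] = opt
    return found


# Hypothetical sample data, not taken from a real machine.
SAMPLE = """\
/dev/nvme0n1p2 / ext4 rw,relatime,discard 0 0
/dev/sda1 /data btrfs rw,ssd,discard=async,space_cache=v2 0 0
tmpfs /tmp tmpfs rw,nosuid 0 0
"""

print(discard_mounts(SAMPLE))
```

On Linux the real table can be passed in with `discard_mounts(open("/proc/mounts").read())`. A filesystem without a `discard` option may still be trimmed periodically by a scheduled job, so an empty result does not mean TRIM is unused.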
"last retrim" on Windows is an extra feature, because TRIMs can get dropped when a drive is busy enough. It goes through all the free space and TRIMs it again once a month.
Also even if you did only TRIM once a month, I think most of your free space would still be zero.