
I’m using ddrescue to recover data from a Seagate Barracuda 3TB drive. The drive is failing, but so far every sector I try to read eventually returns the correct data; it just might take some probing (which means ddrescue has to do multiple passes on the last stage, where bad sectors are retried).

Normal operation is very slow, though. Some stretches of the disk are read at full speed (60MB/s), but after successfully getting ~2.5TB of data, the remaining 500GB are spread throughout the disk and are read at a breakneck speed of ~2KB/s, with an estimated completion time of a few thousand days.

I can, however, run multiple instances of ddrescue simultaneously on the same drive, which increases throughput, but I’m not sure how to ultimately combine the data into one image, in particular how to keep track of it with the map file. Multiple processes mean multiple map files, I assume.

Also, does anybody know why the drive is so slow? I mean, 2KB/s (or less, in case of errors) is painfully slow and brings back memories of the C64; it took me 3 hours to get 30MB of data. I have an identical Barracuda 3TB drive that could function as an organ donor, if by chance swapping the controller board would mitigate the problem (but from reading up on this, it’s doubtful that will work).

Ro-ee
  • Does running multiple instances actually increase throughput? Or does it just seem faster since the slow reads are getting obscured by the faster reads? It's hard for me to imagine a case where having multiple readers on a single spinning disk would actually result in higher read speeds? It seems you'd actually be slowing things down by adding in more head seek? – ernie Aug 01 '17 at 23:21
  • I’ll have to test this, but I’m not sure the disk is doing much head movement when only reading stuff at 2KB/s. Of course, if that is because it has to re-read the sector internally this often, that might be the case, and seeking to different places would make the speed worse... – Ro-ee Aug 01 '17 at 23:43
  • Yeah, my point is even with a perfectly operating disk, I don't know how two simultaneous reads can ever be faster than a single reader. – ernie Aug 01 '17 at 23:45
  • IF (and that’s a big if) the slow reading speed is not caused by the read itself, but by something else in the drive (remember, the drive is about to fail, and the SMART data is giving me the creeps), then simultaneous reads might be faster, as the physical read /per se/ would be done in the same amount of time, with the remaining data processing taking the rest. That’s the only reason I can come up with, but tests will have to show it. I know that when I limit ddrescue to a 30MB stretch, it takes about 3 hours, so I can do this in parallel and see if it takes 3 or 6 hours. – Ro-ee Aug 02 '17 at 00:16
  • Simultaneous reads always introduce head seek, slowing things down. It seems you're imagining a scenario where the bottleneck is that it reads something off the platter, and then there's some processing or something that is extremely slow, and that's the bottleneck? That seems really, really unlikely. With a failing drive, I'd be really reluctant to do parallel readers as you're introducing extra wear and tear on the drive with the additional head seeks. Instead of using dd or similar, I might just try to copy the files I absolutely need directly. – ernie Aug 02 '17 at 00:21
  • How about a smartctl trick from this answer? – Kamil Maciorowski Aug 03 '17 at 19:47
  • @Kamil Maciorowski: unfortunately, smartctl returns "SCT Error Recovery Control command not supported." – Ro-ee Aug 04 '17 at 12:08
  • @ernie, parallel reading does not improve speed, as my measurements have shown (unfortunately). – Ro-ee Aug 04 '17 at 12:09

1 Answer


Instead of complicating things with two images, you can tell GNU ddrescue to skip the slow parts and come back to them later.

The option that lets you do this is --min-read-rate=.

From the GNU ddrescue manual:

--min-read-rate=bytes

Minimum read rate of good non-tried areas, in bytes per second. If the read rate falls below this value, ddrescue will skip ahead a variable amount depending on rate and error histories. The skipped blocks are tried in additional passes (before trimming). If bytes is 0 (auto), the minimum read rate is recalculated every second as (average_rate / 10).
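
For example, a first pass that skips ahead whenever the rate drops below the 20000 bytes/s you mention could look like this (the device name, image name, and mapfile name are placeholders for illustration):

 ddrescue --min-read-rate=20000 /dev/sdX disk.img disk.map

The skipped slow areas are recorded in the mapfile as non-tried and retried in later passes of the same run, so you still end up with a single image and a single mapfile.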


If you insist on making multiple images, the manual also has an example of how to combine them:

Example 4: Merge the partially recovered images of 3 identical DVDs using their mapfiles as domain mapfiles.

 ddrescue -m mapfile1 dvdimage1 dvdimage mapfile
 ddrescue -m mapfile2 dvdimage2 dvdimage mapfile
 ddrescue -m mapfile3 dvdimage3 dvdimage mapfile
   (if bad-sector size is zero, dvdimage now contains a complete image
    of the DVD and you can write it to a blank DVD)
Deltik
  • I’m already using -a (= --min-read-rate), which skips ahead below 20000 bytes/s. I have it running multiple times at pass 1 (I played around with resetting the 'current' pointer in the map file, as I found there were still larger stretches being skipped over, which so far has helped me get ~10GB of additional data in the last hours). But eventually the large areas will all have been worked on, and I’m left with 30MB non-tried areas, north of 50k of them.

    Thanks for the merge tip, though.

    – Ro-ee Aug 01 '17 at 23:47