Splitting tar output, Resolved
Jim Kuzdrall
gnhlug at intrel.com
Wed Aug 31 14:07:01 EDT 2005
> logic to time out, forcing it to write a bunch of padding, stop,
> reposition, get a running start, write a bunch of preamble junk,
> write one more minimum size block, underrun, write more padding,
> reposition, etc...
I appreciate that, but I want to explore the threshold so I can
be assured of an adequate safety margin.
My first test wrote the 9.4GB uncompressed tar output to both disk
and tape in 8392s, a rate of 1.1MB/s.
For those who remember the Kansas City Standard audio tape system,
the fidelity of these systems is miraculous. I pulled the 9.4GB
archive off as a tar file using dd and did cmp with the original. Not
a single bit out of place!
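In rough outline, that check was something like this (a sketch, not the exact command; the temporary file name is only illustrative):
mt -f /dev/st0 rewind
dd if=/dev/st0 of=/tmp/readback.tar ibs=16384
cmp $bkdir/monthly.tar /tmp/readback.tar && echo identical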
The second test wrote the file thus created to the tape again, this
time using only dd. It took 9581s, a rate of 980KB/s. Within the
vagaries of measurement, these are equal. So, tar produces and writes
the file fast enough to keep up with the tape.
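That run amounts to something like this (again a sketch; the block sizes shown are the ones the script below uses):
mt -f /dev/st0 rewind
time dd if=$bkdir/monthly.tar of=/dev/st0 ibs=512 obs=16384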
The next check writes the tar file to disk without any output to tape. That
takes 900 seconds, so tar produces the archive about 10 times faster than
the tape can take it.
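In other words, roughly (a sketch):
cd /
time tar -cf $bkdir/monthly.tar ./ $bkoptions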
The split output script has evolved to:
mt -f /dev/st0 setblk 4096     # fixed 4096-byte tape blocks
cd /
(tar -cf - ./ $bkoptions) | \
    (tee $bkdir/monthly.tar | dd of=/dev/st0 ibs=512 obs=16384)
I should explain the hardware, which affects the results that follow
but not the method:
Motherboard: MSI K7-Master
Processor: dual Athlon 2400 (2.0GHz clock)
RAM: 2GB, 133MHz bus
Kernel: 2.6.11.4-21.8-smp
Disks: 18GB ultra SCSI 166MHz bus; 82GB Ultra DMA 100 EIDE
Tape: HP 1554 (aka 1537A), DDS-3, 12GB/24GB, 976kB/s sustained
uncompressed, up to 1950kB/s compressed, compression on
The internal write speed of the DAT drive is 1.0MB/s (976kB/s). It is
capable of pre-write compression (of unknown sophistication), but
compression of 2:1 is the maximum to be expected. Many large files
(jpg, pdf) are already well compressed. My whole-disk backup size
decreases by a factor of 1.8 with gzip compression.
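For comparison runs, the drive's built-in compression can be toggled from
software; with the mt-st version of mt the operation is, as far as I know,
'compression' (whether it takes effect may depend on the drive and driver):
mt -f /dev/st0 compression 0    # drive compression off
mt -f /dev/st0 compression 1    # drive compression on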
Timing test: 'mt -f /dev/st0 setblk $blkt ; dd if=$tstdat of=/dev/st0 ibs=$blki obs=$blko'
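The table below came from runs of roughly this shape ($tstdat is the 296MB
.tar or the 145MB .tgz test file; $blkt and $blki were changed by hand
between runs, so treat this as a sketch):
for blko in 512 1024 2048 4096 8192 16384 32768 ; do
    mt -f /dev/st0 rewind
    mt -f /dev/st0 setblk $blkt
    time dd if=$tstdat of=/dev/st0 ibs=$blki obs=$blko
done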
Key: the .tar test file is 296MB, the .tgz is 145MB; Time in seconds,
Rate in kB/s.

File    $blkt   $blki   $blko   Time    Rate
.tgz      512     512     512    618     234
          512     512    1024    346     420
          512     512    2048    213     681
          512     512    4096    186     780
          512     512    8192    186     780
         1024     512    4096    185     784
         4096     512    4096    185     785
         8192     512    8192    185     785
         1024    1024    4096    186     780
.tar     4096     512    4096    273    1100
         4096     512    8192    249    1200
         4096     512   16384    232    1300
         4096     512   32768    232    1300
The rates appear not to meet the spec because the measured time includes
the start and the rewind.
A 4096 output block gets all there is to be had for a well-compressed
input file. A 16K block is adequate for an uncompressed file.
Choosing 32K gives a 2:1 safety margin. I would think that most recent
computers will run dd at the same speed as this one. They may be
slower at creating tar files, though, especially if using compression.
gzip compression beats the drive's internal compression by a factor of
about 1.5, neglecting the start-stop contribution. I have read that
milder compression is best for critical backups: if one bit is off,
very aggressive compression spreads the damage over many more bytes.
Is that still accepted wisdom?
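For anyone who wants the gzip variant of the split output script, the
obvious change is to drop gzip into the pipe, after the same mt setblk as
above; something along these lines (the compression level is illustrative,
and 4096 is enough for compressed output per the table above):
cd /
(tar -cf - ./ $bkoptions) | gzip -3 | \
    (tee $bkdir/monthly.tgz | dd of=/dev/st0 ibs=512 obs=4096)
Presumably one would also turn the drive's compression off (see above) so
the already-compressed stream is not compressed a second time.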
Keeping the blocks at a multiple of 512 allows tar to pull a single
directory or file off the tape directly (if you remember that -b is
the block size divided by 512). Otherwise, you must have enough disk
space to hold the recovered archive before extraction.
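As a worked example: with obs=16384 the blocking factor is 16384/512 = 32,
so a single file can be pulled straight off the tape with something like
this (the path is only a placeholder):
mt -f /dev/st0 rewind
tar -x -b 32 -f /dev/st0 ./home/jim/important.txt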
Maybe I made a bigger task of this than absolutely necessary. But I
feel better now.
Jim Kuzdrall