Splitting tar output, Resolved

Jim Kuzdrall gnhlug at intrel.com
Wed Aug 31 14:07:01 EDT 2005


> logic to time out, forcing it to write a bunch of padding, stop,
> reposition, get a running start, write a bunch of preamble junk,
> write one more minimum size block, underrun, write more padding,
> reposition, etc...

    I appreciate that, but I want to explore the threshold so I can 
be assured of an adequate safety margin.

    My first test wrote the 9.4GB uncompressed tar output to both disk 
and tape in 8392s, a rate of 1.1MB/s.

    For those who remember the Kansas City Standard audio tape format, 
the fidelity of today's tape systems is miraculous.  I pulled the 9.4GB 
archive off as a tar file using dd and did a cmp with the original.  Not 
a single bit out of place!
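
    For anyone who wants to repeat the check, it went roughly like this 
(a sketch; the scratch path is only an example, and $bkdir is the backup 
directory used in the script further down):

mt -f /dev/st0 rewind
dd if=/dev/st0 of=/tmp/readback.tar bs=16384
cmp /tmp/readback.tar $bkdir/monthly.tar && echo "identical"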

    The second test wrote the file thus created to the tape again, this 
time using only dd.  It took 9581s, a rate of 980KB/s.  Within the 
vagaries of measurement, these are equal.  So, tar produces and writes 
the file fast enough to keep up with the tape.

    The next check writes the tar file without any output to tape.  That 
takes 900 seconds, so tar by itself is faster than the tape by a factor 
of about 10.
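
    For the record, those two comparison runs were roughly (a sketch; 
the variable names are the ones used in the script below):

cd /
# second test: the finished archive to tape, dd only
time dd if=$bkdir/monthly.tar of=/dev/st0 ibs=512 obs=16384
# next check: tar writing to disk only, no tape involved
time tar -cf $bkdir/timing.tar ./ $bkoptions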

    The split output script has evolved to:

mt -f /dev/st0 setblk 4096
cd /
(tar -cf - ./ $bkoptions) | (tee $bkdir/monthly.tar | \
    dd of=/dev/st0 ibs=512 obs=16384)

    I should explain the hardware, which affects the results that follow 
but not the method:
        Motherboard: MSI K7-Master
        Processor: dual Athlon 2400 (2.0GHz clock)
        RAM: 2GB, 133MHz bus
        Kernel: 2.6.11.4-21.8-smp
        Disks: 18GB ultra SCSI 166MHz bus; 82GB Ultra DMA 100 EIDE
        Tape: HP 1554 (aka 1537A), DDS-3, 12GB/24GB, 976kB/s sustained
              uncompressed, up to 1950kB/s compressed, compression on

    The internal write speed of the DAT drive is 1.0MB/s (976kB/s).  It 
is capable of pre-write compression (of unknown sophistication), but 
compression of 2:1 is the maximum to be expected.  Many large files 
(jpg, pdf) are already well compressed.  My whole-disk backup size 
decreases by a factor of 1.8 with gzip compression.
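
    For reference, a gzip run is the same pipeline with -z added (a 
sketch; the mt compression line is optional and drive-dependent, and 
monthly.tgz is just an example name):

mt -f /dev/st0 compression 0    # turn off the drive's own compression
mt -f /dev/st0 setblk 4096
cd /
(tar -czf - ./ $bkoptions) | (tee $bkdir/monthly.tgz | \
    dd of=/dev/st0 ibs=512 obs=4096)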

    Timing test: 'mt -f /dev/st0 setblk $blkt ; dd if=$tstdat of=/dev/st0 
ibs=$blki obs=$blko'
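
    The sweep over block sizes can be scripted along these lines (a 
sketch; a rewind between runs is assumed):

blkt=4096; blki=512
for blko in 512 1024 2048 4096 8192 16384 32768; do
    mt -f /dev/st0 rewind
    mt -f /dev/st0 setblk $blkt
    time dd if=$tstdat of=/dev/st0 ibs=$blki obs=$blko
done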

Key... File sizes: .tar 296MB, .tgz 145MB; Time in s; Rate in kB/s

File  $blkt    $blki    $blko    Time    Rate
.tgz    512      512      512     618     234
        512      512     1024     346     420
        512      512     2048     213     681
        512      512     4096     186     780
        512      512     8192     186     780
       1024      512     4096     185     784
       4096      512     4096     185     785
       8192      512     8192     185     785
       1024     1024     4096     186     780
.tar   4096      512     4096     273    1100
       4096      512     8192     249    1200
       4096      512    16384     232    1300
       4096      512    32768     232    1300

    The rate does not appear to meet the spec because the time includes 
the start and rewind.

    A 4096 output block gets all there is to be had for a well-compressed 
input file.  A 16K block is adequate for an uncompressed file.  
Choosing 32K gives a 2:1 safety margin.  I would think that most recent 
computers will operate dd at the same speed as this one.  They may be 
slower at creating tar files, though, especially if using compression.
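
    So the backup pipeline above just picks up the larger output block 
(a sketch; nothing else changes):

(tar -cf - ./ $bkoptions) | (tee $bkdir/monthly.tar | \
    dd of=/dev/st0 ibs=512 obs=32768)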

    gzip compression beats the drive's internal compression by a factor 
of 1.5 or so, neglecting the start-stop contribution.  I have read that 
milder compression is best for critical backups: if one bit is off with 
very aggressive compression, the damage extends over many more bytes.  
Is that still accepted wisdom?

    Keeping the blocks at a multiple of 512 allows tar to pull a single 
directory or file off the tape directly (if you remember that -b is the 
block size divided by 512).  Otherwise, you must have enough disk space 
to hold the recovered archive before extraction.
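
    For instance, with obs=16384 the blocking factor is 16384/512 = 32, 
so a single file comes straight off the tape like this (the path is 
just an example):

mt -f /dev/st0 rewind
tar -xvf /dev/st0 -b 32 ./etc/fstab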

    Maybe I made a bigger task of this than absolutely necessary.  But I 
feel better now.

Jim Kuzdrall   


    


