file size estimation

Bill Ricker bill.n1vux at gmail.com
Fri Sep 15 23:58:01 EDT 2006


> Suppose an EBCDIC file on a tape from an IBM mainframe is read onto a Linux
> server, and this EBCDIC file on the tape has 100 records with a length
> of 13054, is it correct to estimate the size of the file on Linux server
> would be 1,305,400 bytes?

Maybe.

[Last time I did this, I out-sourced it to a boutique conversion shop
in Cambridge ... he had a VMS system with one of every tape drive
known to man, and set up a custom conversion table, since the tapes I had
were Mutant International EBCDIC from NLM. Sorry, I don't have the name
handy; this was 10 years ago.]

> Is block size information also needed to
> calculate the size?

Probably not for the size calculation, although it will probably be
needed just to read the tape, depending on the utility used; e.g.,
dd(1) will require being told the blocksize and LRECL.
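A sketch of the size arithmetic, simulated with a local file of the
same dimensions (the tape device name in the comment is an assumption;
substitute your drive and the LRECL/blocksize from the tape label):

```shell
#!/bin/sh
LRECL=13054   # fixed logical record length, per the question
RECS=100      # record count

# Expected size with no record separators added:
EXPECT=$((LRECL * RECS))
echo "expected bytes: $EXPECT"    # 1305400

# Reading from a real tape might look something like (device name
# is hypothetical):
#   dd if=/dev/nst0 of=file.ebc ibs=$LRECL conv=noerror

# Simulate the result with a same-sized local file and confirm:
dd if=/dev/zero of=file.ebc bs=$LRECL count=$RECS 2>/dev/null
wc -c < file.ebc                  # 1305400
```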

> Please correct me if these terms are used incorrectly, also hopefully
> this question is not too OT.

Terminology seems correct.

Normally, if doing EBCDIC=>ASCII conversion for use on Unix later, I
would also do an LRECL=>NL conversion. This would insert an additional
100 bytes (one newline per record) beyond the size computed in your
example.  If you really only plan to read the file with sysread(2) in
LRECL-sized chunks, you don't need to do this, but to view it with
more(1) or anything else, it's highly desirable, even though the LRECL
is rather long by Unix standards and will overflow any old fixed
1000-byte buffers.
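The LRECL=>NL step can be sketched with dd's unblock conversion. One
caveat I should flag: conv=unblock strips trailing blanks from each
record, so space-padded records would come out shorter than LRECL+1;
the demo below uses records with no trailing blanks.

```shell
#!/bin/sh
LRECL=13054
RECS=100

# Build a demo file of 100 fixed 13054-byte records (all 'A's,
# no trailing blanks):
head -c $((LRECL * RECS)) /dev/zero | tr '\0' 'A' > fixed.dat

# Insert one newline per record.  For a real EBCDIC tape you would
# also translate the character set, e.g. conv=ascii,unblock:
dd if=fixed.dat of=lines.txt cbs=$LRECL conv=unblock 2>/dev/null

wc -c < lines.txt   # 1305500 = 100 * (13054 + 1)
```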

(If by some disaster you convert it into Unicode, the size could be a
bit larger due to non-ASCII characters appearing in the EBCDIC, or
roughly doubled if you convert it to UTF-16. I wouldn't recommend that
unless you had compelling reasons!)
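The size impact of transcoding can be demonstrated with iconv; the
encoding names below are the GNU libiconv/glibc spellings, which may
differ on other systems:

```shell
#!/bin/sh
printf 'HELLO WORLD!' > s.txt
wc -c < s.txt                                  # 12 bytes of ASCII

# UTF-8: ASCII characters stay single-byte, so the size is unchanged:
iconv -f ASCII -t UTF-8 s.txt | wc -c          # 12

# UTF-16LE: every ASCII character becomes two bytes:
iconv -f ASCII -t UTF-16LE s.txt | wc -c       # 24
```

(Plain "UTF-16" as a target typically also prepends a two-byte BOM,
so that output would be slightly more than double.)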

-- 
Bill
n1vux at arrl.net bill.n1vux at gmail.com



More information about the gnhlug-discuss mailing list