Filesystem overhead

bscott at ntisys.com
Tue Jul 29 21:58:49 EDT 2003


Hello world!

  Okay, I have satisfied my curiosity in this matter.

  Bill Freeman <f at ke1g.mv.com>, who replied to me off-list quickly after my
original post, was correct in that the overhead I was seeing is due to
indirect blocks.  Credit also to Derek Martin <gnhlug at sophic.org>, for
providing very nice empirical evidence (and saving me the trouble of
producing it).

  Once I knew what I was looking for, finding it with Google was easy enough
to do.  ;-)

  A good brief explanation:

	An inode stores up to 12 "direct" block numbers, summing up
	to a file size of 48 kByte. Number 13 points to a block with up to
	1024 block numbers of 32 Bit size each ("indirect blocks"), summing
	up to a file size of 4 MByte. Number 14 points to a block with
	numbers of blocks containing numbers of data blocks ("double
	indirect", up to 4 GByte); and number 15 points to "triple indirect"
	blocks (up to 4 TByte).

		-- from http://e2undel.sourceforge.net/how.html
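
  For anyone who wants to check the arithmetic, here is a quick
back-of-the-envelope sketch in C.  The 4 kByte block size and 4-byte
(32-bit) block numbers are the assumptions from the quote above, not
values read out of the ext2 sources:

	/* Sketch of the size limits quoted above; 4 kByte blocks and
	 * 4-byte block numbers are assumed, per the quote. */
	#include <stdio.h>

	int main(void)
	{
	    unsigned long long block = 4096;        /* bytes per data block */
	    unsigned long long ptr   = 4;           /* bytes per block number */
	    unsigned long long per   = block / ptr; /* 1024 numbers per indirect block */

	    printf("direct (12 blocks):   %llu bytes\n", 12 * block);
	    printf("single indirect adds: %llu bytes\n", per * block);
	    printf("double indirect adds: %llu bytes\n", per * per * block);
	    printf("triple indirect adds: %llu bytes\n", per * per * per * block);
	    return 0;
	}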

  A good diagram:

	http://e2fsprogs.sourceforge.net/ext2-inode.gif

  Additional references:

	http://www.nongnu.org/ext2-doc/
	http://e2fsprogs.sourceforge.net/ext2intro.html

  Finally, Mr. Freeman's reply to my original post was sufficiently
informative and well-written that I asked for and received his permission to
repost it here (thanks Bill!).  His post is about UFS (the original(?) Unix
File System), but most of the concepts (if not the numbers) apply to EXT2/3
as well.  A short sketch working through his arithmetic follows the
forwarded message.

---------- Begin forwarded message ----------
Date: Mon, 28 Jul 2003 13:24:16 -0400
From: Bill Freeman <f at ke1g.mv.com>
To: bscott at ntisys.com
Subject: Filesystem overhead

Ben,

	I'd guess that du is counting the indirect blocks, except then
the overhead that you see is too small, unless things have gotten a
lot better than early Unix days.  Actually, they probably have gotten
better, having some scheme to allocate most of a large file from
contiguous sets of blocks that only needs a single pointer in an inode
or indirect block.  But whatever the allocation unit is, you need at
least an index into the allocation space plus an indication of which
space, and more likely a block offset within the filesystem, for each
unit of data.  If 32 bit offsets are enough (maybe not for the new
extra large filesystems), then to see the approximate 0.1% overhead
you're describing would need 4k allocation units, which seems
reasonable to me.

	Actually, I assume that you know the stuff below, but I'm
going to say it anyway.  This is all from UFS: I've never studied extN
stuff internally.  In old Unix systems, blocks were 512 bytes.  An
inode was a block, and after things like permissions, size, fragment
allocation in the last block, owner, group, etc., there was room for
13 disk pointers (index of block within the partition).  The first 10
of these were used to point to the first 10 data blocks of the file.
If a file was bigger than 5k (needed more than 10 blocks), then the
11th pointer pointed to a block that was used for nothing but
pointers.  With 32 bit (4 byte) pointers, 128 pointers would fit in a
block, so this "single indirect block" could handle the next 64k of
the file.  If the file was larger than 69k (more than fills the single
indirect block), then the 12th pointer in the inode points to a
"double indirect block", a block of pointers to blocks of pointers to
blocks of data.  In the 4 byte pointer 512 byte block world, this
handles the next 8Mb of the file.  Finally, if the file was too big
for that, the last inode pointer pointed to a triple indirect block, a
pointer to a block of pointers to blocks of pointers to blocks of
pointers to data blocks.  That handled the next 1Gb of the file.

	This size comfortably exceeds the wildest dreams of a PDP-11,
the original implementation platform for Unix.  The washing-machine-sized
drives of the day only held between 2.5Mb and 10Mb.  Even when
we started being able to get 40Mb drives the choice wasn't a concern.
By the mid 1980's, however, big system vendors (I was at Alliant, who
made so called mini-super-computers) were scrambling to find creative
ways to expand the limits on both filesystems and individual files
without breaking too many things.

	Linux has clearly been using 1k blocks, and I wouldn't be
surprised by the allocation of 4 block clusters for all but the last
(fragmented) blocks.  One to one thousand overhead to data sounds
pretty reasonable to me.

							Bill
---------- End forwarded message ----------
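
  Out of curiosity, here is the same sort of sketch for the arithmetic in
Bill's message.  The sizes (a 4-byte pointer per 4 kByte allocation unit for
the modern guess; 512-byte blocks and 128 pointers per indirect block for
old UFS) are his assumptions, not measurements:

	#include <stdio.h>

	int main(void)
	{
	    /* Modern guess: one 4-byte pointer per 4 kByte allocation unit
	     * gives roughly the 0.1% overhead-to-data ratio Bill describes. */
	    printf("pointer overhead: %.4f%%\n", 100.0 * 4.0 / 4096.0);

	    /* Old UFS: 512-byte blocks and 4-byte pointers, so 128 pointers
	     * fit in one indirect block. */
	    unsigned long long block = 512, ptrs = block / 4;
	    printf("direct (10 blocks):   %llu bytes\n", 10 * block);
	    printf("single indirect adds: %llu bytes\n", ptrs * block);
	    printf("double indirect adds: %llu bytes\n", ptrs * ptrs * block);
	    printf("triple indirect adds: %llu bytes\n", ptrs * ptrs * ptrs * block);
	    return 0;
	}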

  Thanks to everyone who responded.  I hope other people have found this
thread as informative and useful as I have.

  Clear skies!

-- 
Ben Scott <bscott at ntisys.com>
| The opinions expressed in this message are those of the author and do  |
| not represent the views or policy of any other person or organization. |
| All information is provided without warranty of any kind.              |
