Ignition (was Re: tftp config problem (ltsp))
Ric Werme
ewerme at comcast.net
Sat Nov 10 23:14:23 EST 2007
>>> Did some research and found this sometimes occurs when the
>>> speed of the server nic is so much faster than the client nic.
>>> Apparently this showed up with the 2.6 kernel.
> [...]
>> Could not find where I saw the original write up. But it has to
>> do with the NIC in the server box being 1G and the NIC in the test
>> laptop being only 100Mb.
>
>
>I suspect that such mismatches are not the problem, because the
>fact that *any* bits are flowing means that a whole bunch of
>low-level plumbing is working properly.
Not at all! In my NFS on Tru64 Unix days, I was horrified at the
abysmal buffering on some top-of-the-line Gigabit Ethernet switches.
A typical box could buffer 350 KB, i.e. 2.8 Mb, which is about 3 msec
of GbE wire time. Silly me figured a switch could buffer at least a
second's worth of data before discarding any, or 100 msec at worst.
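The buffer figures can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal units (1 KB = 1000 bytes, 1 Mb = 10^6 bits), which is what makes 350 KB come out to 2.8 Mb; the helper names are illustrative only.

```python
# Back-of-envelope check of the switch-buffer figures.
# Assumption: decimal units, 1 KB = 1000 bytes and 1 Mb = 10**6 bits.

GBE = 1e9  # Gigabit Ethernet line rate, bits per second


def buffer_bits(kilobytes: float) -> float:
    """Convert a buffer size in KB (10**3 bytes) to bits."""
    return kilobytes * 1000 * 8


def drain_time_ms(kilobytes: float, link_bps: float) -> float:
    """Milliseconds of wire time needed to send a full buffer at link_bps."""
    return buffer_bits(kilobytes) / link_bps * 1000


def bytes_for_duration(link_bps: float, seconds: float) -> float:
    """Bytes a buffer would need to hold `seconds` of traffic at link_bps."""
    return link_bps * seconds / 8


print(f"350 KB = {buffer_bits(350) / 1e6:.1f} Mb")
print(f"wire time at GbE: {drain_time_ms(350, GBE):.1f} ms")
print(f"1 s at GbE needs {bytes_for_duration(GBE, 1.0) / 1e6:.0f} MB")
print(f"100 ms at GbE needs {bytes_for_duration(GBE, 0.1) / 1e6:.1f} MB")
```

Even the "100 msec at worst" expectation works out to about 12.5 MB of buffering per GbE port, roughly 35 times the 350 KB the switch actually had.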
I would love to get my paws on the marketroid who suggested people should
sell GbE for the corporate backbone and servers with 100 Mb to the workstations.
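Why the GbE-to-100 Mb mix hurts can be shown with a simple fluid model: while a burst arrives at 1 Gb/s, the 100 Mb output port drains only a tenth of it, so the switch has to queue the other 90%. This sketch assumes (beyond what the post says) that the whole buffer is available to the one congested output port and that the burst arrives back-to-back at full line rate with no competing traffic.

```python
# Fluid-model sketch of a burst crossing from a fast link to a slow one.
# Assumptions: the entire buffer serves one output port, the burst
# arrives at full input line rate, and nothing else is queued.

GBE, FAST_E = 1e9, 100e6  # bits per second


def peak_queue_kb(burst_kb: float, in_bps: float, out_bps: float) -> float:
    """Peak queue depth while a burst arrives at in_bps and drains at out_bps."""
    if out_bps >= in_bps:
        return 0.0  # the output keeps up, so nothing accumulates
    # During the burst's arrival, only out_bps/in_bps of it can leave.
    return burst_kb * (1 - out_bps / in_bps)


def max_burst_kb(buffer_kb: float, in_bps: float, out_bps: float) -> float:
    """Largest burst that fits in buffer_kb without any drops."""
    if out_bps >= in_bps:
        return float("inf")
    return buffer_kb / (1 - out_bps / in_bps)


print(f"largest lossless burst into 100 Mb: {max_burst_kb(350, GBE, FAST_E):.0f} KB")
for burst in (64, 256, 512):
    q = peak_queue_kb(burst, GBE, FAST_E)
    print(f"{burst} KB burst -> peak queue {q:.1f} KB, drops: {q > 350}")
```

Under these assumptions a 350 KB buffer can absorb a burst of only about 389 KB before it starts discarding packets.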
In my case, we typically had eight 48 KB NFS data messages in flight (UDP),
or eight 64 KB messages when using NFS over TCP: 384-512 KB. Oops. (And I
used 1 MB TCP window sizes, in part to get around an interesting problem
with BSD's TCP code sharing a socket across multiple threads and in part
to cover the "bandwidth x delay" product needed for full GbE on a network
with a delay as long as 8 msec. I couldn't bring myself to make it
bigger.)
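The numbers in that paragraph can be reproduced directly. This is a sketch using the post's own figures (eight outstanding messages, 48 KB over UDP, 64 KB over TCP, 8 msec delay); treating 1 MB as 10^6 bytes is an assumption for the bandwidth x delay calculation.

```python
# Data in flight with eight outstanding NFS requests, compared with a
# 350 KB switch buffer, plus the bandwidth x delay product behind the
# 1 MB TCP window. Assumption: 1 MB = 10**6 bytes.

OUTSTANDING = 8        # NFS messages in flight
UDP_MSG_KB = 48        # NFS over UDP
TCP_MSG_KB = 64        # NFS over TCP
SWITCH_BUFFER_KB = 350


def in_flight_kb(messages: int, msg_kb: int) -> int:
    """Total KB outstanding when `messages` requests of msg_kb are in flight."""
    return messages * msg_kb


def bdp_bytes(link_bps: float, delay_s: float) -> float:
    """Bandwidth x delay product: bytes in flight needed to keep the link busy."""
    return link_bps * delay_s / 8


for label, size in (("UDP", UDP_MSG_KB), ("TCP", TCP_MSG_KB)):
    total = in_flight_kb(OUTSTANDING, size)
    print(f"NFS/{label}: {total} KB in flight, exceeds buffer: {total > SWITCH_BUFFER_KB}")

print(f"GbE at 8 ms needs a {bdp_bytes(1e9, 0.008) / 1e6:.1f} MB window")
```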
Given the cost of memory these days, I figured something other than RAM
must have been behind the atrocious buffering, and found a big clue in
a Cisco document that talked about the FIFOs connecting sections in a
GbE switch. Those are basically large, byte-at-a-time shift registers
and far more expensive than DRAM. On the smallish switches I had, I
couldn't find any statistics for the number of messages discarded due
to congestion. I suspect that may be a marketing solution to what would
have been an obvious customer support issue.
I never tried buying/borrowing consumer grade switches to abuse, but
I'm sure I'd be appalled.
Last I looked, TFTP implementations never put much data on the
wire, so that may not be the problem. However, I can imagine a
zillion other ways things could screw up. At any rate, without
good network traces (at both the 1 GbE and 100 Mb sides), all
we can do is speculate about where things might be failing.
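One reason TFTP keeps so little on the wire: classic TFTP (RFC 1350) is lock-step, so the sender transmits one 512-byte DATA block and waits for its ACK before sending the next. The sketch below models only that behavior; it ignores the blksize option and packet headers, and the RTT values are illustrative assumptions, not measurements.

```python
# Sketch of lock-step TFTP (RFC 1350): at most one DATA block is
# unacknowledged at any time, so in-flight data and throughput are both
# bounded by the block size. Headers and the blksize option are ignored;
# the RTT values are illustrative, not measured.

BLOCK_BYTES = 512  # RFC 1350 data block size


def max_in_flight_bytes(block_bytes: int = BLOCK_BYTES) -> int:
    """Lock-step means only one block is outstanding."""
    return block_bytes


def lockstep_throughput_bps(rtt_s: float, block_bytes: int = BLOCK_BYTES) -> float:
    """Upper bound on throughput: one block per round trip."""
    return block_bytes * 8 / rtt_s


for rtt_ms in (0.2, 1.0, 5.0):
    bps = lockstep_throughput_bps(rtt_ms / 1000)
    print(f"RTT {rtt_ms} ms: at most {bps / 1e6:.2f} Mb/s, "
          f"{max_in_flight_bytes()} bytes in flight")
```

With only 512 bytes outstanding, a lock-step transfer should never come close to filling even a 350 KB switch buffer, which fits the suspicion that buffering is not the TFTP culprit.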
Using a homogeneous speed across a network doesn't always work well
either. One NFS client reading from two servers can clog its
incoming link just fine.
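The fan-in case can be sketched the same way: two senders on 100 Mb links feeding one 100 Mb client link deliver twice what the output can drain, so the queue grows at one full link rate. This assumes both servers send continuously at line rate and reuses the 350 KB buffer figure from earlier in the post as a dedicated per-port buffer, which is an illustrative assumption.

```python
# Fan-in sketch: `senders` links of equal speed all feeding one output
# link of that same speed. Assumptions: continuous line-rate sending,
# and a dedicated per-port buffer (350 KB, 1 KB = 1000 bytes).


def fill_time_ms(buffer_kb: float, senders: int, link_bps: float) -> float:
    """Milliseconds until the output queue overflows."""
    excess_bps = (senders - 1) * link_bps  # arrival rate minus drain rate
    if excess_bps <= 0:
        return float("inf")  # a single sender never builds a queue
    return buffer_kb * 1000 * 8 / excess_bps * 1000


for n in (1, 2, 3):
    print(f"{n} sender(s) into 100 Mb: buffer full after "
          f"{fill_time_ms(350, n, 100e6):.1f} ms")
```

Under these assumptions, two servers overrun the client's port in under 30 msec even though every link runs at the same speed.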
-Ric Werme
More information about the gnhlug-discuss mailing list