Packing/unpacking binary data in C - doubles, 64 bits

Dana Nowell DanaNowell at cornerstonesoftware.com
Thu Sep 10 17:06:24 EDT 2009


Bruce/Ben,
I've some experience with binary-oriented endianness issues on about 15
different platforms (Sun, SGI, Intel/AMD Windows PCs, Tandem/Compaq/HP
NonStop 'mainframes', HP-UX workstations, Linux, DEC Unix, and several
flavors of Unix that probably no longer exist).  Basically, signed and
unsigned INT byte swaps (hton/ntoh for 32 bit and custom for 64 bit) have
always worked as far back as I can remember (about 15 years).  Floats and
doubles are ALWAYS a pain, as different, non-endian-related format
standards exist.
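
For what it's worth, a minimal sketch of the kind of custom 64 bit swap
I mean (the function names are just illustrative; signed 64 bit values
in two's complement survive the same byte-level treatment):

    #include <stdint.h>

    /* Write a 64 bit value most-significant byte first, regardless of
       host endianness ("host to network" for 64 bits). */
    static void pack_u64_be(uint64_t v, unsigned char out[8])
    {
        for (int i = 0; i < 8; i++)
            out[i] = (unsigned char)(v >> (56 - 8 * i));
    }

    /* Rebuild the 64 bit value from the big endian byte stream. */
    static uint64_t unpack_u64_be(const unsigned char in[8])
    {
        uint64_t v = 0;
        for (int i = 0; i < 8; i++)
            v = (v << 8) | in[i];
        return v;
    }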

I assume, since this is up for debate, that this is a homegrown protocol
and there is no defined and documented 'network standard' format.  The
simplest and most obvious answer is to pick one (which you seem to be
trying to do).  Which one is a bit more complex, as this long-winded post
will hopefully show.

Typically integers use big endian as the network format, so go with
that.  When it comes to floats/doubles, I'd suggest you simply pick a
format as the 'standard' for this protocol and move on.  I've seen
people get hung up on this point, delaying schedules for weeks.  If you
are defining the protocol it is hard to be 'wrong' (but easy to be
inconvenient :).  A typical 'standard' floating point network format is
a sign bit followed by N exponent bits followed by M 'mantissa' bits;
see the IEEE 754-2008 interchange formats for a more in-depth discussion.
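
If you go that route and both ends already store doubles as IEEE 754
binary64 (not a given on every platform, per the above), the
'conversion' reduces to a byte-order fix-up.  A rough sketch, assuming
IEEE 754 doubles on both ends (names again just illustrative):

    #include <stdint.h>
    #include <string.h>

    /* Assumes the host 'double' is IEEE 754 binary64; emit it big endian. */
    static void pack_double_be(double d, unsigned char out[8])
    {
        uint64_t bits;
        memcpy(&bits, &d, sizeof bits);   /* reinterpret, no conversion */
        for (int i = 0; i < 8; i++)
            out[i] = (unsigned char)(bits >> (56 - 8 * i));
    }

    static double unpack_double_be(const unsigned char in[8])
    {
        uint64_t bits = 0;
        double d;
        for (int i = 0; i < 8; i++)
            bits = (bits << 8) | in[i];
        memcpy(&d, &bits, sizeof d);
        return d;
    }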

If you do not mind a custom float encoding as part of the network
format, several come to mind.  A choice I've used in the past is to
define the number of significant decimal digits in my 'network floating
point' format and then use 32 or 64 bit ints, basically a predefined
scaled-integer approach.  For example, if I define my network floating
point format to be 4 digits to the right of the decimal (xxxxx.xxxx),
I can multiply the float by 10000, round, and then transmit it as an
integer (the receiver converts the int to host format and then divides
by 10000 to get the float back).  This of course only works if your
'adjusted floats' always fall into the usable int64 range and always
have a reasonable 'fixed' number of decimal digits; not always possible,
but REALLY convenient when it works (some signal processing apps used to
use this).  This method theoretically supports the most significant
digits, as all 64 bits carry significant digits (typically moot, as some
get tossed due to the fixed scaling).
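
To make that concrete, here is a sketch of the scaled-integer idea with
a fixed scale of 10000 and round-to-nearest (names are illustrative, and
checking that the scaled value actually fits in an int64 is left to the
caller):

    #include <stdint.h>
    #include <math.h>

    #define NET_SCALE 10000.0   /* 4 digits to the right of the decimal */

    /* Sender: double -> scaled int64 (caller guarantees the scaled
       value fits in int64_t). */
    static int64_t double_to_net(double d)
    {
        return (int64_t)llround(d * NET_SCALE);
    }

    /* Receiver: scaled int64 (already swapped to host order) -> double. */
    static double net_to_double(int64_t n)
    {
        return (double)n / NET_SCALE;
    }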

Another technique is to store a signed 59 bit int (sign bit + 58
unsigned int bits) plus a 5 bit value defining the number of decimal
places, providing something like 17 significant digits (2^58 is roughly
2.9e17) split up any which way about the decimal point (I believe this
is a variant of the decimal formats in IEEE 754-2008).  This is
basically similar to a 'normal C float' but has a defined layout
independent of the CPU/OS and avoids mantissa conversion, which can get
ugly.  Depending on how much you can squeeze the size of the exponent
down (5 bits? 4 bits?), this theoretically has the second-best support
for large numbers of significant digits (in practice it may exceed the
first example, depending on data ranges).
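
A rough sketch of how that might pack into a single 64 bit word (the
exact field split, sign in the top bit, 58 bit magnitude, 5 bit
decimal-place count in the low bits, is my assumption of the layout):

    #include <stdint.h>

    /* Layout: [ sign:1 | magnitude:58 | decimals:5 ] in one uint64_t,
       then byte-swap to big endian as usual before sending. */
    static uint64_t pack_dec64(int negative, uint64_t magnitude,
                               unsigned decimals)
    {
        uint64_t w = 0;
        w |= (uint64_t)(negative ? 1 : 0) << 63;
        w |= (magnitude & ((UINT64_C(1) << 58) - 1)) << 5;
        w |= (uint64_t)(decimals & 0x1F);
        return w;
    }

    static void unpack_dec64(uint64_t w, int *negative,
                             uint64_t *magnitude, unsigned *decimals)
    {
        *negative  = (int)(w >> 63);
        *magnitude = (w >> 5) & ((UINT64_C(1) << 58) - 1);
        *decimals  = (unsigned)(w & 0x1F);
    }

So 123.45 would go out as sign 0, magnitude 12345, decimals 2, and the
receiver reconstructs it as magnitude / 10^decimals (negated if the
sign bit is set).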

I'm sure you can come up with several other custom techniques.  The
upside of custom encodings is that they can be very space efficient.
The downside is that they may be very specific to the protocol.  This is
usually OK if the protocol is 'internal only' (or hidden behind an API)
but can look somewhat hokey if the protocol is to be published (the
fixed-scale 'scaled int32/64' approach is not unheard of in some sensor
manufacturer docs from years back, and the sign + 58 bit int + 5 bit
base-10 exponent is similar to IEEE 754-2008).

It appears you dislike suggestions of 'use ASCII' as space inefficient,
yet you seem to want a 'standard'.  Well, my experience tells me that
the new 'standard' is ASCII/XML and the old 'standard' is something
along the lines of IEEE 754 (a defined sign+exponent+mantissa format;
you could even use the Cell processor float as the 'network float' since
it is 'your' standard being defined).  The most space efficient 'network
format' is likely to be custom, based on knowledge of the specific
application data you are processing (e.g., signal amplitude on a scale
of -100.00 to +100.00 easily fits in an INT16, and data values locked
between -.99 and +.99 easily fit in a byte).
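
The amplitude case, for instance, is a two-line encode/decode (the
scale of 100 is my assumption from the two decimal places):

    #include <stdint.h>
    #include <math.h>

    /* Amplitude in [-100.00, +100.00] with two decimal places fits in
       an int16_t as a value in [-10000, +10000]. */
    static int16_t amp_to_net(double amp) { return (int16_t)lround(amp * 100.0); }
    static double  net_to_amp(int16_t n)  { return n / 100.0; }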

So my recommendation is big endian ints, and either IEEE 754-2008
interchange format floats or custom floats if your app REALLY requires
it.  If you are ALWAYS going to use Cell processors, you could even
define the Cell float format as the network format to avoid ntoh and
hton costs on the Cell.

Good luck.


Ben Scott wrote:
>   We keep seeing the recommendation to use highly-portable encodings
> when possible, e.g., ASCII, or some kind of self-descriptive encoding.
>  Which I fully agree is a very good idea.
> 
>   But assume for the sake of discussion we want to keep overhead as
> low as possible for performance reasons, and "wait until computers get
> faster" isn't a practical solution.  What techniques, best practices,
> de facto standards, popular libraries, etc., exist for this sort of
> thing?
> 
>   Obviously, putting unsigned integers into "network byte order" for
> transmission is one such best practice.
> 
>   What about signed integers?  Can one expect hton*() and ntoh*() to
> work for signed integers as well?  IIRC, most machines store signed
> ints in two's-complement format, which I think would survive and work
> properly if swapped to compensate for an endianess change, but I'm not
> sure.
> 
>   What about floating point?
> 
> -- Ben
> _______________________________________________
> gnhlug-discuss mailing list
> gnhlug-discuss at mail.gnhlug.org
> http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss/
> 

