Packing/unpacking binary data in C - doubles, 64 bits

Lloyd Kvam python at venix.com
Fri Sep 11 10:33:10 EDT 2009


On Thu, 2009-09-10 at 17:36 -0400, Ben Scott wrote:
>   Just for the sake of example: Bruce said 160 MB of data.  Let's
> assume it's all 4-byte integers.  That's roughly 42 million integers.
> Calling sprintf() and sscanf() 42 million times is going to slow
> things down.  Likewise, if we assume a newline separated format and
> all significant digits used, an ASCII representation is going to use
> 11 bytes per integer, turning 160 MB into 440 MB.
> 

The ASCII is nearly triple the binary in size (440 MB against 160 MB,
or 11 bytes per integer against 4).  That could be bearable in most
situations.  It should also compress fairly well.
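
For anyone who wants to check the size arithmetic on their own data, here
is a minimal sketch.  It is not from Ben's or Bruce's code: the function
name size_comparison, the little-endian 32-bit layout, the use of zlib and
the random sample are all my assumptions, and it is written for Python 3.

```python
import random
import struct
import zlib


def size_comparison(values):
    """Return (binary_bytes, ascii_bytes, compressed_ascii_bytes).

    Assumes every value fits in a signed 32-bit integer.
    """
    # Binary form: 4 bytes per value, little-endian signed 32-bit.
    binary = struct.pack("<%di" % len(values), *values)
    # ASCII form: one decimal number per line.
    text = "".join("%d\n" % v for v in values).encode("ascii")
    return len(binary), len(text), len(zlib.compress(text))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    # Assumption: values spread over the whole signed 32-bit range.
    sample = [rng.randint(-2**31, 2**31 - 1) for _ in range(100000)]
    b, a, c = size_comparison(sample)
    print("binary: %d bytes" % b)
    print("ascii:  %d bytes (%.2fx binary)" % (a, a / b))
    print("ascii compressed with zlib: %d bytes" % c)
```

Uniformly random integers are probably close to a worst case for the
compressor, so real measurement data may well do better than this sample
suggests.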

The conversions are trivial to code and run in your favorite scripting
language.  I used Python (see below).  Round trip time for 10 million
floats (doubles) was about 45 seconds: 32 seconds to convert to strings
and 13 seconds to convert back.  Integers would have been quicker.
Presumably C would be faster, at the cost of a bit more code and
complexity.  Scaling that up to Ben's 42 million values, Bruce would be
looking at about 3 minutes of processing time if his hardware matched
mine.

I'm not second guessing Bruce's decision here.  It's all about getting
the most out of your time using the available tools.

>>>> Python Code >>>>>>
In [11]: m10 = 10 * 1000 * 1000
# easier on the eyes than a long list of 0s

In [12]: m10
Out[12]: 10000000

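In [17]: f_list = [random.random()*20 for x in xrange(m10)]
# scale to the range 0 to 20 so most of the random numbers are greater than 1
# (random and now() were set up in session lines not shown here; presumably
# "import random" and "now = datetime.datetime.now")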

In [24]: now();s_list = map(repr, f_list);now()
Out[24]: datetime.datetime(2009, 9, 11, 10, 12, 22, 549050)
Out[24]: datetime.datetime(2009, 9, 11, 10, 12, 54, 261281)
# created 10,000,000 float strings in 32 seconds

In [25]: now();f2_list = map( float, s_list); now()
Out[25]: datetime.datetime(2009, 9, 11, 10, 13, 11, 215100)
Out[25]: datetime.datetime(2009, 9, 11, 10, 13, 24, 218123)
# converted 10,000,000 strings to float in 13 seconds

In [26]: f_list[:10]
Out[26]: 
[3.2547270222254054,
 4.1187838723903596,
 19.029531987086656,
 14.980165347124705,
 2.1337003969489698,
 8.2395337150073527,
 4.7579966946618608,
 0.88969361970157923,
 9.5651010251147905,
 16.707563948930382]

In [27]: f2_list[:10]
Out[27]: 
[3.2547270222254054,
 4.1187838723903596,
 19.029531987086656,
 14.980165347124705,
 2.1337003969489698,
 8.2395337150073527,
 4.7579966946618608,
 0.88969361970157923,
 9.5651010251147905,
 16.707563948930382]

In [28]: from itertools import izip

In [29]: any(f1-f2 for (f1,f2) in izip(f_list, f2_list))
Out[29]: False
# all differences were 0 so the round trip processing was correct
<<<<<<< end of python code <<<<<<<<
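
The session above is Python 2 (xrange, izip, map returning a list).  As a
rough sketch of the same experiment as a standalone Python 3 script, with
a struct-based binary round trip alongside for comparison: the function
names, the little-endian 8-byte double layout and the smaller sample size
are my assumptions, not part of the original session, and timings will
differ from the 2009 figures.

```python
import random
import struct
import time


def text_round_trip(values):
    """Convert floats to strings with repr() and back with float()."""
    strings = [repr(v) for v in values]
    return [float(s) for s in strings]


def binary_round_trip(values):
    """Pack floats as 8-byte IEEE 754 doubles and unpack them again."""
    fmt = "<%dd" % len(values)
    return list(struct.unpack(fmt, struct.pack(fmt, *values)))


def timed(func, values):
    """Run func(values) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(values)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    n = 200 * 1000  # far fewer than the 10 million above; raise to taste
    rng = random.Random(0)  # fixed seed so runs are repeatable
    data = [rng.random() * 20 for _ in range(n)]
    for func in (text_round_trip, binary_round_trip):
        result, seconds = timed(func, data)
        print("%s: %.2f s, exact match: %s"
              % (func.__name__, seconds, result == data))
```

Comparing the lists with == plays the same role as the any(f1-f2 ...)
check above: it is True only if every value survived the trip unchanged.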

-- 
Lloyd Kvam
Venix Corp
DLSLUG/GNHLUG library
http://dlslug.org/library.html
http://www.librarything.com/catalog/dlslug
http://www.librarything.com/rsshtml/recent/dlslug
http://www.librarything.com/rss/recent/dlslug


