FWIW: The bigger picture... Or why I have been asking a lot of questions lately...
Bruce Labitt
bruce.labitt at myfairpoint.net
Sun Oct 11 16:46:02 EDT 2009
Jim Kuzdrall wrote:
> Greetings Bruce,
>
> Interesting and challenging project!
>
> On Saturday 10 October 2009 15:20, Bruce Labitt wrote:
>
>> For anyone that is remotely interested, here is the big picture for
>> the problem I'm trying to solve. If you are not interested, hey
>> delete the post. Won't irritate me in the least!
>>
>>
> If you just transferred the data (no framing or error checking), how
> many bits per second must you transfer to keep up with the FFT data
> production?
>
For the problems I'm doing now, the net cannot keep up. At 800 Mbps it
would take ~1.6 s to push the data, while the engine computes a ~10M
point complex double precision FFT in ~200 ms. 10Gb Ethernet would be
nice, but I don't have the budget for it. Even then, the transport
would take ~0.16 s vs. the 0.2 s compute.
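
For anyone who wants to check the arithmetic, here is a rough sketch of how those transfer times fall out. It assumes 16 bytes per complex double precision point and counts payload only (no framing or error checking). The 80% usable fraction of the 10Gb link is my assumption, picked because it reproduces the 0.16 s figure; the function name transfer_seconds is made up for this example.

```python
# Rough payload-only transfer time estimate; no framing or protocol overhead.
BYTES_PER_COMPLEX_DOUBLE = 16  # two 8-byte doubles: real and imaginary parts


def transfer_seconds(points, link_bits_per_sec):
    """Seconds needed to move `points` complex doubles over the link."""
    payload_bits = points * BYTES_PER_COMPLEX_DOUBLE * 8
    return payload_bits / link_bits_per_sec


points = 10_000_000   # ~10M point FFT, from the post
compute_s = 0.2       # ~200 ms compute time, from the post

links = [
    ("800 Mbps", 800e6),
    ("10Gb Ethernet at an assumed 80% usable rate", 8e9),
]

for label, rate in links:
    t = transfer_seconds(points, rate)
    print(f"{label}: {t:.2f} s transfer vs {compute_s:.2f} s compute")
```

Each transform is about 160 MB, so even the faster link only roughly breaks even with the compute time.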
> Did you explore adding a dedicated FFT card to your control
> computer? The algorithms they build into the hardware are much, much
> faster than compiled software. The local board would keep the data in
> your control computer - with DMA, I assume - eliminating the transfer
> problem.
>
>
I will look into it again. Maybe the landscape has changed. At one
point I had to do 128M point FFTs - there wasn't any hardware to do that!
> I know a fellow who now works for Apple whose job is to optimize FFT
> algorithms to the processor they use. Assembly language, of course.
> Why is Apple interested? Faster FFT, faster MP3 translation, longer
> battery life. A very high payoff.
>
> Jim Kuzdrall
>
>
>
I am using open source FFTW. It is quite fast and it uses the
platform's assets quite effectively. Fortunately, it has been optimized
for the Cell processor. It runs 50-100x faster on my Cell than on my
3.4 GHz P4, or whatever boat anchor I have. I also tested the software
on a couple of our servers. The ratio is still way up near 50x. The
problem is that the cache gets exhausted and then the memory bus
bandwidth gets saturated; this forms the upper limit of performance for
the P4 / AMD64 class machines.
The problem is indeed quite challenging. I've gone down quite a few
dead ends. The list has seen some of my dead-end attempts, but not all
of them :) I spared you some...
Bruce