FWIW: The bigger picture... Or why I have been asking a lot of questions lately...

Tue Oct 13 08:44:49 EDT 2009

Bruce Labitt <bruce.labitt at myfairpoint.net> wrote:
 >
 > What I'm trying to do:  Optimizer for a radar power spectral density
 > problem
 >
 > Problem:  FFTs required in optimization loop take too long on current
 > workstation for the optimizer to even be viable.
 >
 > Attempted solution:  FFT engine on remote server to reduce overall
 > execution time
 >
 > Builds client - server app implementing above solution.  Server uses
 > OpenMP and FFTW to exploit all cores.
[...]
 > Implements better binary packing unpacking in code.  Stuff works
 >
 > Nit in solution:  TCP transport time >> FFT execution time, rendering
 > attempted solution non-viable
 >
[...]
 > Hey, that is my bigger picture...  Any and all suggestions are
 > appreciated.  Undoubtedly, a few dumb questions will follow.  I appear
 > to be good at it.  :P  Maybe this context will help list subscribers
 > frame their answers if they have any, or ask insightful question.

I don't understand anything about your domain of application,
so take this for what it's worth...

I've gleaned the following from the previous posts. Is it a fair summary?

- The local FFT is taking ~200 ms, which isn't fast enough.
- The remote FFT is substantially faster than this once the data gets
  there.
- However, it takes substantially longer (~1.2 seconds) to move the data
  than to process it locally.
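
To make the arithmetic behind that summary concrete, here is a rough
sketch. The 200 ms and 1.2 s figures are the ones quoted above; the
remote FFT time was never stated, so the 0.02 s used here is purely a
placeholder, and the function names are made up for illustration.

```python
def offload_pays_off(local_fft_s, transfer_s, remote_fft_s):
    """Compare one local FFT against shipping the data out and back.

    transfer_s is the total time spent moving data for one FFT.
    Returns (worth_it, remote_total_s).
    """
    remote_total_s = transfer_s + remote_fft_s
    return remote_total_s < local_fft_s, remote_total_s


def transfer_budget(local_fft_s, remote_fft_s):
    """Largest transfer time for which offloading still breaks even."""
    return local_fft_s - remote_fft_s


# 0.2 s local and 1.2 s transfer come from the thread.
# 0.02 s for the remote FFT is an assumed placeholder, not a measurement.
worth_it, remote_total = offload_pays_off(0.2, 1.2, 0.02)
budget = transfer_budget(0.2, 0.02)
print(worth_it, remote_total, budget)
```

Under those assumed numbers the transfer would have to shrink from
about 1.2 s to under roughly 0.18 s before the remote server beats the
local FFT at all, whatever the server's raw speed.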

What does "fast enough" mean here? What is your "time budget" per data set?
Is it only constrained by "catching and cooking" one data set before it is
overwritten by a new one (or before you choke on the stream buffers :) )?
Are there latency/timeliness requirements from downstream?
If so, what are they?
Provided your processing rate keeps up with the arrival rate,
how far behind can you afford to deliver results?
(i.e. how much pipelining is permitted in a solution?)
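
The throughput-versus-latency distinction can be sketched as follows.
This assumes an idealized pipeline whose stages overlap perfectly
(for example, sending data set k+1 while the FFT runs on data set k).
The stage times and the arrival period below are hypothetical values
chosen only to show the shape of the calculation.

```python
def pipeline_figures(stage_times_s, arrival_period_s):
    """Summarize an idealized, fully overlapped pipeline.

    latency_s:    time for one data set to pass through every stage
    bottleneck_s: slowest stage, which sets the sustainable rate
    keeps_up:     True if a new data set can be accepted every
                  arrival_period_s without the backlog growing
    """
    latency_s = sum(stage_times_s)
    bottleneck_s = max(stage_times_s)
    keeps_up = bottleneck_s <= arrival_period_s
    return latency_s, bottleneck_s, keeps_up


# Hypothetical stages: send (1.2 s), remote FFT (0.02 s), receive (1.2 s),
# with an assumed new data set arriving every 1.5 s.
latency, bottleneck, keeps_up = pipeline_figures([1.2, 0.02, 1.2], 1.5)
print(latency, bottleneck, keeps_up)
```

In this made-up case the pipeline keeps up with arrivals even though
each individual result is delivered about 2.4 s late, which is why the
tolerable delay matters as much as the raw rate.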

How fast is the remote FFT? I didn't catch a number for this one.
Or was the 200 ms the remote processing time?
(In which case, what's the local processing time?)
Do you have the actual server you're targeting to benchmark this on?

This helps to frame the external requirements more clearly.

You've stated the problem in the implementation domain.
It sounds like your range of solutions could leave very little headroom.
My instinctive response is to ask
"Is there a more frugal approach in the application domain?"

Do you need to grind down the whole field of potential interest?
Are there ways to narrow and intensify your focus partway through?
Perhaps to do a much faster but weaker FFT,
analyze it quickly to identify a narrower problem of interest,
and then do the slower, much stronger FFT on a lot less data?
Reducing the data load for the hard part may help with on-chip or
off-chip solutions. It may also help to identify hybrid solutions.
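
As a toy illustration of the coarse-then-fine idea, and not a claim
about what would work for radar data, the sketch below runs a short
transform over the first few samples to find a rough peak, then
evaluates full-resolution DFT bins only in a narrow band around it.
The function names, the single-tone test signal, and the choice of
direct per-bin evaluation are all assumptions made for the example;
it also assumes the record length is a multiple of the coarse length.

```python
import cmath
import math


def dft_bins(x, n_fft, bins):
    """Directly evaluate selected bins of an n_fft-point DFT of x."""
    return {
        k: sum(v * cmath.exp(-2j * math.pi * k * n / n_fft)
               for n, v in enumerate(x))
        for k in bins
    }


def coarse_then_fine(x, coarse_len):
    """Return (coarse_peak_bin, fine_peak_bin) for a real signal x."""
    # Coarse pass: short transform, low resolution, little data.
    coarse = dft_bins(x[:coarse_len], coarse_len, range(coarse_len // 2))
    coarse_peak = max(coarse, key=lambda k: abs(coarse[k]))

    # Fine pass: full record, but only the bins near the coarse peak.
    n = len(x)
    scale = n // coarse_len            # fine bins per coarse bin
    centre = coarse_peak * scale
    lo = max(0, centre - scale)
    hi = min(n // 2, centre + scale)
    fine = dft_bins(x, n, range(lo, hi + 1))
    fine_peak = max(fine, key=lambda k: abs(fine[k]))
    return coarse_peak, fine_peak


# Hypothetical test signal: one tone sitting exactly on fine bin 37 of 256.
signal = [math.cos(2 * math.pi * 37 * n / 256) for n in range(256)]
print(coarse_then_fine(signal, 32))
```

Evaluating bins one at a time only pays off when the band of interest
is narrow compared with the full spectrum; a real design would also
have to decide whether a short, coarse pass can see the feature at all.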

Alternatively, a mid-stream focusing analysis might be so expensive
as to negate the benefit, or any performant mid-stream analysis might
be merely a too-risky heuristic, or the problem may simply not lend
itself to that kind of decomposition. You did say that you had already
encountered a number of dead-ends - this may be familiar ground :)

I don't know your domain. I don't have answers, just questions.
I just figured those kinds of questions were worth asking
before we try squeezing the last Mbps out of the network...

Lupestro