Internet history (was: We need a better Internet)

Thu Apr 8 15:28:24 EDT 2010

On 04/08/2010 03:13 PM, Kevin D. Clark wrote:
> Tom Buskey writes:
>
>   
>> On Thu, Apr 8, 2010 at 10:42 AM, Kevin D. Clark wrote:
>>> The problem that I had was that I frequently had to deal with the
>>> situation of "this particular problem only really efficiently runs on
>>> 1, 4, or 16 nodes in the cluster" or "this problem only really
>>> efficiently runs on 1, 2, 4, 8, or 16 nodes in the cluster"....now,
>>> what nodes were these again, and how do I relate all of the logfiles
>>> that I obtained from the last program run?
>>>
>>>
>>>       
>> You might have proven my point.  
>>     
>
> Just to be clear, I was trying to illustrate your point, because you
> and I appear to be in complete agreement on this issue.
Maybe I'm not understanding the issue, but isn't the above why queuing 
systems were made?  We're using a dirt-old version of Platform LSF, and 
it already solves the 'running on heterogeneous systems distributed 
across an arbitrary number of nodes' problem, returning the output via 
LSF or a shared filesystem.

The original problem ($DWARVES) had to do with what really looks like 
sysadmin-type stuff, which dsh can already do.  It has the notion of 
groups, so you can have Solaris-specific commands sent to the group of 
Solaris systems, Red Hat-specific commands to the Red Hat group, etc., 
or have a group that includes all hosts for commands that work across 
everything.  You can have dsh dispatch commands concurrently rather than 
serially, as the for loop does.  We can get ~200 nodes updated via 
systemimager in only a few minutes using this method.
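
For anyone who hasn't used dsh, here is a rough Python sketch of the idea
(named host groups plus concurrent dispatch).  This is not how dsh itself
is implemented; the group names, host names, and the run_ssh/dispatch
helpers are all made up for illustration, and it assumes passwordless ssh
to each host.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical host groups, similar in spirit to dsh's group files.
GROUPS = {
    "solaris": ["sol1", "sol2"],
    "redhat": ["rh1", "rh2", "rh3"],
}
GROUPS["all"] = GROUPS["solaris"] + GROUPS["redhat"]


def run_ssh(host, command):
    """Run command on host over ssh; return (exit status, combined output)."""
    proc = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, command],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return proc.returncode, proc.stdout


def dispatch(group, command, runner=run_ssh, max_workers=32):
    """Run command on every host in group concurrently.

    Returns a dict mapping host -> (exit status, output), in group order.
    The runner argument exists so the ssh call can be swapped out.
    """
    hosts = GROUPS[group]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as hosts.
        results = pool.map(lambda host: runner(host, command), hosts)
        return dict(zip(hosts, results))
```

Something like dispatch("redhat", "uname -r") would then hit every host in
the Red Hat group at once instead of walking them one at a time the way
the for loop does.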

-Mark