Internet history (was: We need a better Internet)
Mark Komarinski
mkomarinski at wayga.org
Thu Apr 8 15:28:24 EDT 2010
On 04/08/2010 03:13 PM, Kevin D. Clark wrote:
> Tom Buskey writes:
>
>
>> On Thu, Apr 8, 2010 at 10:42 AM, Kevin D. Clark wrote:
>>
>
>
>>> The problem that I had was that I frequently had to deal with the
>>> situation of "this particular problem only really efficiently runs on
>>> 1, 4, or 16 nodes in the cluster" or "this problem only really
>>> efficiently runs on 1, 2, 4, 8, or 16 nodes in the cluster"....now,
>>> what nodes were these again, and how do I relate all of the logfiles
>>> that I obtained from the last program run?
>>>
>>>
>>>
>> You might have proven my point.
>>
>
> Just to be clear, I was trying to illustrate your point, because you
> and I appear to be in complete agreement on this issue.
Maybe I'm not understanding the issue, but isn't the above why queuing
systems were made? We're using a dirt-old version of Platform LSF and
it already solves the 'running on heterogeneous systems distributed
across an arbitrary number of nodes' problem, while returning the
output via LSF or a shared filesystem.
The original problem ($DWARVES) had to do with doing what really looks
like sysadmin-type stuff, which dsh can already do. It has the notion
of groups, so you can have Solaris-specific commands sent to the group of
Solaris systems, Red Hat-specific ones to Red Hat, etc., or have a group
that includes all hosts for commands that work across everything. You can
have dsh dispatch commands concurrently rather than serially, as the
for loop does. We can get ~200 nodes updated via systemimager in only a
few minutes using this method.
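
For anyone who hasn't used dsh, here is a rough Python sketch of the same
idea: named host groups plus concurrent dispatch instead of a serial for
loop. This is not how dsh itself is implemented. The group names, host
names, and the run_ssh/dispatch helpers are all made up for illustration,
and it assumes passwordless ssh to each host.

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess

# Hypothetical host groups; these names are placeholders for illustration.
GROUPS = {
    "solaris": ["sol1", "sol2"],
    "redhat": ["rh1", "rh2", "rh3"],
}
# A catch-all group for commands that work on every host.
GROUPS["all"] = GROUPS["solaris"] + GROUPS["redhat"]


def run_ssh(host, command):
    """Run command on host over ssh; return (exit status, stdout)."""
    proc = subprocess.run(
        ["ssh", host, command], capture_output=True, text=True
    )
    return proc.returncode, proc.stdout


def dispatch(group, command, runner=run_ssh, max_workers=32):
    """Run command on every host in group concurrently.

    Returns a dict mapping each host to whatever runner returned for it.
    """
    hosts = GROUPS[group]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as hosts.
        results = pool.map(lambda h: runner(h, command), hosts)
        return dict(zip(hosts, results))
```

Something like dispatch("redhat", "uname -r") would then hit the Red Hat
hosts at the same time, and dispatch("all", "uptime") would hit everything.
The runner argument is only there so the dispatch logic can be tried
without real hosts.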
-Mark