shell, perl, performance, parallelism, profiling, etc. (was: Upgrade guidance)
Bill McGonigle
bill at bfccomputing.com
Tue Oct 21 16:24:36 EDT 2008
On Oct 21, 2008, at 13:56, Ben Scott wrote:
> On Tue, Oct 21, 2008 at 12:05 PM, Bill McGonigle
> <bill at bfccomputing.com> wrote:
>> I think we need keywords/tags for 'man -k' to use.
>
> Well, there's "man -K", but it's sloooow. I typically use Google as
> a substitute, but that failed me in this case. I suspect I'd still
> fail though, even with a better "man", because the problem was I
> wasn't using a matching keyword. GIGO.
Yeah, but tagging is nice in that you don't need a direct word match
in the page, and tags are easy to index. Would I be the first to
think of comm as performing set operations? I'm also a bit fuzzy on
how long man(1) should survive as a set of roff documents; it has no
idea what an Internet is.
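To make the set-operations reading of comm concrete, here is a minimal
sketch, assuming bash (for process substitution) and inputs with one
item per line. The function names are made up for illustration; they
are not part of coreutils.

```shell
# Hypothetical helpers: treat two line-oriented files as sets.
# comm(1) needs sorted input, so each file is passed through sort -u first.

# Lines present in both files (comm -12 hides the "only in file 1"
# and "only in file 2" columns, leaving the common column).
set_intersect() { comm -12 <(sort -u "$1") <(sort -u "$2"); }

# Lines present in the first file but not the second
# (comm -23 hides "only in file 2" and "common").
set_difference() { comm -23 <(sort -u "$1") <(sort -u "$2"); }

# Lines present in either file; sort -u alone is enough here.
set_union() { sort -u "$1" "$2"; }
```

A tag such as "set operations" on the comm page would let 'man -k'
find it without any of those words appearing in the page text.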
> Is it because the shell tools fork, and children aren't counted?
If that were so my shell numbers should be lower.
>> My box is FC8, 2.8 GHz Pentium D dual core, 1 GB RAM.
>> /var/lib/rpm is 106 MB.
$ du -sh /var/lib/rpm
160M    /var/lib/rpm
$ rpm -qa | wc -l
1715
$ uname -a
Linux dhd.bfc 2.6.26.3-29.fc9.i686 #1 SMP Wed Sep 3 03:42:27 EDT 2008 i686 i686 i386 GNU/Linux
$ free
             total       used       free     shared    buffers     cached
Mem:       1945152    1854348      90804          0     604852     855912
-/+ buffers/cache:     393584    1551568
Swap:      2096376       1840    2094536
model name : Intel(R) Pentium(R) 4 CPU 2.50GHz
stepping : 9
cpu MHz : 2500.226
cache size : 512 KB
> Hmmm, maybe the dual core also means multiple processes can run
> concurrently, while Perl is serializing everything?
Ah, or perhaps time doesn't hop cores? The numbers in your Perl run
add up better than in your shell run. Perl is poor at SMP (gah! Perl
threads!).
> I would actually suspect sort(1). There's no really good way to do
> a sort; just minimally poor ways (and lots of really poor ways). And
> comm(1) only works on sorted files; otherwise, we wouldn't need to
> sort. Perl, on the other hand, can use hashes (unsorted), which are
> typically much faster. If we had a tool like comm(1) that used
> hashes, I suspect I'd see a further win.
Good point. You should be able to hack my perl script to do that in
about 10 minutes. :)
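For the record, awk's associative arrays are hashes too, so a rough
sort-free stand-in for 'comm -1 -2' fits in a few lines of shell. This
is only a sketch: the name hash_intersect is made up, and unlike comm
the output comes out in the second input's order rather than sorted.

```shell
# Hypothetical helper: print the lines common to two inputs without sorting.
# Lines of the first input become keys of an awk associative array (a hash);
# each line of the second input is then a single hash lookup.
hash_intersect() {
    awk '
        # First input: remember every line.
        FILENAME == ARGV[1] { seen[$0] = 1; next }
        # Second input: print a common line once, then forget it.
        ($0 in seen)        { print; delete seen[$0] }
    ' "$1" "$2"
}
```

With bash it should drop into the pipeline quoted below by replacing
'comm -1 -2' with hash_intersect and removing both '| sort' stages. I
haven't timed it, so whether it actually wins is still a guess.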
> comm -1 -2 \
> <( package-cleanup --orphans | tail -n +2 | sort ) \
> <( package-cleanup --leaves --all | tail -n +2 | sort )
>
> Why do a pattern match when you can just skip the first line, right?
> Except that yields a slightly *slower* typical performance for me,
> but now the *user* counter is showing up!
>
> real 0m4.970s
> user 0m5.935s
> sys 0m0.735s
>
> What the heck??
Hrm, could there be something about the 'tail' pipe that causes CPU
affinity? I have no idea how SMP scheduling really works in Linux.
-Bill
-----
Bill McGonigle, Owner Work: 603.448.4440
BFC Computing, LLC Home: 603.448.1668
bill at bfccomputing.com Cell: 603.252.2606
http://www.bfccomputing.com/ Page: 603.442.1833
Blog: http://blog.bfccomputing.com/
VCard: http://bfccomputing.com/vcard/bill.vcf
More information about the gnhlug-discuss
mailing list