Really old /proc weirdness?

Kenny Lussier klussier at gmail.com
Thu Mar 11 09:25:46 EST 2010


Hi all,

I have the unfortunate need to build an exact copy of a server that was
set up 6 years ago. The server is RHEL3 i386. I have managed to get the
two boxes to an identical state at the OS and package level, and
everything seems to work. However, one thing has me puzzled. On the
original box, when a child process is forked, it is hidden from `ps`.
For example, if I run `ps auxww | grep splunk`, I get:

root      2933  0.2  0.3 70656 29692 ?       S    Mar10   2:25 splunkd -p 9998 start
root      2934  0.0  0.0 17756 6216 ?        S    Mar10   0:01 splunkd -p 9998 start
root      2161  0.0  0.0  3696  672 pts/0    S    09:05   0:00 grep splunk

But if I look in `top`, I see:

 2933 root      15   0 29692  28M  8444 S     0.0  0.3   0:01   1 splunkd
 2934 root      15   0  6216 6216  5492 S     0.0  0.0   0:01   2 splunkd
 2935 root      15   0 29692  28M  8444 S     0.0  0.3   0:00   2 splunkd
 2936 root      23   0 29692  28M  8444 S     0.0  0.3   0:00   2 splunkd
 2937 root      15   0 29692  28M  8444 S     0.0  0.3   0:04   0 splunkd
 2938 root      15   0 29692  28M  8444 S     0.0  0.3   0:27   2 splunkd
 2939 root      25   0 29692  28M  8444 S     0.0  0.3   0:00   0 splunkd
 2940 root      25   0 29692  28M  8444 S     0.0  0.3   0:00   0 splunkd
 2941 root      15   0 29692  28M  8444 S     0.0  0.3   0:01   1 splunkd
 2942 root      15   0 29692  28M  8444 S     0.0  0.3   0:01   3 splunkd
 2944 root      15   0 29692  28M  8444 S     0.0  0.3   0:02   2 splunkd
 2951 root      15   0 29692  28M  8444 S     0.0  0.3   0:00   0 splunkd
 2952 root      15   0 29692  28M  8444 S     0.0  0.3   0:00   1 splunkd
 2953 root      25   0 29692  28M  8444 S     0.0  0.3   0:00   0 splunkd
 2956 root      15   0 29692  28M  8444 S     0.0  0.3   1:38   2 splunkd
 2957 root      15   0 29692  28M  8444 S     0.0  0.3   0:02   0 splunkd
 2958 root      15   0 29692  28M  8444 S     0.0  0.3   0:00   0 splunkd
 2959 root      15   0 29692  28M  8444 S     0.0  0.3   0:00   1 splunkd
 2961 root      15   0 29692  28M  8444 S     0.0  0.3   0:00   0 splunkd
 2962 root      15   0 29692  28M  8444 S     0.0  0.3   0:01   0 splunkd

In /proc, entries exist for all of these PIDs, but every one except 2933
and 2934 is a dot-prefixed (hidden) directory:


.2935/
.2936/
.2937/
.2938/
.2939/
.2940/
.2941/
.2942/
.2941/
etc....
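
For anyone who wants to run the same check on both boxes, here is a minimal sketch that separates the ordinary numeric entries in /proc from the dot-prefixed ones. The function name `split_proc_entries` and its `proc_root` parameter are made up for illustration, and it assumes the hidden entries appear in a plain directory listing, as they do in the `ls` output above. It is written for a current Python interpreter, not necessarily the one shipped with RHEL3.

```python
import os


def split_proc_entries(proc_root="/proc"):
    """Return (visible_pids, hidden_pids) found as entry names under proc_root.

    visible_pids: names that are purely numeric, e.g. "2933"
    hidden_pids:  names that are a dot followed by digits, e.g. ".2935"
    """
    visible, hidden = [], []
    for name in os.listdir(proc_root):
        if name.isdigit():
            visible.append(int(name))
        elif name.startswith(".") and name[1:].isdigit():
            hidden.append(int(name[1:]))
    return sorted(visible), sorted(hidden)


if __name__ == "__main__":
    visible, hidden = split_proc_entries()
    print("visible PID entries: %d" % len(visible))
    print("hidden dot-prefixed PID entries: %s" % hidden)
```

Going by the listings in this post, the old box should report a non-empty hidden list while splunkd is running, and the new box should report an empty one.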

I have read up on this, and I understand thread group leaders and
non-leader group members. The weirdness shows up on the new system,
which runs the exact same kernel and is package-for-package identical
to the first. There, /proc contains no dot-prefixed PID directories,
and ps shows every child:

[root@ root]# ps auxww | grep splunk
root      4271  0.0  0.3 62352 30912 ?       S    08:00   0:00 splunkd -p 9998 start
root      4272  0.0  0.0 17904 6196 ?        S    08:00   0:00 splunkd -p 9998 start
root      4273  0.0  0.3 62352 30912 ?       S    08:00   0:00 splunkd -p 9998 start
root      4274  0.0  0.3 62352 30912 ?       S    08:00   0:00 splunkd -p 9998 start
root      4275  0.0  0.3 62352 30912 ?       S    08:00   0:00 splunkd -p 9998 start
root      4276  0.0  0.3 62352 30912 ?       S    08:00   0:00 splunkd -p 9998 start
root      4277  0.0  0.3 62352 30912 ?       S    08:00   0:01 splunkd -p 9998 start
root      4278  0.0  0.3 62352 30912 ?       S    08:00   0:00 splunkd -p 9998 start
root      4279  0.0  0.3 62352 30912 ?       S    08:00   0:00 splunkd -p 9998 start
root      4280  0.0  0.3 62352 30912 ?       S    08:00   0:00 splunkd -p 9998 start
root      4281  0.0  0.3 62352 30912 ?       S    08:00   0:00 splunkd -p 9998 start
root      4283  0.0  0.3 62352 30912 ?       S    08:00   0:00 splunkd -p 9998 start
root      4284  0.0  0.3 62352 30912 ?       S    08:00   0:00 splunkd -p 9998 start
root      4285  0.0  0.3 62352 30912 ?       S    08:00   0:00 splunkd -p 9998 start
root      4286  0.0  0.3 62352 30912 ?       S    08:00   0:00 splunkd -p 9998 start
root      4289  0.1  0.3 62352 30912 ?       S    08:00   0:08 splunkd -p 9998 start
root      4296  0.0  0.3 62352 30912 ?       S    08:00   0:00 splunkd -p 9998 start
root      4297  0.0  0.3 62352 30912 ?       S    08:00   0:00 splunkd -p 9998 start
root      4298  0.0  0.3 62352 30912 ?       S    08:00   0:00 splunkd -p 9998 start
root      4300  0.0  0.3 62352 30912 ?       S    08:00   0:00 splunkd -p 9998 start
root      4301  0.0  0.3 62352 30912 ?       S    08:00   0:00 splunkd -p 9998 start

Does anyone with a better understanding of the 2.4 kernel (Linux
2.4.21-47.ELsmp #1 SMP Wed Jul 5 20:38:41 EDT 2006 i686 i686 i386
GNU/Linux) know why two seemingly identical systems would behave
differently?

TIA,
Kenny

