John Pollard
2006-07-25 17:03:32 UTC
List,
We have had similar problems in the past, but this seems to be
happening more often now:
- Web site performance becomes erratic (some instances not responding).
- In JavaMonitor, one or two instances show 0 in the Transactions
column, even though you know that is not the case (there will have
been hundreds or thousands).
Our WO app instances all listen within the port range 2000 - 2050.
Running:
/usr/sbin/lsof -i tcp:2001-2050 -P
gives some OK LISTEN lines plus some problem lines like this:
java 553 root 20u IPv6 0x04bbcd68 0t0 TCP plug:2025->plug:64477 (CLOSE_WAIT)
java 553 root 21u IPv6 0x0646b2c8 0t0 TCP plug:2025->plug:64483 (CLOSE_WAIT)
When I send a kill -9 to the problem process, 553, these are of course cleared.
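For what it's worth, this is roughly how I work out which instance to kill (just a sketch; the port range is ours, -n only skips hostname lookups, and the PID on the kill line is whatever the count reports, 553 in the example above):

# Count CLOSE_WAIT sockets per owning PID in our instance port range;
# the PID with a pile of them is usually the wedged instance.
/usr/sbin/lsof -i tcp:2001-2050 -P -n | grep CLOSE_WAIT | awk '{print $2}' | sort | uniq -c
# Then clear it the blunt way, substituting the PID reported above.
kill -9 553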
Casting my lsof net a bit wider:
/usr/sbin/lsof -i tcp -P
gives about 10 lines like this:
java 281 root 9u IPv6 0x04bba470 0t0 TCP [::127.0.0.1]:49228->[::127.0.0.1]:3306 (CLOSE_WAIT)
java 569 root 9u IPv6 0x0616dd70 0t0 TCP [::127.0.0.1]:49247->[::127.0.0.1]:3306 (CLOSE_WAIT)
These all seem to be wotaskd processes. Are there meant to be 10
wotaskd processes running?
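(In case it matters, I am counting them roughly like this; a sketch that assumes "wotaskd" appears in the java command line, which it does here, and the bracketed pattern just stops grep from matching itself.)

# List the wotaskd java processes, then count them; I would only
# expect one per machine, not ten.
ps axww | grep '[w]otaskd'
ps axww | grep -c '[w]otaskd'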
I tried sending a kill -QUIT signal to one of my hung WO instances to
force a stack trace, but nothing came out in the log file. I have now
read the posting about editing SpawnOfWotaskd.sh to capture that
output, so I will do this next time.
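The plan is roughly the sketch below; the port, PID and log path are only examples, and the dump goes to the instance's stdout, which is why nothing appeared before SpawnOfWotaskd.sh was edited to redirect it.

# Find the PID of the hung instance from the port it listens on.
/usr/sbin/lsof -i tcp:2025 -P | grep LISTEN
# Ask the JVM for a full thread dump without killing the process.
kill -QUIT 553
# The dump should land wherever SpawnOfWotaskd.sh now sends stdout.
grep -A 20 "Full thread dump" /path/to/instance/stdout.log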
As an aside, we run lsof every night in a script to detect these
problems and reboot, but lsof sometimes returns absolutely nothing,
or "unable to read process table", or something close to that. I have
read this is a bug in 10.4.6; I am not sure whether 10.4.7 helps.
Perhaps it is also related to having more than 2 GB of RAM.
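For the record, the nightly check is along these lines (a simplified sketch: the threshold, the log path and the decision to reboot the whole box are ours, and the netstat fallback is only there for the nights when lsof returns nothing):

#!/bin/sh
# Nightly cron job: count CLOSE_WAIT sockets, falling back to netstat
# when lsof cannot read the process table.
OUT=`/usr/sbin/lsof -i tcp:2001-2050 -P -n 2>/dev/null | grep CLOSE_WAIT`
if [ -z "$OUT" ]; then
    OUT=`netstat -an -p tcp | grep CLOSE_WAIT`
fi
COUNT=`echo "$OUT" | grep -c CLOSE_WAIT`
if [ "$COUNT" -gt 20 ]; then
    echo "`date`: rebooting, $COUNT sockets stuck in CLOSE_WAIT" >> /var/log/wo-watchdog.log
    /sbin/shutdown -r now
fi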
These CLOSE_WAIT problems seemed to appear with a particular release
of WO (I forget which) and never went away, though I am still on the
WO version one back from the latest (the one where WO is not bundled
with Xcode).
Is the root of this problem likely to be a deadlock in our instances?
I will report back if I can get some stack trace info next time.
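(If it is a deadlock, I gather the JVM says so right in the QUIT dump, so the check will just be something like the line below, with the log path again being wherever SpawnOfWotaskd.sh redirects stdout.)

# HotSpot appends a deadlock report to the thread dump when it finds one.
grep -A 30 "Found one Java-level deadlock" /path/to/instance/stdout.log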
Thanks
John