We have noticed the message HMGR0152W: CPU Starvation detected. Current thread scheduling delay is 84 seconds.
This happens every night between 11 PM and 2 AM. The monitoring application reports heavy memory usage on the servers, and I suspect this could be the cause of the problem. The message is logged by the WebSphere Application Server cluster members, which run version 6.0.2.31 with JDK 1.4.2 SR11.
Below are the log details from SystemOut:
[1/30/09 23:27:46:888 AST] 0000167c CoordinatorCo W HMGR0152W: CPU Starvation detected. Current thread scheduling delay is 6 seconds.
[1/31/09 0:52:22:349 AST] 000000e9 WorkQueueMana W TCPC0005W: TCP Channel SIB_TCP_JFAP could not obtain thread from thread pool <null>.
[1/31/09 0:52:39:788 AST] 000036b6 WorkQueueMana W TCPC0005W: TCP Channel SIB_TCP_JFAP could not obtain thread from thread pool <null>.
[1/31/09 0:58:04:781 AST] 0000512e SystemOut O DefaultValidationEventHandler: [ERROR]: Unexpected element {}:DOCUMENT_NUMBER
Location:
[1/31/09 0:59:55:649 AST] 00005225 SibMessage E [:] CWSIC2023E: A communication error occurred when sending to or receiving from a remote client: exception com.ibm.wsspi.sib.core.exception.SIConnectionDroppedException: CWSIJ0044E: An operation was attempted on a connection that is already closed..
[1/31/09 0:59:55:656 AST] 00005234 SibMessage E [:] CWSIC2023E: A communication error occurred when sending to or receiving from a remote client: exception com.ibm.wsspi.sib.core.exception.SIConnectionDroppedException: CWSIJ0044E: An operation was attempted on a connection that is already closed..
[1/31/09 0:59:55:830 AST] 00005233 SibMessage E [:] CWSIC2023E: A communication error occurred when sending to or receiving from a remote client: exception com.ibm.wsspi.sib.core.exception.SIConnectionDroppedException: CWSIJ0044E: An operation was attempted on a connection that is already closed..
[1/31/09 0:59:55:829 AST] 00005227 SibMessage E [:] CWSIC2021E: A communication error occurred when sending to or receiving from a remote client: exception com.ibm.wsspi.sib.core.exception.SIConnectionDroppedException: CWSIJ0044E: An operation was attempted on a connection that is already closed..
[1/31/09 1:02:40:017 AST] 000036b6 WorkQueueMana W TCPC0005W: TCP Channel SIB_TCP_JFAP could not obtain thread from thread pool <null>.
[1/31/09 1:02:40:164 AST] 000000e9 WorkQueueMana W TCPC0005W: TCP Channel SIB_TCP_JFAP could not obtain thread from thread pool <null>.
[1/31/09 1:02:42:818 AST] 0000368a ThreadMonitor W WSVR0605W: Thread "HAManager.thread.pool : 1" (0000002b) has been active for 692865 milliseconds and may be hung. There is/are 2 thread(s) in total in the server that may be hung.
[1/31/09 1:02:44:614 AST] 0000368a ThreadMonitor W WSVR0605W: Thread "WebContainer : 6957" (00005115) has been active for 723163 milliseconds and may be hung. There is/are 3 thread(s) in total in the server that may be hung.
[1/31/09 1:02:44:933 AST] 0000368a ThreadMonitor W WSVR0605W: Thread "WebContainer : 6978" (0000519e) has been active for 736477 milliseconds and may be hung. There is/are 4 thread(s) in total in the server that may be hung.
[1/31/09 1:04:26:546 AST] 0000167c CoordinatorCo W HMGR0152W: CPU Starvation detected. Current thread scheduling delay is 84 seconds.
[1/31/09 1:04:26:834 AST] 00005265 ApplicationMo W DCSV0004W: DCS Stack DefaultCoreGroup.Cluster_ _Prod_1 at Member Cell_ _Prod_1\Node_ _PROD_1\AS_ _PROD_1: Did not receive adequate CPU time slice. Last known CPU usage time at 01:01:23:231 AST. Inactivity duration was 183 seconds.
[1/31/09 1:04:30:386 AST] 0000001f MbuRmmAdapter W DCSV1111W: DCS Stack DefaultCoreGroup at Member Cell_ _Prod_1\Node_ _PROD_1\AS_ _PROD_1: Suspected another member because the outgoing connection from the other member was closed. Suspected members is Cell_ _Prod_1\Node_PG_PROD_1\nodeagent. DCS logical channel is View|Gossip.
[1/31/09 1:04:33:818 AST] 0000001f MbuRmmAdapter W DCSV1111W: DCS Stack DefaultCoreGroup at Member Cell_ _Prod_1\Node_ _PROD_1\AS_ _PROD_1: Suspected another member because the outgoing connection from the other member was closed. Suspected members is Cell_ _Prod_1\Node_ _PROD_2\nodeagent. DCS logical channel is View|Gossip.
[1/31/09 1:04:35:285 AST] 00005268 ApplicationMo W DCSV0004W: DCS Stack DefaultCoreGroup at Member Cell_ _Prod_1\Node_ _PROD_1\AS_ _PROD_1: Did not receive adequate CPU time slice. Last known CPU usage time at 01:01:35:047 AST. Inactivity duration was 180 seconds.
[1/31/09 1:04:36:318 AST] 0000001f RmmPtpGroup W DCSV1111W: DCS Stack DefaultCoreGroup at Member Cell_ _Prod_1\Node_ _PROD_1\AS_ _PROD_1: Suspected another member because the outgoing connection from the other member was closed. Suspected members is Cell_ _Prod_1\Node_PG_PROD_1\AS_PG_PROD_1. DCS logical channel is View|Ptp.
[1/31/09 1:04:37:188 AST] 0000001f RmmPtpGroup W DCSV1111W: DCS Stack DefaultCoreGroup at Member Cell_ _Prod_1\Node_ _PROD_1\AS_ _PROD_1: Suspected another member because the outgoing connection from the other member was closed. Suspected members is Cell_ _Prod_1\Node_ DM_Prod_1\dmgr. DCS logical channel is View|Ptp.
[1/31/09 1:04:37:383 AST] 0000001f RmmPtpGroup W DCSV1111W: DCS Stack DefaultCoreGroup at Member Cell_ _Prod_1\Node_ _PROD_1\AS_ _PROD_1: Suspected another member because the outgoing connection from the other member was closed. Suspected members is Cell_ _Prod_1\Node_ST_Prod_1\AS_ST_Prod_1. DCS logical channel is View|Ptp.
[1/31/09 1:04:37:779 AST] 0000001f RmmPtpGroup W DCSV1111W: DCS Stack DefaultCoreGroup at Member Cell_ _Prod_1\Node_ _PROD_1\AS_ _PROD_1: Suspected another member because the outgoing connection from the other member was closed. Suspected members is Cell_ _Prod_1\Node_ST_Prod_1\nodeagent. DCS logical channel is View|Ptp.
[1/31/09 1:04:38:171 AST] 0000001f RmmPtpGroup W DCSV1111W: DCS Stack DefaultCoreGroup at Member Cell_ _Prod_1\Node_ _PROD_1\AS_ _PROD_1: Suspected another member because the outgoing connection from the other member was closed. Suspected members is Cell_ _Prod_1\Node_PG_PROD_2\nodeagent. DCS logical channel is View|Ptp.
[1/31/09 1:04:38:342 AST] 0000001f RmmPtpGroup W DCSV1111W: DCS Stack DefaultCoreGroup at Member Cell_ _Prod_1\Node_ _PROD_1\AS_ _PROD_1: Suspected another member because the outgoing connection from the other member was closed. Suspected members is Cell_ _Prod_1\Node_ _PROD_1\nodeagent. DCS logical channel is View|Ptp.
[1/31/09 1:04:38:720 AST] 0000001f RmmPtpGroup W DCSV1111W: DCS Stack DefaultCoreGroup at Member Cell_ _Prod_1\Node_ _PROD_1\AS_ _PROD_1: Suspected another member because the outgoing connection from the other member was closed. Suspected members is Cell_ _Prod_1\Node_ST_Prod_2\AS_ST_Prod_2. DCS logical channel is View|Ptp.
[1/31/09 1:04:39:617 AST] 0000001f RmmPtpGroup W DCSV1111W: DCS Stack DefaultCoreGroup at Member Cell_ _Prod_1\Node_ _PROD_1\AS_ _PROD_1: Suspected another member because the outgoing connection from the other member was closed. Suspected members is Cell_ _Prod_1\Node_PG_PROD_2\AS_PG_PROD_2. DCS logical channel is View|Ptp.
[1/31/09 1:04:39:757 AST] 0000001f RmmPtpGroup W DCSV1111W: DCS Stack DefaultCoreGroup at Member Cell_ _Prod_1\Node_ _PROD_1\AS_ _PROD_1: Suspected another member because the outgoing connection from the other member was closed. Suspected members is Cell_ _Prod_1\Node_ST_Prod_2\nodeagent. DCS logical channel is View|Ptp.
[1/31/09 1:04:39:983 AST] 0000001f RmmPtpGroup W DCSV1111W: DCS Stack DefaultCoreGroup at Member Cell_ _Prod_1\Node_ _PROD_1\AS_ _PROD_1: Suspected another member because the outgoing connection from the other member was closed. Suspected members is Cell_ _Prod_1\Node_ _PROD_2\AS_ _PROD_2. DCS logical channel is View|Ptp.
[1/31/09 1:04:40:126 AST] 0000001f DiscoveryRmmP W DCSV1111W: DCS Stack DefaultCoreGroup at Member Cell_ _Prod_1\Node_ _PROD_1\AS_ _PROD_1: Suspected another member because the outgoing connection from the other member was closed. Suspected members is Cell_ _Prod_1\Node_ _PROD_2\AS_ _PROD_2. DCS logical channel is Connected|Ptp.
[1/31/09 1:04:40:931 AST] 0000001f RmmPtpGroup W DCSV1111W: DCS Stack DefaultCoreGroup.Cluster_ _Prod_1 at Member Cell_ _Prod_1\Node_ _PROD_1\AS_ _PROD_1: Suspected another member because the outgoing connection from the other member was closed. Suspected members is Cell_ _Prod_1\Node_ _PROD_2\AS_ _PROD_2. DCS logical channel is View|Ptp.
[1/31/09 1:04:41:276 AST] 0000001f DiscoveryRmmP W DCSV1111W: DCS Stack DefaultCoreGroup.Cluster_ _Prod_1 at Member Cell_ _Prod_1\Node_ _PROD_1\AS_ _PROD_1: Suspected another member because the outgoing connection from the other member was closed. Suspected members is Cell_ _Prod_1\Node_ _PROD_2\AS_ _PROD_2. DCS logical channel is Connected|Ptp.
[1/31/09 1:04:41:653 AST] 0000001f DiscoveryRmmP W DCSV1113W: DCS Stack DefaultCoreGroup.Cluster_ _Prod_1 at Member Cell_ _Prod_1\Node_ _PROD_1\AS_ _PROD_1: Suspected another member because the outgoing connection to the other member was closed. Suspected member is Cell_ _Prod_1\Node_ _PROD_2\AS_ _PROD_2. DCS logical channel is Connected|Ptp.
[1/31/09 1:04:41:788 AST] 0000001f DiscoveryRmmP W DCSV1113W: DCS Stack DefaultCoreGroup at Member Cell_ _Prod_1\Node_ _PROD_1\AS_ _PROD_1: Suspected another member because the outgoing connection to the other member was closed. Suspected member is Cell_ _Prod_1\Node_ _PROD_2\AS_ _PROD_2. DCS logical channel is Connected|Ptp.
[1/31/09 1:04:42:535 AST] 00000017 RoleViewLeade I DCSV8053I: DCS Stack DefaultCoreGroup at Member Cell_ _Prod_1\Node_ _PROD_1\AS_ _PROD_1: View change in process. Excluded members are [Cell_ _Prod_1\Node_ _PROD_1\nodeagent Cell_ _Prod_1\Node_ _PROD_2\AS_ _PROD_2 Cell_ _Prod_1\Node_ _PROD_2\nodeagent Cell_ _Prod_1\Node_PG_PROD_1\AS_PG_PROD_1 Cell_ _Prod_1\Node_PG_PROD_1\nodeagent Cell_ _Prod_1\Node_PG_PROD_2\AS_PG_PROD_2 Cell_ _Prod_1\Node_PG_PROD_2\nodeagent Cell_ _Prod_1\Node_ DM_Prod_1\dmgr Cell_ _Prod_1\Node_ST_Prod_1\AS_ST_Prod_1 Cell_ _Prod_1\Node_ST_Prod_1\nodeagent Cell_ _Prod_1\Node_ST_Prod_2\AS_ST_Prod_2 Cell_ _Prod_1\Node_ST_Prod_2\nodeagent].
[1/31/09 1:04:42:537 AST] 0000002a RoleViewLeade I DCSV8053I: DCS Stack DefaultCoreGroup.Cluster_ _Prod_1 at Member Cell_ _Prod_1\Node_ _PROD_1\AS_ _PROD_1: View change in process. Excluded members are [Cell_ _Prod_1\Node_ _PROD_2\AS_ _PROD_2].
[1/31/09 1:04:43:309 AST] 0000001f MbuRmmAdapter I DCSV1032I: DCS Stack DefaultCoreGroup at Member Cell_ _Prod_1\Node_ _PROD_1\AS_ _PROD_1: Connected a defined member Cell_ _Prod_1\Node_ _PROD_2\AS_ _PROD_2.
[1/31/09 1:04:44:565 AST] 0000001f VSync I DCSV2004I: DCS Stack DefaultCoreGroup at Member Cell_ _Prod_1\Node_ _PROD_1\AS_ _PROD_1: The synchronization procedure completed successfully. The View Identifier is (803:0.Cell_ _Prod_1\Node_ _PROD_1\AS_ _PROD_1). The internal details are [0 0 0 0 0 0 0 0 0 0 0 0 0].
[1/31/09 1:04:44:715 AST] 0000001f VSync I DCSV2004I: DCS Stack DefaultCoreGroup.Cluster_ _Prod_1 at Member Cell_ _Prod_1\Node_ _PROD_1\AS_ _PROD_1: The synchronization procedure completed successfully. The View Identifier is (21:0.Cell_ _Prod_1\Node_ _PROD_1\AS_ _PROD_1). The internal details are [0 0].
[1/31/09 1:04:45:007 AST] 00000017 ViewReceiver I DCSV1033I: DCS Stack DefaultCoreGroup at Member Cell_ _Prod_1\Node_ _PROD_1\AS_ _PROD_1: Confirmed all new view members in view identifier (804:0.Cell_ _Prod_1\Node_ _PROD_1\AS_ _PROD_1). View channel type is View|Ptp.
[1/31/09 1:04:45:226 AST] 0000002a ViewReceiver I DCSV1033I: DCS Stack DefaultCoreGroup.Cluster_ _Prod_1 at Member Cell_ _Prod_1\Node_ _PROD_1\AS_ _PROD_1: Confirmed all new view members in view identifier (22:0.Cell_ _Prod_1\Node_ _PROD_1\AS_ _PROD_1). View channel type is View|Ptp.
[1/31/09 1:04:45:770 AST] 0000002a DataStackMemb I DCSV8050I: DCS Stack DefaultCoreGroup.Cluster_ _Prod_1 at Member Cell_ _Prod_1\Node_ _PROD_1\AS_ _PROD_1: New view installed, identifier (22:0.Cell_ _Prod_1\Node_ _PROD_1\AS_ _PROD_1), view size is 1 (AV=1, CD=1, CN=1, DF=2)
[1/31/09 1:04:47:225 AST] 000051de DRSGroup I CWWDR0010E: Replication instance HttpSessionCache caught exception when sending/receiving messages : com.ibm.wsspi.hamanager.datastack.DataStackMembershipChangingException: The target member is not currently in view.
at com.ibm.ws.hamanager.datastack.DataStackImpl.sendMessage(DataStackImpl.java(Compiled Code))
at com.ibm.ws.hamanager.agent.AgentClassImpl.sendMuxedMessage(AgentClassImpl.java(Inlined Compiled Code))
at com.ibm.ws.hamanager.agent.AgentImpl.sendMessage(AgentImpl.java(Compiled Code))
at com.ibm.ws.drs.model.DRSGroup.send(DRSGroup.java(Compiled Code))
at com.ibm.ws.drs.model.DRSGroup.send(DRSGroup.java(Compiled Code))
at com.ibm.ws.drs.stack.DRSAsyncSend.processSendMessage(DRSAsyncSend.java(Compiled Code))
at com.ibm.ws.drs.stack.DRSStack.processSendMessage(DRSStack.java(Compiled Code))
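
To see how the scheduling delays are spread over the night, the HMGR0152W entries could be pulled out of SystemOut.log with a small script. The following is only a rough Python sketch: the function name `starvation_events` and the regular expression are my own assumptions, based on the line layout in the log above, and are not part of WebSphere.

```python
import re

# Assumed layout, taken from the log lines in this post:
# [timestamp] threadId component W HMGR0152W: ... scheduling delay is N seconds.
HMGR0152W = re.compile(
    r"^\[(?P<stamp>[^\]]+)\]\s+\S+\s+\S+\s+W\s+HMGR0152W:.*?"
    r"scheduling delay is (?P<delay>\d+) seconds"
)


def starvation_events(lines):
    """Return a list of (timestamp, delay_seconds) for each HMGR0152W line."""
    events = []
    for line in lines:
        match = HMGR0152W.match(line)
        if match:
            events.append((match.group("stamp"), int(match.group("delay"))))
    return events


# Sample input copied from the log above; a real run would iterate over
# the lines of SystemOut.log instead.
sample = [
    "[1/30/09 23:27:46:888 AST] 0000167c CoordinatorCo W HMGR0152W: CPU Starvation detected. Current thread scheduling delay is 6 seconds.",
    "[1/31/09 0:52:22:349 AST] 000000e9 WorkQueueMana W TCPC0005W: TCP Channel SIB_TCP_JFAP could not obtain thread from thread pool <null>.",
    "[1/31/09 1:04:26:546 AST] 0000167c CoordinatorCo W HMGR0152W: CPU Starvation detected. Current thread scheduling delay is 84 seconds.",
]

for stamp, delay in starvation_events(sample):
    print(stamp, delay)
```

Comparing the extracted timestamps with the memory figures from the monitoring application for the same window might show whether the two really coincide.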