Solved

Out of Memory Error - Kill PIDs

Posted on 2008-10-16
Last Modified: 2013-12-16
My architecture is an NFS Linux cluster (Linux Cluster Suite 4) with one active and one passive node.
During our backups to the NFS servers, we are receiving an out-of-memory error. From the logs below, is it possible to determine the real cause of the low memory?


Oct 15 07:50:53 ttecprodnfs2 su(pam_unix)[3371]: session closed for user root
Oct 15 07:50:55 ttecprodnfs2 sshd(pam_unix)[3344]: session closed for user browna
Oct 15 08:39:22 ttecprodnfs2 ntpd[18984]: time reset -0.132295 s
Oct 15 08:43:39 ttecprodnfs2 ntpd[18984]: synchronized to LOCAL(0), stratum 10
Oct 15 08:43:40 ttecprodnfs2 ntpd[18984]: synchronized to 10.33.2.35, stratum 11
Oct 15 08:44:43 ttecprodnfs2 ntpd[18984]: synchronized to 64.202.112.75, stratum 1
Oct 15 08:58:45 ttecprodnfs2 ntpd[18984]: time reset +0.168468 s
Oct 15 09:03:02 ttecprodnfs2 ntpd[18984]: synchronized to LOCAL(0), stratum 10
Oct 15 09:04:07 ttecprodnfs2 ntpd[18984]: synchronized to 65.111.164.223, stratum 2
Oct 15 09:04:08 ttecprodnfs2 ntpd[18984]: synchronized to 64.202.112.75, stratum 1
Oct 15 15:39:03 ttecprodnfs2 ntpd[18984]: synchronized to 65.111.164.223, stratum 2
Oct 15 15:40:09 ttecprodnfs2 ntpd[18984]: synchronized to 64.202.112.75, stratum 1
Oct 15 15:40:09 ttecprodnfs2 ntpd[18984]: time reset -0.161100 s
Oct 15 15:44:28 ttecprodnfs2 ntpd[18984]: synchronized to LOCAL(0), stratum 10
Oct 15 15:45:31 ttecprodnfs2 ntpd[18984]: synchronized to 64.202.112.75, stratum 1
Oct 15 16:04:53 ttecprodnfs2 ntpd[18984]: time reset +0.305731 s
Oct 15 16:09:10 ttecprodnfs2 ntpd[18984]: synchronized to LOCAL(0), stratum 10
Oct 15 16:10:08 ttecprodnfs2 ntpd[18984]: synchronized to 10.33.2.35, stratum 11
Oct 15 16:10:15 ttecprodnfs2 ntpd[18984]: synchronized to 65.111.164.223, stratum 2
Oct 15 16:11:11 ttecprodnfs2 ntpd[18984]: synchronized to 64.202.112.75, stratum 1
Oct 15 20:32:06 ttecprodnfs2 kernel: oom-killer: gfp_mask=0xd0
Oct 15 20:32:06 ttecprodnfs2 kernel: Mem-info:
Oct 15 20:32:06 ttecprodnfs2 kernel: DMA per-cpu:
Oct 15 20:32:06 ttecprodnfs2 kernel: cpu 0 hot: low 2, high 6, batch 1
Oct 15 20:32:06 ttecprodnfs2 kernel: cpu 0 cold: low 0, high 2, batch 1
Oct 15 20:32:06 ttecprodnfs2 kernel: cpu 1 hot: low 2, high 6, batch 1
Oct 15 20:32:06 ttecprodnfs2 kernel: cpu 1 cold: low 0, high 2, batch 1
Oct 15 20:32:06 ttecprodnfs2 kernel: cpu 2 hot: low 2, high 6, batch 1
Oct 15 20:32:06 ttecprodnfs2 kernel: cpu 2 cold: low 0, high 2, batch 1
Oct 15 20:32:06 ttecprodnfs2 kernel: cpu 3 hot: low 2, high 6, batch 1
Oct 15 20:32:06 ttecprodnfs2 kernel: cpu 3 cold: low 0, high 2, batch 1
Oct 15 20:32:06 ttecprodnfs2 kernel: Normal per-cpu:
Oct 15 20:32:06 ttecprodnfs2 kernel: cpu 0 hot: low 32, high 96, batch 16
Oct 15 20:32:07 ttecprodnfs2 kernel: cpu 0 cold: low 0, high 32, batch 16
Oct 15 20:32:07 ttecprodnfs2 kernel: cpu 1 hot: low 32, high 96, batch 16
Oct 15 20:32:07 ttecprodnfs2 kernel: cpu 1 cold: low 0, high 32, batch 16
Oct 15 20:32:07 ttecprodnfs2 kernel: cpu 2 hot: low 32, high 96, batch 16
Oct 15 20:32:07 ttecprodnfs2 kernel: cpu 2 cold: low 0, high 32, batch 16
Oct 15 20:32:07 ttecprodnfs2 kernel: cpu 3 hot: low 32, high 96, batch 16
Oct 15 20:32:07 ttecprodnfs2 kernel: cpu 3 cold: low 0, high 32, batch 16
Oct 15 20:32:07 ttecprodnfs2 kernel: HighMem per-cpu:
Oct 15 20:32:07 ttecprodnfs2 kernel: cpu 0 hot: low 32, high 96, batch 16
Oct 15 20:32:07 ttecprodnfs2 kernel: cpu 0 cold: low 0, high 32, batch 16
Oct 15 20:32:07 ttecprodnfs2 kernel: cpu 1 hot: low 32, high 96, batch 16
Oct 15 20:32:07 ttecprodnfs2 kernel: cpu 1 cold: low 0, high 32, batch 16
Oct 15 20:32:07 ttecprodnfs2 kernel: cpu 2 hot: low 32, high 96, batch 16
Oct 15 20:32:07 ttecprodnfs2 kernel: cpu 2 cold: low 0, high 32, batch 16
Oct 15 20:32:07 ttecprodnfs2 kernel: cpu 3 hot: low 32, high 96, batch 16
Oct 15 20:32:07 ttecprodnfs2 kernel: cpu 3 cold: low 0, high 32, batch 16
Oct 15 20:32:07 ttecprodnfs2 kernel:
Oct 15 20:32:07 ttecprodnfs2 kernel: Free pages:       14812kB (1600kB HighMem)
Oct 15 20:32:07 ttecprodnfs2 kernel: Active:22725 inactive:1351664 dirty:94129 writeback:329 unstable:0 free:3703 slab:171895 mapped:21877 pagetables:614
Oct 15 20:32:07 ttecprodnfs2 kernel: DMA free:12556kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB pages_scanned:3593200 all_unreclaimable? yes
Oct 15 20:32:08 ttecprodnfs2 kernel: protections[]: 0 0 0
Oct 15 20:32:08 ttecprodnfs2 kernel: Normal free:656kB min:928kB low:1856kB high:2784kB active:1188kB inactive:135532kB present:901120kB pages_scanned:246543 all_unreclaimable? yes
Oct 15 20:32:08 ttecprodnfs2 kernel: protections[]: 0 0 0
Oct 15 20:32:08 ttecprodnfs2 kernel: HighMem free:1600kB min:512kB low:1024kB high:1536kB active:89888kB inactive:5270820kB present:6160384kB pages_scanned:0 all_unreclaimable? no
Oct 15 20:32:08 ttecprodnfs2 kernel: protections[]: 0 0 0
Oct 15 20:32:08 ttecprodnfs2 kernel: DMA: 3*4kB 4*8kB 4*16kB 3*32kB 3*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 2*4096kB = 12556kB
Oct 15 20:32:08 ttecprodnfs2 kernel: Normal: 0*4kB 36*8kB 21*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 656kB
Oct 15 20:32:08 ttecprodnfs2 kernel: HighMem: 6*4kB 9*8kB 6*16kB 6*32kB 11*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1600kB
Oct 15 20:32:08 ttecprodnfs2 kernel: Swap cache: add 24201, delete 23569, find 9398/11696, race 0+0
Oct 15 20:32:08 ttecprodnfs2 kernel: 0 bounce buffer pages
Oct 15 20:32:08 ttecprodnfs2 kernel: Free swap:       4186800kB
Oct 15 20:32:08 ttecprodnfs2 kernel: 1769472 pages of RAM
Oct 15 20:32:08 ttecprodnfs2 kernel: 1343420 pages of HIGHMEM
Oct 15 20:32:08 ttecprodnfs2 kernel: 211992 reserved pages
Oct 15 20:32:08 ttecprodnfs2 kernel: 1359023 pages shared
Oct 15 20:32:08 ttecprodnfs2 kernel: 633 pages swap cached
Oct 15 20:32:08 ttecprodnfs2 kernel: Out of Memory: Killed process 25436 (klzagent).
Oct 15 23:28:42 ttecprodnfs2 kernel: oom-killer: gfp_mask=0xd0
Oct 15 23:28:42 ttecprodnfs2 kernel: Mem-info:
Oct 15 23:28:42 ttecprodnfs2 kernel: DMA per-cpu:
Oct 15 23:28:42 ttecprodnfs2 kernel: cpu 0 hot: low 2, high 6, batch 1
Oct 15 23:28:42 ttecprodnfs2 kernel: cpu 0 cold: low 0, high 2, batch 1
Oct 15 23:28:42 ttecprodnfs2 kernel: cpu 1 hot: low 2, high 6, batch 1
Oct 15 23:28:42 ttecprodnfs2 kernel: cpu 1 cold: low 0, high 2, batch 1
Oct 15 23:28:42 ttecprodnfs2 kernel: cpu 2 hot: low 2, high 6, batch 1
Oct 15 23:28:42 ttecprodnfs2 kernel: cpu 2 cold: low 0, high 2, batch 1
Oct 15 23:28:42 ttecprodnfs2 kernel: cpu 3 hot: low 2, high 6, batch 1
Oct 15 23:28:42 ttecprodnfs2 kernel: cpu 3 cold: low 0, high 2, batch 1
Oct 15 23:28:42 ttecprodnfs2 kernel: Normal per-cpu:
Oct 15 23:28:42 ttecprodnfs2 kernel: cpu 0 hot: low 32, high 96, batch 16
Oct 15 23:28:43 ttecprodnfs2 clurgmgrd[31568]: <notice> Shutting down
Oct 15 23:28:43 ttecprodnfs2 kernel: cpu 0 cold: low 0, high 32, batch 16
Oct 15 23:28:43 ttecprodnfs2 kernel: cpu 1 hot: low 32, high 96, batch 16
Oct 15 23:28:43 ttecprodnfs2 kernel: cpu 1 cold: low 0, high 32, batch 16
Oct 15 23:28:43 ttecprodnfs2 kernel: cpu 2 hot: low 32, high 96, batch 16
Oct 15 23:28:43 ttecprodnfs2 kernel: cpu 2 cold: low 0, high 32, batch 16
Oct 15 23:28:43 ttecprodnfs2 kernel: cpu 3 hot: low 32, high 96, batch 16
Oct 15 23:28:43 ttecprodnfs2 kernel: cpu 3 cold: low 0, high 32, batch 16
Oct 15 23:28:43 ttecprodnfs2 kernel: HighMem per-cpu:
Oct 15 23:28:43 ttecprodnfs2 kernel: cpu 0 hot: low 32, high 96, batch 16
Oct 15 23:28:43 ttecprodnfs2 kernel: cpu 0 cold: low 0, high 32, batch 16
Oct 15 23:28:43 ttecprodnfs2 kernel: cpu 1 hot: low 32, high 96, batch 16
Oct 15 23:28:43 ttecprodnfs2 kernel: cpu 1 cold: low 0, high 32, batch 16
Oct 15 23:28:43 ttecprodnfs2 kernel: cpu 2 hot: low 32, high 96, batch 16
Oct 15 23:28:43 ttecprodnfs2 kernel: cpu 2 cold: low 0, high 32, batch 16
Oct 15 23:28:43 ttecprodnfs2 kernel: cpu 3 hot: low 32, high 96, batch 16
Oct 15 23:28:43 ttecprodnfs2 kernel: cpu 3 cold: low 0, high 32, batch 16
Oct 15 23:28:43 ttecprodnfs2 kernel:
Oct 15 23:28:43 ttecprodnfs2 kernel: Free pages:       16340kB (3136kB HighMem)
Oct 15 23:28:43 ttecprodnfs2 kernel: Active:19425 inactive:1353469 dirty:105492 writeback:0 unstable:0 free:4085 slab:173041 mapped:19387 pagetables:579
Oct 15 23:28:43 ttecprodnfs2 kernel: DMA free:12556kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB pages_scanned:3603829 all_unreclaimable? yes
Oct 15 23:28:43 ttecprodnfs2 kernel: protections[]: 0 0 0
Oct 15 23:28:43 ttecprodnfs2 kernel: Normal free:648kB min:928kB low:1856kB high:2784kB active:2328kB inactive:130068kB present:901120kB pages_scanned:250272 all_unreclaimable? yes
Oct 15 23:28:43 ttecprodnfs2 kernel: protections[]: 0 0 0
Oct 15 23:28:43 ttecprodnfs2 kernel: HighMem free:3136kB min:512kB low:1024kB high:1536kB active:75372kB inactive:5283808kB present:6160384kB pages_scanned:0 all_unreclaimable? no
Oct 15 23:28:43 ttecprodnfs2 clurgmgrd[31568]: <notice> Stopping service NFS
Oct 15 23:28:43 ttecprodnfs2 kernel: protections[]: 0 0 0
Oct 15 23:28:43 ttecprodnfs2 kernel: DMA: 3*4kB 4*8kB 4*16kB 3*32kB 3*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 2*4096kB = 12556kB
Oct 15 23:28:43 ttecprodnfs2 kernel: Normal: 0*4kB 1*8kB 34*16kB 1*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 648kB
Oct 15 23:28:43 ttecprodnfs2 kernel: HighMem: 34*4kB 21*8kB 15*16kB 33*32kB 16*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 3136kB
Oct 15 23:28:43 ttecprodnfs2 kernel: Swap cache: add 24327, delete 23701, find 9531/11840, race 0+0
Oct 15 23:28:43 ttecprodnfs2 kernel: 0 bounce buffer pages
Oct 15 23:28:43 ttecprodnfs2 kernel: Free swap:       4186820kB
Oct 15 23:28:43 ttecprodnfs2 kernel: 1769472 pages of RAM
Oct 15 23:28:43 ttecprodnfs2 kernel: 1343420 pages of HIGHMEM
Oct 15 23:28:43 ttecprodnfs2 kernel: 211992 reserved pages
Oct 15 23:28:44 ttecprodnfs2 kernel: 1358536 pages shared
Oct 15 23:28:44 ttecprodnfs2 kernel: 627 pages swap cached
Oct 15 23:28:44 ttecprodnfs2 kernel: Out of Memory: Killed process 31568 (clurgmgrd).
Oct 15 23:28:44 ttecprodnfs2 clurgmgrd: [31568]: <info> Removing IPv4 address 10.33.2.37 from bond0.102
Oct 15 23:28:54 ttecprodnfs2 clurgmgrd: [31568]: <info> Removing export: *:/T24
Oct 15 23:28:55 ttecprodnfs2 last message repeated 10 times
Oct 15 23:28:55 ttecprodnfs2 clurgmgrd: [31568]: <info> unmounting /T24
Oct 15 23:28:56 ttecprodnfs2 clurgmgrd: [31568]: <notice> Forcefully unmounting /T24
Oct 15 23:28:56 ttecprodnfs2 clurgmgrd: [31568]: <warning> Dropping node-wide NFS locks
Oct 15 23:29:06 ttecprodnfs2 clurgmgrd: [31568]: <info> unmounting /T24
Oct 15 23:29:06 ttecprodnfs2 clurgmgrd: [31568]: <notice> Forcefully unmounting /T24
Oct 15 23:29:07 ttecprodnfs2 clurgmgrd: [31568]: <info> Sending reclaim notifications via ttecprodnfs2
Oct 15 23:29:07 ttecprodnfs2 rpc.statd[24171]: Version 1.0.6 Starting
Oct 15 23:29:07 ttecprodnfs2 rpc.statd[24171]: Flags: No-Daemon Notify-Only
Oct 15 23:29:07 ttecprodnfs2 rpc.statd[24171]: statd running as root. chown /tmp/statd-ttecprodnfs2.23993/sm to choose different user
Oct 15 23:29:10 ttecprodnfs2 rpc.statd[24171]: Caught signal 15, un-registering and exiting.
Oct 15 23:29:10 ttecprodnfs2 clurgmgrd: [31568]: <err> 'umount /T24' failed, error=0
Oct 15 23:29:10 ttecprodnfs2 clurgmgrd[31568]: <notice> stop on fs "T24" returned 2 (invalid argument(s))
Oct 15 23:29:10 ttecprodnfs2 clurgmgrd[31568]: <crit> #12: RG NFS failed to stop; intervention required
Oct 15 23:29:10 ttecprodnfs2 clurgmgrd[31568]: <notice> Service NFS is failed
Oct 15 23:29:32 ttecprodnfs2 clurgmgrd[31568]: <notice> Shutdown complete, exiting
Oct 16 00:10:19 ttecprodnfs2 sshd(pam_unix)[24196]: session opened for user thomasr by (uid=0)
Oct 16 00:10:22 ttecprodnfs2 su(pam_unix)[24223]: session opened for user root by thomasr(uid=0)
Oct 16 00:22:37 ttecprodnfs2 clurgmgrd[24316]: <notice> Resource Group Manager Starting
Oct 16 00:22:37 ttecprodnfs2 clurgmgrd[24316]: <info> Loading Service Data
Oct 16 00:22:37 ttecprodnfs2 rgmanager: clurgmgrd startup succeeded
Oct 16 00:22:37 ttecprodnfs2 clurgmgrd[24316]: <info> Initializing Services
Oct 16 00:22:37 ttecprodnfs2 clurgmgrd: [24316]: <info> Removing export: *:/T24
Oct 16 00:22:37 ttecprodnfs2 last message repeated 10 times
Oct 16 00:22:38 ttecprodnfs2 clurgmgrd: [24316]: <info> unmounting /T24
Oct 16 00:22:38 ttecprodnfs2 clurgmgrd: [24316]: <info> Removing export: *:/SIG
Oct 16 00:22:38 ttecprodnfs2 last message repeated 6 times
Oct 16 00:22:38 ttecprodnfs2 clurgmgrd: [24316]: <info> unmounting /SIG
Oct 16 00:22:38 ttecprodnfs2 clurgmgrd: [24316]: <info> Removing export: *:/backup
Oct 16 00:22:38 ttecprodnfs2 clurgmgrd: [24316]: <info> Removing export: *:/backup
Oct 16 00:22:39 ttecprodnfs2 clurgmgrd: [24316]: <info> unmounting /backup
Oct 16 00:22:40 ttecprodnfs2 clurgmgrd: [24316]: <info> Removing export: *:/DataBridge
Oct 16 00:22:40 ttecprodnfs2 last message repeated 5 times
Oct 16 00:22:41 ttecprodnfs2 clurgmgrd: [24316]: <info> unmounting /DataBridge
Oct 16 00:22:41 ttecprodnfs2 clurgmgrd[24316]: <info> Services Initialized
Oct 16 00:22:41 ttecprodnfs2 clurgmgrd[24316]: <info> Logged in SG "usrm::manager"
Oct 16 00:22:41 ttecprodnfs2 clurgmgrd[24316]: <info> Magma Event: Membership Change
Oct 16 00:22:41 ttecprodnfs2 clurgmgrd[24316]: <info> State change: Local UP
Oct 16 00:22:41 ttecprodnfs2 clurgmgrd[24316]: <info> State change: ttecprodnfs1 UP
Oct 16 00:48:08 ttecprodnfs2 shutdown: shutting down for system reboot
Oct 16 00:48:09 ttecprodnfs2 init: Switching to runlevel: 6
Oct 16 00:48:10 ttecprodnfs2 rgmanager: [25401]: <notice> Shutting down Cluster Service Manager...
Oct 16 00:48:10 ttecprodnfs2 clurgmgrd[24316]: <notice> Shutting down
Oct 16 00:48:10 ttecprodnfs2 ccsd[31530]: Unable to write package back to sender: Broken pipe
Oct 16 00:48:10 ttecprodnfs2 ccsd[31530]: Error while processing request: Operation not permitted
Oct 16 00:48:12 ttecprodnfs2 clurgmgrd[24316]: <notice> Shutdown complete, exiting
Oct 16 00:48:12 ttecprodnfs2 rgmanager: [25401]: <notice> Cluster Service Manager is stopped.
Oct 16 00:48:12 ttecprodnfs2 haldaemon: haldaemon -TERM succeeded
Oct 16 00:48:12 ttecprodnfs2 messagebus: messagebus -TERM succeeded
Oct 16 00:48:12 ttecprodnfs2 psacct: Shutting down process accounting:  succeeded
Oct 16 00:48:12 ttecprodnfs2 mountd[4592]: Caught signal 15, un-registering and exiting.
Oct 16 00:48:12 ttecprodnfs2 nfs: rpc.mountd shutdown succeeded
Oct 16 00:48:16 ttecprodnfs2 kernel: lockd: couldn't shutdown host module!
Oct 16 00:48:16 ttecprodnfs2 kernel: nfsd: last server has exited
Oct 16 00:48:16 ttecprodnfs2 kernel: nfsd: unexporting all filesystems
Oct 16 00:48:17 ttecprodnfs2 nfs: nfsd shutdown succeeded
Oct 16 00:48:17 ttecprodnfs2 nfs: rpc.rquotad shutdown succeeded
Oct 16 00:48:17 ttecprodnfs2 nfs: Shutting down NFS services:  succeeded
Oct 16 00:48:17 ttecprodnfs2 sshd: sshd -TERM succeeded
Oct 16 00:48:17 ttecprodnfs2 acpid: acpid shutdown succeeded
Oct 16 00:48:17 ttecprodnfs2 crond: crond shutdown succeeded
Oct 16 00:48:17 ttecprodnfs2 ntpd[18984]: ntpd exiting on signal 15
Oct 16 00:48:17 ttecprodnfs2 ntpd: ntpd shutdown succeeded
Oct 16 00:48:17 ttecprodnfs2 fenced: Stopping fence domain:
Oct 16 00:48:17 ttecprodnfs2 fenced: shutdown succeeded
Oct 16 00:48:17 ttecprodnfs2 fenced:
Oct 16 00:48:17 ttecprodnfs2 fenced:
Oct 16 00:48:17 ttecprodnfs2 rc: Stopping fenced:  succeeded
Oct 16 00:48:17 ttecprodnfs2 lock_gulmd: Stopping lock_gulmd:
Oct 16 00:48:17 ttecprodnfs2 lock_gulmd: shutdown succeeded
Oct 16 00:48:17 ttecprodnfs2 lock_gulmd:
Oct 16 00:48:17 ttecprodnfs2 lock_gulmd:
Oct 16 00:48:17 ttecprodnfs2 rc: Stopping lock_gulmd:  succeeded
Oct 16 00:48:17 ttecprodnfs2 cman: Stopping cman:
Oct 16 00:48:21 ttecprodnfs2 cman: failed to stop cman failed
Oct 16 00:48:21 ttecprodnfs2 cman:
Oct 16 00:48:21 ttecprodnfs2 cman:
Oct 16 00:48:21 ttecprodnfs2 rc: Stopping cman:  failed
Oct 16 00:48:21 ttecprodnfs2 ccsd[31530]: Stopping ccsd, SIGTERM received.
Oct 16 00:48:22 ttecprodnfs2 ccsd: shutdown succeeded
Oct 16 00:48:22 ttecprodnfs2 rpc.statd[3958]: Caught signal 15, un-registering and exiting.
Oct 16 00:48:22 ttecprodnfs2 nfslock: rpc.statd shutdown succeeded
Oct 16 00:48:22 ttecprodnfs2 auditd[3974]: The audit daemon is exiting.
Oct 16 00:48:22 ttecprodnfs2 kernel: audit(1224132502.355:26382): audit_pid=0 old=3974 by auid=4294967295
Oct 16 00:48:22 ttecprodnfs2 auditd: auditd shutdown succeeded
Oct 16 00:48:22 ttecprodnfs2 irqbalance: irqbalance shutdown succeeded
Oct 16 00:48:22 ttecprodnfs2 portmap: portmap shutdown succeeded
Oct 16 00:48:22 ttecprodnfs2 kernel: Kernel logging (proc) stopped.
Oct 16 00:48:22 ttecprodnfs2 kernel: Kernel log daemon terminating.
Oct 16 00:48:24 ttecprodnfs2 syslog: klogd shutdown succeeded
Oct 16 00:48:24 ttecprodnfs2 exiting on signal 15
Oct 16 00:50:54 ttecprodnfs2 syslogd 1.4.1: restart.
Oct 16 00:50:54 ttecprodnfs2 syslog: syslogd startup succeeded
Oct 16 00:50:54 ttecprodnfs2 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Oct 16 00:50:54 ttecprodnfs2 kernel: Linux version 2.6.9-42.ELsmp (bhcompile@hs20-bc1-1.build.redhat.com) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-2)) #1 SMP Wed Jul 12 23:27:17 EDT 2006
Oct 16 00:50:54 ttecprodnfs2 kernel: BIOS-provided physical RAM map:
Oct 16 00:50:54 ttecprodnfs2 kernel:  BIOS-e820: 0000000000000000 - 000000000009d000 (usable)
Oct 16 00:50:54 ttecprodnfs2 kernel:  BIOS-e820: 000000000009d000 - 00000000000a0000 (reserved)
Oct 16 00:50:54 ttecprodnfs2 kernel:  BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
Oct 16 00:50:54 ttecprodnfs2 kernel:  BIOS-e820: 0000000000100000 - 00000000cffbce80 (usable)
Oct 16 00:50:54 ttecprodnfs2 kernel:  BIOS-e820: 00000000cffbce80 - 00000000cffd0000 (ACPI data)
Oct 16 00:50:54 ttecprodnfs2 kernel:  BIOS-e820: 00000000cffd0000 - 00000000d0000000 (reserved)
Oct 16 00:50:54 ttecprodnfs2 kernel:  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
Oct 16 00:50:54 ttecprodnfs2 kernel:  BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
Oct 16 00:50:54 ttecprodnfs2 kernel:  BIOS-e820: 0000000100000000 - 00000001b0000000 (usable)
Oct 16 00:50:54 ttecprodnfs2 syslog: klogd startup succeeded
Oct 16 00:50:54 ttecprodnfs2 kernel: 6016MB HIGHMEM available.
Oct 16 00:50:54 ttecprodnfs2 kernel: 896MB LOWMEM available.
Oct 16 00:50:54 ttecprodnfs2 kernel: found SMP MP-table at 0009d140
Oct 16 00:50:55 ttecprodnfs2 irqbalance: irqbalance startup succeeded
Oct 16 00:50:55 ttecprodnfs2 kernel: Using x86 segment limits to approximate NX protection
Oct 16 00:50:55 ttecprodnfs2 portmap: portmap startup succeeded
Oct 16 00:50:55 ttecprodnfs2 kernel: DMI 2.3 present.
Oct 16 00:50:55 ttecprodnfs2 kernel: ServerWorks chipset detected. Disabling timer routing over 8254.
Oct 16 00:50:55 ttecprodnfs2 kernel: Using APIC driver default
Oct 16 00:50:55 ttecprodnfs2 rpc.statd[3589]: Version 1.0.6 Starting
Oct 16 00:50:55 ttecprodnfs2 nfslock: rpc.statd startup succeeded
Oct 16 00:50:55 ttecprodnfs2 rpc.statd[3589]: statd running as root. chown /var/lib/nfs/statd/sm to choose different user
Oct 16 00:50:55 ttecprodnfs2 kernel: ACPI: PM-Timer IO Port: 0x588
Oct 16 00:50:55 ttecprodnfs2 kernel: ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Oct 16 00:50:55 ttecprodnfs2 auditd[3610]: Init complete, auditd 1.0.14 listening for events
Oct 16 00:50:55 ttecprodnfs2 auditd: auditd startup succeeded
Oct 16 00:50:55 ttecprodnfs2 kernel: Processor #0 6:15 APIC version 20
Oct 16 00:50:55 ttecprodnfs2 kernel: ACPI: LAPIC (acpi_id[0x01] lapic_id[0x06] enabled)
Oct 16 00:50:55 ttecprodnfs2 kernel: Processor #6 6:15 APIC version 20
Oct 16 00:50:55 ttecprodnfs2 kernel: ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Oct 16 00:50:55 ttecprodnfs2 kernel: Processor #1 6:15 APIC version 20
Oct 16 00:50:55 ttecprodnfs2 kernel: ACPI: LAPIC (acpi_id[0x03] lapic_id[0x07] enabled)
Oct 16 00:50:55 ttecprodnfs2 kernel: Processor #7 6:15 APIC version 20
Oct 16 00:50:55 ttecprodnfs2 kernel: ACPI: LAPIC_NMI (acpi_id[0x00] dfl dfl lint[0x1])
Oct 16 00:50:55 ttecprodnfs2 kernel: ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
Oct 16 00:50:55 ttecprodnfs2 kernel: ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
Oct 16 00:50:55 ttecprodnfs2 kernel: ACPI: LAPIC_NMI (acpi_id[0x03] dfl dfl lint[0x1])
Oct 16 00:50:56 ttecprodnfs2 kernel: Enabling APIC mode:  Flat.  Using 0 I/O APICs
Oct 16 00:50:56 ttecprodnfs2 kernel: ACPI: IOAPIC (id[0x0e] address[0xfec00000] gsi_base[0])
Oct 16 00:50:56 ttecprodnfs2 kernel: IOAPIC[0]: apic_id 14, version 32, address 0xfec00000, GSI 0-23
Oct 16 00:50:56 ttecprodnfs2 kernel: ACPI: IOAPIC (id[0x0d] address[0xfec80000] gsi_base[24])
Oct 16 00:50:56 ttecprodnfs2 kernel: IOAPIC[1]: apic_id 13, version 32, address 0xfec80000, GSI 24-47
Oct 16 00:50:56 ttecprodnfs2 rpcidmapd: rpc.idmapd startup succeeded
Oct 16 00:50:56 ttecprodnfs2 kernel: ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
Oct 16 00:50:56 ttecprodnfs2 kernel: ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
Oct 16 00:50:56 ttecprodnfs2 kernel: Using ACPI (MADT) for SMP configuration information
Oct 16 00:50:56 ttecprodnfs2 kernel: Allocating PCI resources starting at d1000000 (gap: d0000000:10000000)
Oct 16 00:50:56 ttecprodnfs2 kernel: Built 1 zonelists
Oct 16 00:50:56 ttecprodnfs2 kernel: Kernel command line: ro root=/dev/VolGroup00/LogVol00 rhgb quiet
Oct 16 00:50:56 ttecprodnfs2 kernel: Initializing CPU#0
Oct 16 00:50:56 ttecprodnfs2 kernel: CPU 0 irqstacks, hard=c03ee000 soft=c03ce000
Oct 16 00:50:56 ttecprodnfs2 kernel: PID hash table entries: 4096 (order: 12, 65536 bytes)
Oct 16 00:50:56 ttecprodnfs2 kernel: Detected 2000.969 MHz processor.
Oct 16 00:50:56 ttecprodnfs2 kernel: Using pmtmr for high-res timesource
Oct 16 00:50:56 ttecprodnfs2 kernel: Console: colour VGA+ 80x25
Oct 16 00:50:56 ttecprodnfs2 kernel: Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Oct 16 00:50:56 ttecprodnfs2 kernel: Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Oct 16 00:50:56 ttecprodnfs2 ccsd[3681]: Starting ccsd 1.0.7:
Oct 16 00:50:57 ttecprodnfs2 kernel: Memory: 6227796k/7077888k available (1876k kernel code, 62440k reserved, 759k data, 184k init, 5373680k highmem)
Oct 16 00:50:57 ttecprodnfs2 ccsd[3681]:  Built: Jun 22 2006 18:15:41
Oct 16 00:50:57 ttecprodnfs2 kernel: Calibrating delay using timer specific routine.. 4003.08 BogoMIPS (lpj=2001542)
Oct 16 00:50:57 ttecprodnfs2 ccsd[3681]:  Copyright (C) Red Hat, Inc.  2004  All rights reserved.
Oct 16 00:50:57 ttecprodnfs2 kernel: Security Scaffold v1.0.0 initialized
Oct 16 00:50:57 ttecprodnfs2 kernel: SELinux:  Initializing.
Oct 16 00:50:57 ttecprodnfs2 kernel: SELinux:  Starting in permissive mode
Oct 15 20:50:43 ttecprodnfs2 rc.sysinit: -e
Oct 16 00:50:57 ttecprodnfs2 kernel: There is already a security framework initialized, register_security failed.
Oct 16 00:50:57 ttecprodnfs2 ccsd:  succeede
Question by:rbtt

Accepted Solution

by:
larsga earned 500 total points
ID: 22734440
Hard to tell what the true cause is from that log only.

For general information on the oom-killer and tracking down the cause of an oom, see http://linux-mm.org/OOM_Killer and http://linux-mm.org/OOM

Take a look at /proc/meminfo and /proc/slabinfo during normal operation and under heavy load.
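As a rough illustration of that comparison, the sketch below parses the text of /proc/meminfo and pulls out the fields most relevant to a lowmem shortage. It assumes a 32-bit highmem kernel that reports `LowTotal`, `LowFree`, `HighTotal` and `HighFree`; on kernels without those fields the values simply come back as `None`. The function names are made up for this example.

```python
def parse_meminfo(text):
    """Return a dict mapping /proc/meminfo field names to integer values (kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def lowmem_report(fields):
    """Pick out the fields of interest; a missing field is reported as None."""
    wanted = ("LowTotal", "LowFree", "HighTotal", "HighFree", "Slab", "Dirty")
    return {key: fields.get(key) for key in wanted}
```

Calling `lowmem_report(parse_meminfo(open("/proc/meminfo").read()))` once during normal operation and again during a backup gives two snapshots to compare; a `LowFree` value that collapses while `HighFree` stays comfortable would point at lowmem exhaustion.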

If it turns out that you are unable to track down the real cause, you can mitigate the problem somewhat by marking important services as protected from the OOM killer (/proc/<pid>/oom_adj).
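A minimal sketch of that mitigation follows. It assumes the running kernel actually exposes /proc/<pid>/oom_adj and that the script runs as root; the value -17 is the one documented for oom_adj as disabling OOM kills for a process. The function name and the `proc_root` parameter are hypothetical and exist only so the logic can be tried against a scratch directory.

```python
import os

OOM_DISABLE = -17  # oom_adj value meaning "never select this process"


def protect_from_oom(pid, proc_root="/proc"):
    """Write OOM_DISABLE to <proc_root>/<pid>/oom_adj.

    Returns False when the file does not exist (no such pid, or a kernel
    without oom_adj). Raises PermissionError if not run with enough privilege.
    """
    path = os.path.join(proc_root, str(pid), "oom_adj")
    if not os.path.exists(path):
        return False
    with open(path, "w") as handle:
        handle.write(str(OOM_DISABLE))
    return True
```

The setting applies only to the process it is written for and is lost when that process restarts, so it would have to be reapplied from the service's init script or a similar hook.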

Assisted Solution

by:larsga
larsga earned 500 total points
ID: 22734462
Oh, wait. The ability to protect programs from the oom-killer was added somewhere around kernel 2.6.11. From your log above, you are running 2.6.9 so that might not be an option for you.

Author Comment

by:rbtt
ID: 22739865
Is there a known problem with 32-bit RHEL as far as memory management is concerned?

Expert Comment

by:larsga
ID: 22742275
From what information is in the log and what I can find in RH's support database, the following might apply to your situation:

http://kbase.redhat.com/faq/FAQ_85_8968.shtm
http://kbase.redhat.com/faq/FAQ_43_8555.shtm
http://kbase.redhat.com/faq/FAQ_85_10725.shtm

The workarounds/fixes mentioned in those articles are (1) tuning a kernel variable (set /proc/sys/vm/lower_zone_protection to 100) to make reclamation of lowmem pages more aggressive and (2) changing to the hugemem kernel.

In general, running a 32bit kernel on a machine with 3GB+ of RAM can cause problems. Part of the 0-4GB physical address space is taken by PCI memory-mapped IO etc., so some of the RAM has to be mapped above the 4GB limit and accessed through PAE. More importantly, the kernel can only map a small part of physical RAM directly (about 896MB with the default split, which matches the "896MB LOWMEM available" line in your boot log); this is "lowmem", and everything above it is "highmem". Kernel data structures (the slab caches, for instance) and some DMA buffers must live in lowmem and cannot use highmem. Thus you can get into a situation where you have lots of "highmem" available, but run out of "lowmem"; it is analogous to the "DOS memory"/extended-/expanded-memory issues one had in the MS-DOS days.

This 32bit/3GB+ issue is not limited to RHEL or to Linux. It is a PC architecture limit. 32bit versions of Windows also have the same problem.
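To check this against the posted log, the sketch below scans oom-killer output for the per-zone summary lines and reports any zone whose free memory is below its `min` watermark. The sample text is the three zone lines from the first OOM report in the question (with the wrapped lines joined); the regular expression and function name are assumptions made for this example, not part of any kernel tooling.

```python
import re

# Matches per-zone summary lines such as "kernel: Normal free:656kB min:928kB ..."
ZONE_LINE = re.compile(
    r"kernel: (?P<zone>DMA|Normal|HighMem) free:(?P<free>\d+)kB min:(?P<min>\d+)kB"
)


def zones_below_min(log_text):
    """Return {zone: (free_kB, min_kB)} for zones with free memory below min."""
    starved = {}
    for match in ZONE_LINE.finditer(log_text):
        free_kb = int(match.group("free"))
        min_kb = int(match.group("min"))
        if free_kb < min_kb:
            starved[match.group("zone")] = (free_kb, min_kb)
    return starved


SAMPLE = """\
Oct 15 20:32:07 ttecprodnfs2 kernel: DMA free:12556kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB pages_scanned:3593200 all_unreclaimable? yes
Oct 15 20:32:08 ttecprodnfs2 kernel: Normal free:656kB min:928kB low:1856kB high:2784kB active:1188kB inactive:135532kB present:901120kB pages_scanned:246543 all_unreclaimable? yes
Oct 15 20:32:08 ttecprodnfs2 kernel: HighMem free:1600kB min:512kB low:1024kB high:1536kB active:89888kB inactive:5270820kB present:6160384kB pages_scanned:0 all_unreclaimable? no
"""
```

On that sample, `zones_below_min(SAMPLE)` returns `{"Normal": (656, 928)}`: only the Normal zone, which is part of lowmem, has dropped below its minimum, while HighMem is still above its own watermark.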

Expert Comment

by:larsga
ID: 22746440
I've looked into this a bit more. I feel pretty certain that you are running out of lowmem. Changing to the 'hugemem' kernel should fix the issue, albeit at the cost of a little CPU overhead. That beats the alternative of being hit with OOMs, though.