oldcar53

Damage from UPS failure?

The UPS associated with our server (at our hosting provider) developed a problem and failed during an electrical storm. We are running CentOS 5.x.
After coming back online, our server began crashing at random and fairly often. The problem appeared to involve interrupt 169, which is assigned to the RAID controller. The controller was replaced, but another crash occurred. The server was then rebooted with the irqpoll kernel option. Since then there has been one additional 'soft lockup', but no crash.
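
One way to check whether interrupt 169 is shared between the RAID controller and another device is to read /proc/interrupts. The following is a minimal Python sketch under stated assumptions, not part of the original report: the function names are hypothetical, and the sample text (including the device name uhci_hcd:usb1) is made up for illustration only.

```python
def parse_interrupts(text):
    """Map each numeric IRQ to the list of device names registered on it.

    `text` is expected to look like the contents of /proc/interrupts:
    a header row of CPU columns, then one row per interrupt.
    """
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():
            continue  # skip summary rows such as NMI, LOC, ERR
        fields = rest.split()
        tail = fields[ncpus:]  # drop the per-CPU counters
        if len(tail) < 2:
            continue  # no controller type or no device list
        devices = " ".join(tail[1:])  # tail[0] is the controller type
        table[label] = [d.strip() for d in devices.split(",")]
    return table


def shared_irqs(text):
    """Return only the IRQs that have more than one device attached."""
    return {irq: devs for irq, devs in parse_interrupts(text).items()
            if len(devs) > 1}


# Hypothetical sample; real data would come from open("/proc/interrupts").read()
SAMPLE = """\
           CPU0       CPU1
  0:   12345678          0    IO-APIC-edge  timer
169:     456789          0   IO-APIC-level  uhci_hcd:usb1, aacraid
NMI:          0          0
ERR:          0
"""

if __name__ == "__main__":
    for irq, devs in shared_irqs(SAMPLE).items():
        print("IRQ %s is shared by: %s" % (irq, ", ".join(devs)))
```

If a line like this shows the RAID driver sharing an interrupt with another device, that may be worth mentioning to the hosting provider, though it does not by itself prove where the fault lies.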

Question:
Could a UPS failure cause this sort of problem?

(I'm new at this posting-questions thing, so have probably left a lot out.)

Roger Ide
Linux, Server Hardware


8/22/2022 - Mon
multimac

Hello Roger,

Do you have screenshots or log files of your kernel crash?

Have you already forced a check of the filesystems?
ASKER CERTIFIED SOLUTION
pclinuxguru

oldcar53

ASKER
Hi-
There was no check of the filesystems, and the last event was Monday at 3 am.
The nature of the problem prevented any log-writing. I was able to run dmesg on one crash, as I caught it on the way down:

irq 169: nobody cared (try booting with the "irqpoll" option)
 [<c044ea52>] __report_bad_irq+0x2b/0x69
 [<c044ec49>] note_interrupt+0x1b9/0x1f0
 [<c044e215>] handle_IRQ_event+0x45/0x8c
 [<c044e339>] __do_IRQ+0xdd/0x118
 [<c044e25c>] __do_IRQ+0x0/0x118
 [<c04074c4>] do_IRQ+0x9b/0xc3
 [<c040597a>] common_interrupt+0x1a/0x20
 [<c05339f3>] acpi_processor_idle_simple+0x174/0x297
 [<c040597a>] common_interrupt+0x1a/0x20
 [<c053387f>] acpi_processor_idle_simple+0x0/0x297
 [<c0403d14>] cpu_idle+0x9f/0xb9
 =======================
handlers:
[<c058e26d>] (usb_hcd_irq+0x0/0x50)
[<f88db346>] (aac_rx_intr_message+0x0/0x55 [aacraid])
Disabling IRQ #169
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter reset request. SCSI hang ?
INFO: task kjournald:490 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald     D 00007CBB  2788   490     19           515   469 (L-TLB)
       dff96ed4 00000046 701fe859 00007cbb 00000005 00000000 1834632e 0000000a
       dff69550 701ff33a 00007cbb 00000ae1 00000002 dff6965c c37f0788 c39ac040
       105d6eda dff4a4c4 c37f1128 c37f75cc 00000020 00000001 dff4a4bc 105d6eda
Call Trace:
 [<c0621468>] io_schedule+0x36/0x59
 [<c04790db>] sync_buffer+0x30/0x33
 [<c062163f>] __wait_on_bit+0x33/0x58
 [<c04790ab>] sync_buffer+0x0/0x33
 [<c04790ab>] sync_buffer+0x0/0x33
 [<c06216c6>] out_of_line_wait_on_bit+0x62/0x6a
 [<c043737c>] wake_bit_function+0x0/0x3c
 [<c0479058>] __wait_on_buffer+0x1c/0x1f
 [<f88684b3>] journal_commit_transaction+0x4cf/0xf3c [jbd]
 [<c042e621>] lock_timer_base+0x15/0x2f
 [<c042e6a0>] try_to_del_timer_sync+0x65/0x6c
 [<f886bd08>] kjournald+0xa1/0x1c2 [jbd]
 [<c043734f>] autoremove_wake_function+0x0/0x2d
 [<f886bc67>] kjournald+0x0/0x1c2 [jbd]
 [<c043728a>] kthread+0xc0/0xee
 [<c04371ca>] kthread+0x0/0xee
 [<c0405c87>] kernel_thread_helper+0x7/0x10
 =======================
INFO: task syslogd:2386 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syslogd       D 00007CBA  2340  2386      1          2389  2242 (NOTLB)
       f5c0ced0 00000086 3e0e3044 00007cba 00000070 00000080 030a9588 00000007
       f5c03550 3e0e3868 00007cba 00000824 00000001 f5c0365c c37e9944 f62ea200
       e02e8e68 c37ea2e4 00000001 f5c0cecc c041f0c8 00000000 00000000 ffffffff
Call Trace:
 [<c041f0c8>] __wake_up+0x2a/0x3d
 [<f886b2c1>] log_wait_commit+0x80/0xc7 [jbd]
 [<c043734f>] autoremove_wake_function+0x0/0x2d
 [<f8866679>] journal_stop+0x196/0x1bb [jbd]
 [<c0495846>] __writeback_single_inode+0x199/0x2a5
 [<c045d334>] do_writepages+0x2b/0x32
 [<c0458e37>] __filemap_fdatawrite_range+0x66/0x72
 [<c0495ee4>] sync_inode+0x19/0x24
 [<f889e019>] ext3_sync_file+0xb1/0xdc [ext3]
 [<c0478c15>] do_fsync+0x41/0x83
 [<c0478c74>] __do_fsync+0x1d/0x2b
 [<c0404f4b>] syscall_call+0x7/0xb
 =======================
INFO: task miva:2927 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
miva          D 00007CBA  2572  2927   2757                     (NOTLB)
       e3d1cf44 00000082 a989794c 00007cba f88ad1e0 e3d1cefc 00000000 00000001
       f20be000 aa52abe7 00007cba 00c9329b 00000003 f20be10c c37f75cc f60e2040
       c044e25c c37f7f6c e749e380 e3d1cf30 00000000 e3d1c000 c048af22 ffffffff
Call Trace:
 [<c044e25c>] __do_IRQ+0x0/0x118
 [<c048af22>] locks_remove_posix+0x7d/0x97
 [<c062183f>] __mutex_lock_slowpath+0x4d/0x7c
 [<c062187d>] .text.lock.mutex+0xf/0x14
 [<c0476edc>] generic_file_llseek+0x2a/0xd2
 [<c0476eb2>] generic_file_llseek+0x0/0xd2
 [<c04761f5>] vfs_llseek+0x30/0x34
 [<c0477077>] sys_lseek+0x38/0x63
 [<c0404f4b>] syscall_call+0x7/0xb
 =======================
INFO: task miva:2928 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
miva          D 00007CBA  2524  2928   2895                     (NOTLB)
       d9ee9b2c 00000086 98dd3236 00007cba c1351cc0 00000000 dcdc0990 00000008
       f63d1aa0 99c34660 00007cba 00e6142a 00000000 f63d1bac c37e2b00 f61463c0
       00001000 c37e34a0 f65043e0 00000bef 105d52ba c042d7c7 e030160c ffffffff
Call Trace:
 [<c042d7c7>] getnstimeofday+0x30/0xb6
 [<c0621468>] io_schedule+0x36/0x59
 [<c04790ab>] sync_buffer+0x0/0x33
 [<c04790db>] sync_buffer+0x30/0x33
 [<c062157a>] __wait_on_bit_lock+0x2a/0x52
 [<c04790ab>] sync_buffer+0x0/0x33
 [<c0621604>] out_of_line_wait_on_bit_lock+0x62/0x6a
 [<c043737c>] wake_bit_function+0x0/0x3c
 [<c0479205>] __lock_buffer+0x21/0x24
 [<f88666eb>] do_get_write_access+0x4d/0x462 [jbd]
 [<f8866b18>] journal_get_write_access+0x18/0x26 [jbd]
 [<f88a01f3>] ext3_get_blocks_handle+0x688/0x8d3 [ext3]
 [<f88a0711>] ext3_get_block+0xa2/0xd6 [ext3]
 [<c0479436>] __block_prepare_write+0x19b/0x37e
 [<c045c636>] get_page_from_freelist+0x96/0x378
 [<c04796c4>] block_write_begin+0x88/0xe6
 [<f88a066f>] ext3_get_block+0x0/0xd6 [ext3]
 [<f88a1ad8>] ext3_write_begin+0xc2/0x1a0 [ext3]
 [<f88a066f>] ext3_get_block+0x0/0xd6 [ext3]
 [<c04595af>] generic_file_buffered_write+0x101/0x58b
 [<c042a626>] current_fs_time+0x4a/0x54
 [<c0459edf>] __generic_file_aio_write_nolock+0x4a6/0x52a
 [<c0459431>] __generic_file_aio_read+0x16a/0x1a3
 [<c0457ef3>] file_read_actor+0x0/0xd5
 [<c0459fbc>] generic_file_aio_write+0x59/0xac
 [<f889dea1>] ext3_file_write+0x19/0x83 [ext3]
 [<c0476312>] do_sync_write+0xb6/0xf1
 [<c043734f>] autoremove_wake_function+0x0/0x2d
 [<c044ae8f>] audit_syscall_entry+0x193/0x1bd
 [<c0476f78>] generic_file_llseek+0xc6/0xd2
 [<c047625c>] do_sync_write+0x0/0xf1
 [<c0476b9b>] vfs_write+0xa1/0x143
 [<c04771c5>] sys_write+0x3c/0x63
 [<c0404f4b>] syscall_call+0x7/0xb
 =======================
aacraid: SCSI bus appears hung
INFO: task pdflush:235 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
pdflush       D 00007CCA  2664   235     19           236   234 (L-TLB)
       dff3ff34 00000046 e00b2607 00007cca 00000000 00000100 00000000 0000000a
       dffa4550 e00b3239 00007cca 00000c32 00000003 dffa465c c37f75cc c39ac200
       00000000 c37f7f6c 00000000 dffa4550 c38eec50 c37f44cc c39ac200 ffffffff
Call Trace:
 [<c062183f>] __mutex_lock_slowpath+0x4d/0x7c
 [<c062187d>] .text.lock.mutex+0xf/0x14
 [<c0439d75>] down_read+0x8/0x11
 [<c047cc52>] sync_supers+0x47/0xb8
 [<c045d7c1>] wb_kupdate+0x36/0x130
 [<c045dc77>] pdflush+0x0/0x1a1
 [<c045dd82>] pdflush+0x10b/0x1a1
 [<c045d78b>] wb_kupdate+0x0/0x130
 [<c043728a>] kthread+0xc0/0xee
 [<c04371ca>] kthread+0x0/0xee
 [<c0405c87>] kernel_thread_helper+0x7/0x10
 =======================
INFO: task kjournald:490 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald     D 00007CBB  2788   490     19           515   469 (L-TLB)
       dff96ed4 00000046 701fe859 00007cbb 00000005 00000000 1834632e 0000000a
       dff69550 701ff33a 00007cbb 00000ae1 00000002 dff6965c c37f0788 c39ac040
       105d6eda dff4a4c4 c37f1128 c37f75cc 00000020 00000001 dff4a4bc 105d6eda
Call Trace:
 [<c0621468>] io_schedule+0x36/0x59
 [<c04790db>] sync_buffer+0x30/0x33
 [<c062163f>] __wait_on_bit+0x33/0x58
 [<c04790ab>] sync_buffer+0x0/0x33
 [<c04790ab>] sync_buffer+0x0/0x33
 [<c06216c6>] out_of_line_wait_on_bit+0x62/0x6a
 [<c043737c>] wake_bit_function+0x0/0x3c
 [<c0479058>] __wait_on_buffer+0x1c/0x1f
 [<f88684b3>] journal_commit_transaction+0x4cf/0xf3c [jbd]
 [<c042e621>] lock_timer_base+0x15/0x2f
 [<c042e6a0>] try_to_del_timer_sync+0x65/0x6c
 [<f886bd08>] kjournald+0xa1/0x1c2 [jbd]
 [<c043734f>] autoremove_wake_function+0x0/0x2d
 [<f886bc67>] kjournald+0x0/0x1c2 [jbd]
 [<c043728a>] kthread+0xc0/0xee
 [<c04371ca>] kthread+0x0/0xee
 [<c0405c87>] kernel_thread_helper+0x7/0x10
 =======================
INFO: task syslogd:2386 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syslogd       D 00007CBA  2340  2386      1          2389  2242 (NOTLB)
       f5c0ced0 00000086 3e0e3044 00007cba 00000070 00000080 030a9588 00000007
       f5c03550 3e0e3868 00007cba 00000824 00000001 f5c0365c c37e9944 f62ea200
       e02e8e68 c37ea2e4 00000001 f5c0cecc c041f0c8 00000000 00000000 ffffffff
Call Trace:
 [<c041f0c8>] __wake_up+0x2a/0x3d
 [<f886b2c1>] log_wait_commit+0x80/0xc7 [jbd]
 [<c043734f>] autoremove_wake_function+0x0/0x2d
 [<f8866679>] journal_stop+0x196/0x1bb [jbd]
 [<c0495846>] __writeback_single_inode+0x199/0x2a5
 [<c045d334>] do_writepages+0x2b/0x32
 [<c0458e37>] __filemap_fdatawrite_range+0x66/0x72
 [<c0495ee4>] sync_inode+0x19/0x24
 [<f889e019>] ext3_sync_file+0xb1/0xdc [ext3]
 [<c0478c15>] do_fsync+0x41/0x83
 [<c0478c74>] __do_fsync+0x1d/0x2b
 [<c0404f4b>] syscall_call+0x7/0xb
 =======================
INFO: task hald-addon-stor:2624 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
hald-addon-st D 00007CC8  2744  2624   2606                2615 (NOTLB)
       f6137e98 00000082 5e52e576 00007cc8 c048c272 e02265bc c3944c8c 0000000a
       f634f000 5e538047 00007cc8 00009ad1 00000002 f634f10c c37f0788 c39e5040
       00000800 c37f1128 0028bcfd 00000003 dca86005 dfc0fe40 e0235888 ffffffff
Call Trace:
 [<c048c272>] dput+0x22/0xed
 [<f8879d7b>] scsi_block_when_processing_errors+0x7a/0xbf [scsi_mod]
 [<c043734f>] autoremove_wake_function+0x0/0x2d
 [<f8854dfc>] sd_open+0x69/0x10f [sd_mod]
 [<c047dce0>] do_open+0x1de/0x2ce
 [<c047df3c>] blkdev_open+0x0/0x44
 [<c047df58>] blkdev_open+0x1c/0x44
 [<c0474f91>] __dentry_open+0xc7/0x1ab
 [<c04750d9>] nameidata_to_filp+0x19/0x28
 [<c0475113>] do_filp_open+0x2b/0x31
 [<c0475157>] do_sys_open+0x3e/0xae
 [<c04751f4>] sys_open+0x16/0x18
 [<c0404f4b>] syscall_call+0x7/0xb
 =======================
INFO: task cpanellogd:4172 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
cpanellogd    D 00007CCC  2572  4172      1          4234  4137 (NOTLB)
       f7c5bd6c 00000082 fe647b4e 00007ccc 00000000 00000001 f7c5bd34 0000000a
       f6379550 fe65d004 00007ccc 000154b6 00000001 f637965c c37e9944 f6135740
       005e000a c37ea2e4 e02071b0 00000000 105fbc80 c042d7c7 dfd4972c ffffffff
Call Trace:
 [<c042d7c7>] getnstimeofday+0x30/0xb6
 [<c0621468>] io_schedule+0x36/0x59
 [<c04790ab>] sync_buffer+0x0/0x33
 [<c04790db>] sync_buffer+0x30/0x33
 [<c062157a>] __wait_on_bit_lock+0x2a/0x52
 [<c04790ab>] sync_buffer+0x0/0x33
 [<c0621604>] out_of_line_wait_on_bit_lock+0x62/0x6a
 [<c043737c>] wake_bit_function+0x0/0x3c
 [<c0479205>] __lock_buffer+0x21/0x24
 [<f88666eb>] do_get_write_access+0x4d/0x462 [jbd]
 [<f886627c>] __journal_file_buffer+0x116/0x1ed [jbd]
 [<f8866b18>] journal_get_write_access+0x18/0x26 [jbd]
 [<f889e80b>] ext3_new_inode+0x591/0x971 [ext3]
 [<f88acb40>] ext3_permission+0x0/0xa [ext3]
 [<c0482dba>] permission+0xa2/0xb5
 [<c0484dff>] __link_path_walk+0xcd4/0xdc3
 [<f8866ee5>] journal_start+0xae/0xdd [jbd]
 [<f88a4c0a>] ext3_create+0x75/0xdc [ext3]
 [<c04833cc>] vfs_create+0xca/0x131
 [<c0485e3a>] open_namei+0x16a/0x631
 [<c04397a9>] lock_hrtimer_base+0x19/0x35
 [<c0475104>] do_filp_open+0x1c/0x31
 [<c0475157>] do_sys_open+0x3e/0xae
 [<c04751f4>] sys_open+0x16/0x18
 [<c0404f4b>] syscall_call+0x7/0xb
 =======================
INFO: task tailwatchd:21557 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
tailwatchd    D 00007CCA  2396 21557      1  3073    5334 21526 (NOTLB)
       f7968ecc 00200082 c5a70071 00007cca 80000001 00000000 00000001 0000000a
       f6139aa0 c5ace8f9 00007cca 0005e888 00000001 f6139bac c37e9944 c3b19e40
       f7968f3c c37ea2e4 e7bff000 00000310 f7968f3c ffffffe9 f7968f3c ffffffff
Call Trace:
 [<c062183f>] __mutex_lock_slowpath+0x4d/0x7c
 [<c062187d>] .text.lock.mutex+0xf/0x14
 [<c0485dad>] open_namei+0xdd/0x631
 [<c0475104>] do_filp_open+0x1c/0x31
 [<c0475157>] do_sys_open+0x3e/0xae
 [<c04751f4>] sys_open+0x16/0x18
 [<c0404f4b>] syscall_call+0x7/0xb
 =======================
aacraid: aac_fib_send: first asynchronous command timed out.
Usually a result of a PCI interrupt routing problem;
update mother board BIOS or consider utilizing one of
the SAFE mode kernel options (acpi, apic etc)
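
A dump like the one above is easier to read once the repeated lines are collapsed and the blocked tasks are listed once each. The following is a minimal Python sketch for doing that; the function names are hypothetical, and the sample is a shortened excerpt of the output above rather than the full log.

```python
import re
from itertools import groupby

# Matches the kernel's hung-task warning, e.g.
# "INFO: task kjournald:490 blocked for more than 120 seconds."
HUNG = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")


def collapse_repeats(lines):
    """Collapse runs of identical consecutive lines into (line, count) pairs."""
    return [(line, sum(1 for _ in run)) for line, run in groupby(lines)]


def blocked_tasks(lines):
    """Return unique (task, pid) pairs from hung-task warnings, first seen first."""
    seen = []
    for line in lines:
        m = HUNG.search(line)
        if m:
            key = (m.group(1), int(m.group(2)))
            if key not in seen:
                seen.append(key)
    return seen


# Shortened excerpt of the dmesg output, for illustration only.
SAMPLE = """\
Disabling IRQ #169
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter reset request. SCSI hang ?
INFO: task kjournald:490 blocked for more than 120 seconds.
INFO: task syslogd:2386 blocked for more than 120 seconds.
INFO: task kjournald:490 blocked for more than 120 seconds.
""".splitlines()

if __name__ == "__main__":
    for line, count in collapse_repeats(SAMPLE):
        suffix = "  [x%d]" % count if count > 1 else ""
        print(line + suffix)
    print("Blocked tasks:", blocked_tasks(SAMPLE))
```

To run it against a real capture, the sample could be replaced with the lines of a saved dmesg file.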
oldcar53

ASKER
This is good. Thank you for adding to my perspective.