virgo0880
asked on
hd5 volume sync problem
My hd5 volume is giving a problem: it says it is unable to sync the volume. To fix this, I unmirrored hdisk1 from rootvg and started the mirror again, but it still gave the same problem. Now the issue is that I think mirrorvg is hung: it is stuck syncing hd3. Can I kill the process and remove hdisk1 from rootvg? What can be done in this case? When I run "lslv -L hd3", it has been showing 16 stale PPs for a long time. Kindly help with this issue.
Thanks
virgo
ASKER
I killed the mirrorvg command; now lsvg -l rootvg shows:
rootvg:
LV NAME     TYPE     LPs  PPs  PVs  LV STATE      MOUNT POINT
hd5         boot     1    1    1    closed/syncd  N/A
hd6         paging   128  256  2    open/syncd    N/A
impervalv   jfs2     1    2    2    open/syncd    /opt/imperva
hd8         jfslog   1    2    2    open/syncd    N/A
hd4         jfs      10   20   2    open/syncd    /
hd2         jfs      45   90   2    open/syncd    /usr
hd9var      jfs      12   24   2    open/syncd    /var
hd3         jfs      32   64   2    open/stale    /tmp
hd1         jfs      32   64   2    open/stale    /home
hd10opt     jfs      10   20   2    open/stale    /opt
usrlocallv  jfs      10   20   2    open/stale    /usr/local
hd7         sysdump  48   48   1    open/syncd    N/A
auditlv     jfs      1    2    2    open/stale    /audit
optioimglv  jfs      2    4    2    open/stale    /optio_images
So, should I go ahead and remove all the mirror copies of the LVs? I should use the following command, right?
rmlvcopy hd5 1 hdisk1
And similarly for all LVs. Is that right?
Yes, nearly correct.
Apply rmlvcopy only to mirrored LVs, so don't run it against hd5 and hd7!
Let's hope that all the "good" copies are on hdisk0.
But FIRST, are the two disks in a good state? Check with lsvg -p rootvg.
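A minimal sketch of the "only mirrored LVs" rule, assuming a POSIX shell with awk: an LV is treated as mirrored when its PPs column is exactly twice its LPs column in lsvg -l output, which excludes hd5 and hd7 in the listing above. The function names mirrored_lvs and rmlvcopy_plan are hypothetical, and the commands are only printed so they can be reviewed before being run by hand.

```shell
#!/bin/sh
# mirrored_lvs: read "lsvg -l <vg>" output on stdin and print the names of
# LVs whose PPs column is exactly twice their LPs column (two copies).
# Header lines and unmirrored LVs (e.g. hd5, hd7 with LPs == PPs) are skipped.
mirrored_lvs() {
    awk '$3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ && $4 == 2 * $3 { print $1 }'
}

# rmlvcopy_plan: print (not run) one rmlvcopy per mirrored LV, keeping one
# copy and removing the copy on the given disk (default: hdisk1).
rmlvcopy_plan() {
    disk=${1:-hdisk1}
    mirrored_lvs | while read -r lv; do
        printf 'rmlvcopy %s 1 %s\n' "$lv" "$disk"
    done
}
```

On the AIX host this would be used as lsvg -l rootvg | rmlvcopy_plan hdisk1, with the printed commands checked before running them.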
ASKER
rootvg:
PV_NAME  PV STATE  TOTAL PPs  FREE PPs  FREE DISTRIBUTION
hdisk1   active    546        262       64..00..00..89..109
hdisk0   active    546        213       15..00..00..89..109
I think they are in a good state.
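A small check for the lsvg -p output, assuming the column layout shown above (PV name first, PV state second). It only confirms that LVM considers each disk "active"; as this thread goes on to show, a disk can be active and still log errors, so errpt needs checking as well. The function name pvs_not_active is hypothetical.

```shell
#!/bin/sh
# pvs_not_active: read "lsvg -p <vg>" output on stdin, print every PV whose
# PV STATE column is not "active", and return 1 if there was any such PV.
# Note: "active" only means LVM can reach the disk, not that it is error-free.
pvs_not_active() {
    awk 'NF >= 5 && $1 != "PV_NAME" && $2 != "active" { print $1; bad = 1 }
         END { exit (bad ? 1 : 0) }'
}
```

Usage on the AIX host: lsvg -p rootvg | pvs_not_active && echo "all PVs active".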
ASKER
So should I remove the mirror copies of the LVs that are mirrored?
Ok!
ASKER
There are also some disk-related errors in errpt. I am attaching my errpt log: errpt-log.txt
ASKER
So, I have removed all the mirror copies of the LVs and run the synclvodm command. Now, what is the command to start mirrorvg and run it in the background?
virgo
I'd be very careful in your situation. It seems that hdisk0 is steadily going bad.
First remove hdisk1 from rootvg, then add it again to see if this works smoothly.
reducevg rootvg hdisk1
extendvg rootvg hdisk1
Now try to mirror the LVs one by one instead of all at once. If this finally succeeds, free up hdisk0 and have it replaced ASAP!
So start with
mklvcopy hd5 2 hdisk1
syncvg -v rootvg
and so on. Does syncvg fail somewhere (or hang)? If it hangs, check errpt again!
ASKER
I was able to run reducevg and extendvg on hdisk1 smoothly. I executed the mirrorvg command again, and it gave the same error again:
0516-934 /usr/sbin/syncvg: Unable to synchronize logical volume hd5
mirrorvg is still running in the background.
--virgo
OK, if you don't follow my suggestions ...
ASKER
Actually, that was not the case: I decided to try the mirrorvg command directly and ran it before your comment arrived. Sorry about that. But what can be done now?
virgo
Start over. Kill mirrorvg, run rmlvcopy if needed.
Then start creating mirrors. Don't start with hd5. We'll work on that one later.
Don't forget syncvg after each mklvcopy.
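The one-LV-at-a-time approach can be sketched as a loop that runs mklvcopy and then syncs that LV before moving on, stopping at the first failure so errpt can be checked. This is a sketch under assumptions: it uses the per-LV form "syncvg -l lvname" rather than the "syncvg -v rootvg" shown earlier, and the names run, DRY_RUN and mirror_one_by_one are hypothetical.

```shell
#!/bin/sh
# run: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# mirror_one_by_one DISK LV...: add a second copy of each LV on DISK and sync
# that LV before moving on; stop at the first failure so errpt can be checked.
mirror_one_by_one() {
    disk=$1
    shift
    for lv in "$@"; do
        if ! run mklvcopy "$lv" 2 "$disk"; then
            printf 'mklvcopy failed for %s, check errpt\n' "$lv" >&2
            return 1
        fi
        if ! run syncvg -l "$lv"; then
            printf 'syncvg failed for %s, check errpt\n' "$lv" >&2
            return 1
        fi
    done
    return 0
}
```

For example, after "export DRY_RUN=1" to review the plan: mirror_one_by_one hdisk1 hd6 hd8 hd4 hd2 hd9var hd3 hd1 hd10opt usrlocallv auditlv optioimglv impervalv, leaving hd5 for a separate run later.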
ASKER
mirrorvg completed successfully, but hd5 still shows as stale. What can be done in this case?
rootvg:
LV NAME     TYPE     LPs  PPs  PVs  LV STATE      MOUNT POINT
hd5         boot     1    2    2    closed/stale  N/A
hd6         paging   128  256  2    open/syncd    N/A
impervalv   jfs2     1    2    2    open/syncd    /opt/imperva
hd8         jfslog   1    2    2    open/syncd    N/A
hd4         jfs      10   20   2    open/syncd    /
hd2         jfs      45   90   2    open/syncd    /usr
hd9var      jfs      12   24   2    open/syncd    /var
hd3         jfs      32   64   2    open/syncd    /tmp
hd1         jfs      32   64   2    open/syncd    /home
hd10opt     jfs      10   20   2    open/syncd    /opt
usrlocallv  jfs      10   20   2    open/syncd    /usr/local
hd7         sysdump  48   48   1    open/syncd    N/A
auditlv     jfs      1    2    2    open/syncd    /audit
optioimglv  jfs      2    4    2    open/syncd    /optio_images
virgo
Again, it's late at night here.
Did you make some progress, or would you mind continuing tomorrow?
OK, my day is over now.
Please take this issue seriously! You most probably have a corrupt boot LV, which means that you might not be able to reboot your machine as things stand at this very moment.
wmp
ASKER
OK, we can check this tomorrow. Thanks for all your help.
Virgo
ASKER
Hi wmp,
Should I go ahead and try the commands for hd5 given above?
virgo
Yes, the stuff in comment #35750924 above.
I fear that bosboot will not work, but let's try it.
ASKER
snbc213:/# bosboot -a -d hdisk0
bosboot: Boot image is 40181 512 byte blocks.
I didn't get any error; it worked. Should I continue and do it on hdisk1?
virgo
Is it still stale??
ASKER
Yes, lsvg -l rootvg shows:
LV NAME  TYPE  LPs  PPs  PVs  LV STATE      MOUNT POINT
hd5      boot  1    2    2    closed/stale  N/A
But I haven't run bosboot on hdisk1. Do you want me to run it on hdisk1 and see if it works? I did only the first step you gave, i.e. bosboot on hdisk0.
virgo
bosboot on hdisk0 has overwritten hd5 on hdisk0 and thus should have forced a mirror write to hdisk1, which did not happen.
Running bosboot on hdisk1 would try to overwrite hd5 on hdisk1 and thus force a mirror write to hdisk0, but since I assume that the stale partition is on hdisk1, bosboot on hdisk1 will most probably not succeed.
Where is the stale partition? Check with "lspv hdisk0" and "lspv hdisk1".
Anyway, try bosboot on hdisk1. If it succeeds we could remove the mirror from hdisk0 and get a clean (although unmirrored) hd5 that way.
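To answer "where is the stale partition?" for both disks at once, the STALE PARTITIONS field can be pulled out of lspv output. A minimal sketch, assuming the "STALE PARTITIONS: n" layout that lspv prints in this thread; stale_pps is a hypothetical helper name.

```shell
#!/bin/sh
# stale_pps: read "lspv <disk>" output on stdin and print the number from
# the "STALE PARTITIONS:" field.
stale_pps() {
    awk '$1 == "STALE" && $2 == "PARTITIONS:" { print $3; exit }'
}

# On the AIX host (not run here):
#   for d in hdisk0 hdisk1; do
#       printf '%s: %s stale\n' "$d" "$(lspv "$d" | stale_pps)"
#   done
```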
ASKER
bosboot on hdisk1 succeeded:
servera:/# bosboot -a -d hdisk1
bosboot: Boot image is 40180 512 byte blocks
But lspv hdisk1 still shows a stale partition; there are no stale partitions on hdisk0:
servera:/# lspv hdisk0
PHYSICAL VOLUME: hdisk0 VOLUME GROUP: rootvg
PV IDENTIFIER: 00003d5a49e74c08 VG IDENTIFIER 00003d5a00004c00000001218cd0c6a4
PV STATE: active
STALE PARTITIONS: 0 ALLOCATABLE: yes
PP SIZE: 256 megabyte(s) LOGICAL VOLUMES: 14
TOTAL PPs: 546 (139776 megabytes) VG DESCRIPTORS: 2
FREE PPs: 213 (54528 megabytes) HOT SPARE: no
USED PPs: 333 (85248 megabytes) MAX REQUEST: 256 kilobytes
FREE DISTRIBUTION: 15..00..00..89..109
USED DISTRIBUTION: 95..109..109..20..00
servera:/# lspv hdisk1
PHYSICAL VOLUME: hdisk1 VOLUME GROUP: rootvg
PV IDENTIFIER: 00003d5a4a43e721 VG IDENTIFIER 00003d5a00004c00000001218cd0c6a4
PV STATE: active
STALE PARTITIONS: 1 ALLOCATABLE: yes
PP SIZE: 256 megabyte(s) LOGICAL VOLUMES: 13
TOTAL PPs: 546 (139776 megabytes) VG DESCRIPTORS: 1
FREE PPs: 261 (66816 megabytes) HOT SPARE: no
USED PPs: 285 (72960 megabytes) MAX REQUEST: 256 kilobytes
FREE DISTRIBUTION: 63..00..00..89..109
USED DISTRIBUTION: 47..109..109..20..00
That doesn't really make sense.
OK, we will try to synchronize hd5 alone.
Issue lslv hd5. Take the long string beneath "LV IDENTIFIER:" and feed it into lresynclv:
lresynclv -l lvid
Example: lresynclv -l 00cfd23d00004c0000000108bf1d209c.1
Any error messages?
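The lslv/lresynclv step can be scripted so the identifier is not retyped by hand; the IDs pasted in this thread picked up a stray space from line wrapping, which is easy to miss. A sketch, assuming the identifier follows the "LV IDENTIFIER:" label either on the same line or on the next one; lv_id and valid_lv_id are hypothetical helper names, and only the "lresynclv -l lvid" form used above is assumed.

```shell
#!/bin/sh
# lv_id: read "lslv <lv>" output on stdin and print the LV identifier that
# follows the "LV IDENTIFIER:" label, on the same line or on the next one.
lv_id() {
    awk '
        want { print $1; exit }
        $1 == "LV" && $2 == "IDENTIFIER:" {
            if (NF >= 3) { print $3; exit }
            want = 1
        }'
}

# valid_lv_id: succeed only for "<hex>.<number>" (a VG identifier followed by
# the LV minor number, e.g. ".1"); a pasted space makes it fail.
valid_lv_id() {
    printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]+\.[0-9]+$'
}

# On the AIX host (not run here):
#   id=$(lslv hd5 | lv_id)
#   valid_lv_id "$id" && lresynclv -l "$id"
```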
ASKER
I ran it, and there were no error messages:
servera:/# lresynclv -l 00003d5a00004c00000001218cd0c6a4.1
servera:/#
Actually, I am confused now: which disk is having the problem, hdisk0 or hdisk1?
According to your errpt, it's hdisk0 that has problems.
If lresynclv ran without errors it should have removed the "stale" state.
Is this the case?
ASKER
No, lspv hdisk1 still shows 1 stale partition. Also,
lsvg -l rootvg shows hd5 as closed/stale.
If hdisk0 is the problem, then why are there no stale partitions on hdisk0?
Should I run bosboot -a -d hdisk0 again? We ran it on hdisk1, and it must have overwritten the hdisk0 boot record, which was a good one.
hd5 is always closed, except during the boot process.
"If hdisk0 is the problem, then why are there no stale partitions on hdisk0?"
It looks as if LVM cannot correctly read the data from hdisk0 to write it to hdisk1, at least according to errpt.
No, playing around with bosboot will not change anything with regard to stale partitions, as we've seen above.
It actually seems we cannot work with hd5 on hdisk0. Something's wrong with that disk.
We should delete hd5 from both disks and recreate it on hdisk1 alone, so we will have at least one good copy of the boot LV.
rmlv -f hd5
mklv -y hd5 -t boot rootvg 1 hdisk1
bosboot -a -d hdisk1
bootlist -m normal hdisk1
savebase
Leave hdisk0 alone for now, and I'd suggest having IBM diagnose it as soon as possible!
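The recreate-hd5 sequence can be wrapped so that each step runs only if the previous one succeeded; in particular, bootlist and savebase are not touched if bosboot fails. A sketch, assuming a DRY_RUN switch and the hypothetical names run and recreate_hd5. Between rmlv and a successful bosboot there is no usable boot LV, so the machine must not be rebooted in that window.

```shell
#!/bin/sh
# run: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# recreate_hd5 DISK: recreate hd5 as a single copy on DISK and make DISK the
# normal-mode boot device. The && chain stops at the first failing step, so
# bootlist and savebase are not run if bosboot fails.
recreate_hd5() {
    disk=${1:-hdisk1}
    run rmlv -f hd5 &&
        run mklv -y hd5 -t boot rootvg 1 "$disk" &&
        run bosboot -a -d "$disk" &&
        run bootlist -m normal "$disk" &&
        run savebase
}
```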
wmp
ASKER
OK, I will open a call with IBM for this.
ASKER
OK, we have replaced hdisk0, but now when I try to mirror onto hdisk0, I get the following errors, although the mirrorvg command is still running:
0516-934 /usr/sbin/syncvg: Unable to synchronize logical volume hd5.
0516-932 /usr/sbin/syncvg: Unable to synchronize volume group rootvg.
0516-1126 mirrorvg: rootvg successfully mirrored, user should perform
bosboot of system to initialize boot records. Then, user must modify
bootlist to include: hdisk0 hdisk1.
0516-1804 chvg: The quorum change takes effect immediately.
What is the exact state of your LVs at the moment? Everything mirrored, except for hd5?
Did you recreate hd5 from scratch, as I suggested?
If you didn't (why not?), one possibility is that BB POLICY is set to non-relocatable for whatever reason, which might cause mirroring issues.
If it's really "non-relocatable", change this setting with "chlv -b y hd5" and retry by issuing "varyonvg rootvg".
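The BB POLICY value can be read from "lslv hd5" before deciding whether to run chlv. A sketch, assuming lslv prints the setting as "BB POLICY: value" somewhere on a line (the exact column may vary); bb_policy is a hypothetical helper name.

```shell
#!/bin/sh
# bb_policy: read "lslv <lv>" output on stdin and print the value that
# follows "BB POLICY:", wherever it appears on the line.
bb_policy() {
    awk '{
        for (i = 1; i + 2 <= NF; i++)
            if ($i == "BB" && $(i + 1) == "POLICY:") { print $(i + 2); exit }
    }'
}

# On the AIX host (not run here):
#   if [ "$(lslv hd5 | bb_policy)" = non-relocatable ]; then
#       chlv -b y hd5 && varyonvg rootvg
#   fi
```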
ASKER
All LVs now show as syncd, and the mirrorvg command has completed. I ran bosboot -a and no error was reported. Here is the output of lsvg -l rootvg:
rootvg:
LV NAME     TYPE     LPs  PPs  PVs  LV STATE      MOUNT POINT
hd5         boot     1    2    2    closed/syncd  N/A
hd6         paging   128  256  2    open/syncd    N/A
hd71        sysdump  48   48   1    open/syncd    N/A
impervalv   jfs2     1    2    2    open/syncd    /opt/imperva
hd8         jfslog   1    2    2    open/syncd    N/A
hd4         jfs      10   20   2    open/syncd    /
hd2         jfs      45   90   2    open/syncd    /usr
hd9var      jfs      12   24   2    open/syncd    /var
hd3         jfs      32   64   2    open/syncd    /tmp
hd1         jfs      32   64   2    open/syncd    /home
hd10opt     jfs      10   20   2    open/syncd    /opt
usrlocallv  jfs      10   20   2    open/syncd    /usr/local
hd7         sysdump  48   48   1    open/syncd    N/A
auditlv     jfs      1    2    2    open/syncd    /audit
optioimglv  jfs      2    4    2    open/syncd    /optio_images
But I am worried about why I got that "unable to synchronize volume group" error.
And yes, I did remove hd5 and recreate it from scratch before.
virgo
Did you run a varyonvg or syncvg after mirrorvg by any chance? This could have caused hd5 to finally synchronize.
I can't think of another possibility at the moment, but who knows!
Did you check the BB relocation policy of hd5 before recreating it (lslv hd5)? Just out of curiosity ...
Anyway, no reason to be worried anymore. It's all fine now. Congrats!
wmp
ASKER
When the IBM engineer tried to hot-plug remove hdisk0, it gave an error related to /dev/ipldevice. I then had to do various juggling to get it removed. Then the new disk was inserted and I started the mirrorvg command, which gave the "unable to synchronize" error.
I did not run varyonvg or syncvg after the mirror completed. Also, before removing hd5, I removed hdisk0 from rootvg when the engineer came in. After that, I removed hd5 from hdisk1, recreated it and ran bosboot. I didn't check BB relocation at that time.
As of now, the mirrors look good. Thanks very much for helping me out with these issues.
Virgo
Are the disks in a good state? (lsvg -L -p rootvg)
You can kill mirrorvg, remove any LV copies that might already be present on hdisk1 ("rmlvcopy lvname 1 hdisk1"), then run "reducevg rootvg hdisk1".
rootvg may stay locked after killing mirrorvg. In that case, issue "varyonvg -b -u rootvg".
What's the state of your LVs now? There shouldn't be any stale partitions, because rmlvcopy should not have worked in such a case.
If all seems OK issue "synclvodm -P rootvg".
Now you can try to extendvg again and retry mirrorvg.
If anything in the above process is not as expected please let me know.
Maybe we'll have to recreate hd5, but that's not a big problem. First try "bosboot -a -d hdisk0". If this fails, run "rmlv -f hd5" and "mklv -y hd5 -t boot rootvg 1 hdisk0", then retry "bosboot -a -d hdisk0".
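The bosboot-first, recreate-only-on-failure order can be sketched as follows, assuming the same kind of DRY_RUN switch; run and rebuild_boot_image are hypothetical names. The point of this ordering is that hd5 is only removed once bosboot has already failed on the existing LV.

```shell
#!/bin/sh
# run: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# rebuild_boot_image DISK: try bosboot first; only if it fails, remove and
# recreate hd5 on DISK and retry bosboot once.
rebuild_boot_image() {
    disk=${1:-hdisk0}
    if run bosboot -a -d "$disk"; then
        return 0
    fi
    printf 'bosboot failed on %s, recreating hd5\n' "$disk" >&2
    run rmlv -f hd5 &&
        run mklv -y hd5 -t boot rootvg 1 "$disk" &&
        run bosboot -a -d "$disk"
}
```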
Attention: NEVER try to reboot if you suspect that hd5 is corrupt!!
wmp