virgo0880


hd5 volume sync problem

My hd5 volume is giving a problem: it says it is unable to sync the volume. To fix this, I unmirrored hdisk1 from rootvg and started the mirror again, but it still gave the same problem. Now the issue is that I think mirrorvg has hung; it is stuck syncing hd3. Can I kill the process and remove hdisk1 from rootvg? What can be done in this case? When I run "lslv -L hd3", it has been showing 16 stale PPs for a long time. Kindly help with this issue.

Thanks
virgo
woolmilkporc

Hi,

Are the disks in a good state? (lsvg -L -p rootvg)

You can kill mirrorvg, remove any LV copies that may already be present on hdisk1 ("rmlvcopy lvname 1 hdisk1"), and then run "reducevg rootvg hdisk1".

rootvg may stay locked after you kill mirrorvg. In that case, issue "varyonvg -b -u rootvg".

What's the state of your LVs now? There shouldn't be any stale partitions, because rmlvcopy should not have worked in such a case.

If all seems OK issue "synclvodm -P rootvg".

Now you can run extendvg again and retry mirrorvg.

If anything in the above process is not as expected, please let me know.

We may have to recreate hd5, but that's not a big problem. First try "bosboot -a -d hdisk0". If this fails, run "rmlv -f hd5" and "mklv -y hd5 -t boot rootvg 1 hdisk0", then retry "bosboot -a -d hdisk0".

Attention: NEVER try to reboot if you suspect that hd5 is corrupt!!

wmp
virgo0880

ASKER

I killed the mirrorvg command. Now lsvg -l rootvg shows:

rootvg:
LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
hd5                 boot       1       1       1    closed/syncd  N/A
hd6                 paging     128     256     2    open/syncd    N/A
impervalv           jfs2       1       2       2    open/syncd    /opt/imperva
hd8                 jfslog     1       2       2    open/syncd    N/A
hd4                 jfs        10      20      2    open/syncd    /
hd2                 jfs        45      90      2    open/syncd    /usr
hd9var              jfs        12      24      2    open/syncd    /var
hd3                 jfs        32      64      2    open/stale    /tmp
hd1                 jfs        32      64      2    open/stale    /home
hd10opt             jfs        10      20      2    open/stale    /opt
usrlocallv          jfs        10      20      2    open/stale    /usr/local
hd7                 sysdump    48      48      1    open/syncd    N/A
auditlv             jfs        1       2       2    open/stale    /audit
optioimglv          jfs        2       4       2    open/stale    /optio_images

So, should I go ahead and remove all the mirror copies of the LVs? I should use the following command, right?

rmlvcopy hd5 1 hdisk1

And similarly for all the other LVs. Is that right?

Yes, nearly correct.

Apply rmlvcopy only to mirrored LVs, so don't run it against hd5 and hd7!
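If you would rather not pick the LVs by eye, the rule above (only LVs with more than one copy) can be applied to the lsvg -l output mechanically. A minimal Python sketch, assuming the column layout shown in this thread (LV NAME, TYPE, LPs, PPs, PVs, LV STATE, MOUNT POINT) and taking the copy count as PPs divided by LPs; the function names are hypothetical, and it only builds the command strings, it runs nothing:

```python
# Hypothetical helper: derive rmlvcopy commands from `lsvg -l rootvg` text.
# Assumes the column layout shown in this thread; it only builds strings.

def parse_lsvg_l(text):
    """Return one dict per LV row; the VG name and header lines are skipped."""
    rows = []
    for line in text.splitlines():
        f = line.split()
        # LV rows have >= 7 fields with numeric LPs, PPs and PVs columns.
        if len(f) < 7 or not (f[2].isdigit() and f[3].isdigit() and f[4].isdigit()):
            continue
        rows.append({"name": f[0], "lps": int(f[2]), "pps": int(f[3])})
    return rows


def rmlvcopy_commands(text, disk="hdisk1"):
    """Build 'rmlvcopy <lv> 1 <disk>' for each LV with two or more copies."""
    cmds = []
    for lv in parse_lsvg_l(text):
        copies = lv["pps"] // lv["lps"] if lv["lps"] else 0
        if copies >= 2:  # single-copy LVs such as hd5 (1/1) and hd7 (48/48) are skipped
            cmds.append(f"rmlvcopy {lv['name']} 1 {disk}")
    return cmds


if __name__ == "__main__":
    sample = """rootvg:
LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
hd5                 boot       1       1       1    closed/syncd  N/A
hd3                 jfs        32      64      2    open/stale    /tmp
hd7                 sysdump    48      48      1    open/syncd    N/A
"""
    for cmd in rmlvcopy_commands(sample):
        print(cmd)  # rmlvcopy hd3 1 hdisk1
```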

Let's hope that all the "good" copies are on hdisk0.

But FIRST: are the two disks in a good state? (lsvg -p rootvg)
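Reading the lsvg -p answer mostly comes down to the PV STATE column. A small Python sketch, assuming the five-column layout (PV_NAME, PV STATE, TOTAL PPs, FREE PPs, FREE DISTRIBUTION) seen in the reply below; the helper names are hypothetical. Note that "active" only means LVM can currently use the disk. It does not rule out media errors, which is why errpt is still worth checking.

```python
# Hypothetical helper: parse `lsvg -p rootvg` text and flag non-active PVs.

def parse_lsvg_p(text):
    """Map PV name -> (state, total_pps, free_pps)."""
    pvs = {}
    for line in text.splitlines():
        f = line.split()
        # Data rows look like: hdisk1  active  546  262  64..00..00..89..109
        if len(f) >= 4 and f[2].isdigit() and f[3].isdigit():
            pvs[f[0]] = (f[1], int(f[2]), int(f[3]))
    return pvs


def non_active_pvs(text):
    """Names of PVs whose PV STATE is anything other than 'active'."""
    return [name for name, (state, _total, _free) in parse_lsvg_p(text).items()
            if state != "active"]


if __name__ == "__main__":
    sample = """rootvg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk1            active            546         262         64..00..00..89..109
hdisk0            active            546         213         15..00..00..89..109
"""
    print(parse_lsvg_p(sample)["hdisk0"])  # ('active', 546, 213)
    print(non_active_pvs(sample))          # []
```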


rootvg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk1            active            546         262         64..00..00..89..109
hdisk0            active            546         213         15..00..00..89..109

I think they are in a good state.
So should I remove the mirror copies of the LVs that are mirrored?
There are also some disk-related errors in errpt. I am attaching my errpt log: errpt-log.txt
I have now removed all the mirror copies of the LVs and run the synclvodm command. What is the command to start mirrorvg and run it in the background?

virgo
I'd be very careful in your situation. It seems that hdisk0 is steadily going bad.

First remove hdisk1 from rootvg, then add it again to see if this works smoothly.

reducevg rootvg hdisk1
extendvg rootvg hdisk1

Now try to mirror the LVs one by one instead of mirroring all at once. If this finally succeeds, free up hdisk0 and have it replaced asap!

So start with

mklvcopy hd5 2 hdisk1
syncvg -v rootvg


and so on. Does syncvg fail somewhere (or hang)? If it hangs, check errpt again!
I was able to run reducevg and extendvg on hdisk1 smoothly. I executed the mirrorvg command again, and it gave the same error again:

0516-934 /usr/sbin/syncvg: Unable to synchronize logical volume hd5

The mirrorvg command is working in the background.

--virgo
OK, if you don't follow my suggestions ...
Actually, that was not the case: I thought of trying the mirrorvg command directly, so I executed it before your comment came. Sorry about that. What can be done now?

virgo
Start over. Kill mirrorvg, run rmlvcopy if needed.
Then start creating mirrors. Don't start with hd5. We'll work on that one later.
Don't forget syncvg after each mklvcopy.
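The order described here (one LV at a time, hd5 left for later, syncvg after every mklvcopy) can be laid out as a command list before anything is run. A minimal Python sketch under these assumptions: the lsvg -l layout shown in this thread, a copy count of PPs divided by LPs, sysdump LVs left unmirrored (hd7 had a single copy here), and hypothetical function and parameter names. It only builds the list; it executes nothing.

```python
# Hypothetical helper: plan one-LV-at-a-time mirroring from `lsvg -l` text.

def mirror_plan(lsvg_l_text, target="hdisk1", vg="rootvg", skip=("hd5",)):
    """Return mklvcopy/syncvg pairs for every LV that still has one copy."""
    plan = []
    for line in lsvg_l_text.splitlines():
        f = line.split()
        # LV rows have >= 7 fields with numeric LPs and PPs columns.
        if len(f) < 7 or not (f[2].isdigit() and f[3].isdigit()):
            continue
        name, lv_type, lps, pps = f[0], f[1], int(f[2]), int(f[3])
        if name in skip or lv_type == "sysdump":
            continue  # hd5 is handled separately; dump LVs stay single
        if lps and pps >= 2 * lps:
            continue  # already has two copies
        plan.append(f"mklvcopy {name} 2 {target}")
        plan.append(f"syncvg -v {vg}")  # sync before moving to the next LV
    return plan


if __name__ == "__main__":
    sample = """hd5                 boot       1       1       1    closed/syncd  N/A
hd4                 jfs        10      10      1    open/syncd    /
hd2                 jfs        45      90      2    open/syncd    /usr
hd7                 sysdump    48      48      1    open/syncd    N/A
"""
    for step in mirror_plan(sample):
        print(step)
```

If syncvg hangs on one LV, you know exactly which mklvcopy preceded it, which is the point of going one by one.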
mirrorvg completed successfully, but hd5 is still showing as stale. What can be done in this case?

rootvg:
LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
hd5                 boot       1       2       2    closed/stale  N/A
hd6                 paging     128     256     2    open/syncd    N/A
impervalv           jfs2       1       2       2    open/syncd    /opt/imperva
hd8                 jfslog     1       2       2    open/syncd    N/A
hd4                 jfs        10      20      2    open/syncd    /
hd2                 jfs        45      90      2    open/syncd    /usr
hd9var              jfs        12      24      2    open/syncd    /var
hd3                 jfs        32      64      2    open/syncd    /tmp
hd1                 jfs        32      64      2    open/syncd    /home
hd10opt             jfs        10      20      2    open/syncd    /opt
usrlocallv          jfs        10      20      2    open/syncd    /usr/local
hd7                 sysdump    48      48      1    open/syncd    N/A
auditlv             jfs        1       2       2    open/syncd    /audit
optioimglv          jfs        2       4       2    open/syncd    /optio_images


virgo
ASKER CERTIFIED SOLUTION
woolmilkporc

Again, it's late at night here.

Did you make some progress, or would you mind continuing tomorrow?
OK, my day is over now.

Please take this issue seriously! You most probably have a corrupt boot LV, which means that, as things stand right now, you might not be able to reboot your machine.

wmp
OK, we can check this tomorrow. Thanks for all your help.

Virgo
Hi wmp,

Should I go ahead and try the commands for hd5 given above?

virgo


Yes, the stuff in comment #35750924 above.

I fear that bosboot will not work, but let's try it.


snbc213:/# bosboot -a -d hdisk0

bosboot: Boot image is 40181 512 byte blocks.

I didn't get any error; it worked. Should I continue and do it on hdisk1?

virgo
Is it still stale??
Yes, lsvg -l rootvg shows:

LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
hd5                 boot       1       2       2    closed/stale  N/A

But I have not run bosboot on hdisk1. Do you want me to run it on hdisk1 and see if it works? I only did the first step you gave, i.e. bosboot on hdisk0.

virgo
bosboot on hdisk0 has overwritten hd5 on hdisk0 and thus should have forced a mirror write to hdisk1, which evidently did not happen.

Running bosboot on hdisk1 would try to overwrite hd5 on hdisk1 and thus force a mirror write to hdisk0, but since I assume that the stale partition is on hdisk1, bosboot on hdisk1 will most probably not succeed.

Where is the stale partition?  Check with "lspv hdisk0" and "lspv hdisk1".

Anyway, try bosboot on hdisk1. If it succeeds, we could remove the mirror from hdisk0 and get a clean (although unmirrored) hd5 that way.
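To answer "where is the stale partition?" without eyeballing both listings, you only need the STALE PARTITIONS field from each lspv. A minimal Python sketch, assuming the lspv field layout this thread shows (the sample lines are the ones posted in the next reply); the helper names are hypothetical, and you would feed it the text captured from "lspv hdisk0" and "lspv hdisk1".

```python
import re

# Hypothetical helper: pull the STALE PARTITIONS count out of `lspv` text.
STALE_RE = re.compile(r"STALE PARTITIONS:\s+(\d+)")


def stale_partitions(lspv_text):
    """Return the STALE PARTITIONS value, or raise if the field is absent."""
    m = STALE_RE.search(lspv_text)
    if m is None:
        raise ValueError("no 'STALE PARTITIONS:' field in lspv output")
    return int(m.group(1))


def disks_with_stale(outputs):
    """outputs maps disk name -> lspv text; return disks with stale PPs."""
    return sorted(d for d, text in outputs.items() if stale_partitions(text) > 0)


if __name__ == "__main__":
    outputs = {
        "hdisk0": "STALE PARTITIONS:   0                        ALLOCATABLE:      yes",
        "hdisk1": "STALE PARTITIONS:   1                        ALLOCATABLE:      yes",
    }
    print(disks_with_stale(outputs))  # ['hdisk1']
```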

bosboot on hdisk1 succeeded:

servera:/# bosboot -a -d hdisk1

bosboot: Boot image is 40180 512 byte blocks

But lspv hdisk1 still shows a stale partition; there are no stale partitions on hdisk0:

servera:/# lspv hdisk0
PHYSICAL VOLUME:    hdisk0                   VOLUME GROUP:     rootvg
PV IDENTIFIER:      00003d5a49e74c08 VG IDENTIFIER     00003d5a00004c00000001218cd0c6a4
PV STATE:           active
STALE PARTITIONS:   0                        ALLOCATABLE:      yes
PP SIZE:            256 megabyte(s)          LOGICAL VOLUMES:  14
TOTAL PPs:          546 (139776 megabytes)   VG DESCRIPTORS:   2
FREE PPs:           213 (54528 megabytes)    HOT SPARE:        no
USED PPs:           333 (85248 megabytes)    MAX REQUEST:      256 kilobytes
FREE DISTRIBUTION:  15..00..00..89..109
USED DISTRIBUTION:  95..109..109..20..00

servera:/# lspv hdisk1
PHYSICAL VOLUME:    hdisk1                   VOLUME GROUP:     rootvg
PV IDENTIFIER:      00003d5a4a43e721 VG IDENTIFIER     00003d5a00004c00000001218cd0c6a4
PV STATE:           active
STALE PARTITIONS:   1                        ALLOCATABLE:      yes
PP SIZE:            256 megabyte(s)          LOGICAL VOLUMES:  13
TOTAL PPs:          546 (139776 megabytes)   VG DESCRIPTORS:   1
FREE PPs:           261 (66816 megabytes)    HOT SPARE:        no
USED PPs:           285 (72960 megabytes)    MAX REQUEST:      256 kilobytes
FREE DISTRIBUTION:  63..00..00..89..109
USED DISTRIBUTION:  47..109..109..20..00
That's not really understandable.

OK, we will try to synchronize hd5 alone.

Issue lslv hd5. Take the long string after "LV IDENTIFIER:" and feed it to lresynclv:

lresynclv -l lvid

Example: lresynclv -l 00cfd23d00004c0000000108bf1d209c.1

Any error messages?
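Copying the identifier by hand is easy to get wrong, so here is a small Python sketch that extracts it from the lslv output and builds the lresynclv command shown above. It assumes the identifier follows "LV IDENTIFIER:" on the same line, as in lslv's two-column layout; the sample text is an assumed layout built around wmp's example identifier, and the function name is hypothetical.

```python
import re

# Hypothetical helper: build the lresynclv command from `lslv hd5` text.
LVID_RE = re.compile(r"LV IDENTIFIER:\s+(\S+)")


def lresynclv_command(lslv_text):
    """Return 'lresynclv -l <lvid>' using the LV IDENTIFIER field."""
    m = LVID_RE.search(lslv_text)
    if m is None:
        raise ValueError("no 'LV IDENTIFIER:' field in lslv output")
    return f"lresynclv -l {m.group(1)}"


if __name__ == "__main__":
    # Assumed layout; the identifier is the example from this thread.
    sample = ("LOGICAL VOLUME:     hd5                    VOLUME GROUP:   rootvg\n"
              "LV IDENTIFIER:      00cfd23d00004c0000000108bf1d209c.1 PERMISSION:     read/write\n")
    print(lresynclv_command(sample))
```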

I ran that, and there are no error messages:

servera:/# lresynclv -l 00003d5a00004c00000001218cd0c6a4.1
servera:/#

Actually, I am confused now: which disk is having problems, hdisk0 or hdisk1?
According to your errpt it's hdisk0 which has problems.

If lresynclv ran without errors it should have removed the "stale" state.

Is this the case?
No, lspv hdisk1 still shows 1 stale partition. Also:

lsvg -l rootvg is showing hd5 as closed/stale.

If hdisk0 is the problem, then why are there no stale partitions on hdisk0?

Should I run bosboot -a -d hdisk0 again? We ran it on hdisk1, and that must have overwritten the hdisk0 boot record, which was a good one.

hd5 is always closed, except during the boot process.

If hdisk0 is the problem, then why are there no stale partitions on hdisk0?

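It looks as if LVM cannot read the data correctly from hdisk0 to write it to hdisk1, at least according to errpt. In that case the copy on hdisk1 is the one that never gets refreshed, so that is where the stale mark shows up.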

No, playing around with bosboot will not change anything with regard to stale partitions, as we've seen above.


It actually seems we cannot work with hd5 on hdisk0. Something's wrong with that disk.

We should delete hd5 from both disks and recreate it on hdisk1 alone, so we will have at least one good copy of the boot LV.

rmlv -f hd5

mklv -y hd5 -t boot rootvg 1 hdisk1

bosboot -a -d hdisk1

bootlist -m normal hdisk1

savebase

Leave hdisk0 alone for now, and I'd suggest having IBM diagnose it as soon as possible!
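Because every step in that sequence depends on the one before it (a failed bosboot means the new hd5 is not usable), it is worth stopping at the first error rather than pushing on. A minimal Python sketch of that idea, with hypothetical names; it defaults to a dry run that only lists the commands, and it is no substitute for reading each command's output yourself. As said earlier in this thread, do not reboot until bosboot has succeeded.

```python
import subprocess

# The recreate-hd5 sequence from this comment, one argv list per command.
HD5_STEPS = [
    ["rmlv", "-f", "hd5"],
    ["mklv", "-y", "hd5", "-t", "boot", "rootvg", "1", "hdisk1"],
    ["bosboot", "-a", "-d", "hdisk1"],
    ["bootlist", "-m", "normal", "hdisk1"],
    ["savebase"],
]


def run_steps(steps, dry_run=True):
    """Run commands in order, stopping at the first non-zero exit status.

    With dry_run=True nothing is executed; the command lines are only
    collected. Returns the command lines that were (or would be) run.
    """
    done = []
    for argv in steps:
        line = " ".join(argv)
        done.append(line)
        if dry_run:
            continue
        rc = subprocess.run(argv).returncode
        if rc != 0:
            # Do not continue (and do not reboot) after a failed step.
            raise RuntimeError(f"step failed (exit {rc}): {line}")
    return done


if __name__ == "__main__":
    for line in run_steps(HD5_STEPS):  # dry run: prints, executes nothing
        print(line)
```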

wmp



OK, I will open a call with IBM about this.
OK, we have replaced hdisk0, but now when I try to mirror to hdisk0, I get the following errors, although the mirrorvg command is still running:

0516-934 /usr/sbin/syncvg: Unable to synchronize logical volume hd5.
0516-932 /usr/sbin/syncvg: Unable to synchronize volume group rootvg.
0516-1126 mirrorvg: rootvg successfully mirrored, user should perform
        bosboot of system to initialize boot records.  Then, user must modify
        bootlist to include:  hdisk0 hdisk1.
0516-1804 chvg: The quorum change takes effect immediately.
What is the exact state of your LVs at the moment? Everything mirrored, except for hd5?

Did you recreate hd5 from scratch, as I suggested?

If you didn't (why not?), one possibility is that BB POLICY is set to non-relocatable, for whatever reason, which might cause mirroring issues.
If it really is "non-relocatable", change this setting with "chlv -b y hd5" and retry by issuing "varyonvg rootvg".
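The check can be scripted against the lslv hd5 output. A minimal Python sketch, assuming lslv prints the field as "BB POLICY:" followed by relocatable or non-relocatable; the sample line is an assumed layout and the helper name is hypothetical. It returns the two commands from this comment only when the policy is non-relocatable, and runs nothing itself.

```python
import re

# Hypothetical helper: decide whether hd5 needs its BB policy changed.
BB_RE = re.compile(r"BB POLICY:\s+(\S+)")


def bb_fix_commands(lslv_text, lv="hd5", vg="rootvg"):
    """Return the chlv/varyonvg commands if BB POLICY is non-relocatable."""
    m = BB_RE.search(lslv_text)
    if m is None:
        raise ValueError("no 'BB POLICY:' field in lslv output")
    if m.group(1) == "non-relocatable":
        return [f"chlv -b y {lv}", f"varyonvg {vg}"]
    return []  # relocatable: nothing to change


if __name__ == "__main__":
    # Assumed lslv layout for the relevant line.
    sample = "STALE PPs:          1                      BB POLICY:      non-relocatable\n"
    print(bb_fix_commands(sample))  # ['chlv -b y hd5', 'varyonvg rootvg']
```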

Now all LVs show as syncd, and the mirrorvg command has completed. I ran bosboot -a and no error was reported. Here is the output of lsvg -l rootvg:

rootvg:
LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
hd5                 boot       1       2       2    closed/syncd  N/A
hd6                 paging     128     256     2    open/syncd    N/A
hd71                sysdump    48      48      1    open/syncd    N/A
impervalv           jfs2       1       2       2    open/syncd    /opt/imperva
hd8                 jfslog     1       2       2    open/syncd    N/A
hd4                 jfs        10      20      2    open/syncd    /
hd2                 jfs        45      90      2    open/syncd    /usr
hd9var              jfs        12      24      2    open/syncd    /var
hd3                 jfs        32      64      2    open/syncd    /tmp
hd1                 jfs        32      64      2    open/syncd    /home
hd10opt             jfs        10      20      2    open/syncd    /opt
usrlocallv          jfs        10      20      2    open/syncd    /usr/local
hd7                 sysdump    48      48      1    open/syncd    N/A
auditlv             jfs        1       2       2    open/syncd    /audit
optioimglv          jfs        2       4       2    open/syncd    /optio_images

But I am worried about why I got the "unable to synchronize volume group" error.

And yes, I did remove hd5 and recreate it from scratch beforehand.

virgo

 

Did you run a varyonvg or syncvg after mirrorvg by any chance? This could have caused hd5 to finally synchronize.

I can't think of another possibility at the moment, but who knows!

Did you check BB relocation of hd5 before recreating it (lslv hd5)? Just out of curiosity...

Anyway, no reason to be worried anymore. It's all fine now. Congrats!

wmp



When the IBM engineer tried to hot-plug remove hdisk0, it gave an error related to /dev/ipldevice. I then tried various tricks to get it removed. Then the new disk was inserted and I started the mirrorvg command, which gave the "unable to synchronize" error.

I did not run varyonvg or syncvg after the mirror completed. Also, before removing hd5, I removed hdisk0 from rootvg when the engineer came in. After that, I removed hd5 from hdisk1, recreated it, and ran bosboot. I didn't check the BB relocation setting at that time.

As of now, the mirrors are looking good. Thanks very much for helping me out of this issue.

Virgo