Xetroximyn (United States) asked:
AIX 5.1 - remove drive from volume group so it can be replaced

We are getting some disk operation errors.

The hardware guy says he needs the disk removed from the volume group before he can replace it.

Then, of course, it will need to be added back in after it is replaced.

I think the disks are mirrored - if that makes a difference.

hdisk4 is the one with errors:


ibm1:/> lspv
hdisk0          00015051814ca2c5                    rootvg
hdisk1          000150514226fc44                    usr1vg
hdisk2          000150519965a2bb                    usr1vg
hdisk3          0001505115c7dbce                    usr1vg
hdisk4          000c925d02a3b3b2                    usr1vg
hdisk5          000c925d822f5eda                    usr1vg
hdisk6          00011784d15410dc                    rootvg
hdisk7          000150512ffa6367                    usr1vg
hdisk8          000150519965a4eb                    usr1vg
hdisk9          000150512b7bdb40                    usr1vg
hdisk10         000c925d02a3a8c3                    usr1vg
hdisk11         000c924d87206941                    usr1vg
ibm1:/>
ibm1:/> lsvg -l usr1vg
usr1vg:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
loglv00             jfslog     16    32    4    open/syncd    N/A
usr1                jfs        3943  7886  10   open/syncd    /usr1
ibm1:/> 


woolmilkporc (Germany) replied:

Yes, the LVs in usr1vg are mirrored.

Are you able to add a new disk before actually removing the old one?
This is generally no problem with SAN disks.

If you are, just

- add a new disk
- run "cfgmgr"
- find out the name of the new disk (I'll call it hdiskx below)
- run "replacepv hdisk4 hdiskx"

When finished, you can safely remove hdisk4, because it's empty now and no longer part of usr1vg.

Now run "rmdev -dl hdisk4" and that's all.
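
A minimal consolidated sketch of that sequence (hdiskx stands in for whatever name the new disk gets on your system):

cfgmgr                     # discover the newly attached disk
lspv                       # identify the new disk (it will show no PVID and no VG)
replacepv hdisk4 hdiskx    # migrate everything from hdisk4 to the new disk
rmdev -dl hdisk4           # delete the old disk's device definition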

If there is no way to make an additional disk available, we'll need some more details. Please post the output of:

lspv -l hdisk4
lsvg -p usr1vg
lslv -l usr1

wmp
Xetroximyn (Asker):

I don't think that is a possibility.

Thanks for your help!! IBM is no help since we are on 5.1!


ibm1:/> lspv -l hdisk4
hdisk4:
LV NAME               LPs   PPs   DISTRIBUTION          MOUNT POINT
usr1                  1084  1084  217..217..216..217..217 /usr1
ibm1:/> lsvg -p usr1vg
usr1vg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk2            active            542         0           00..00..00..00..00
hdisk8            active            542         0           00..00..00..00..00
hdisk1            active            542         0           00..00..00..00..00
hdisk9            active            1084        0           00..00..00..00..00
hdisk3            active            1084        0           00..00..00..00..00
hdisk7            active            542         0           00..00..00..00..00
hdisk4            active            1084        0           00..00..00..00..00
hdisk5            active            1084        0           00..00..00..00..00
hdisk10           active            1084        467         33..00..00..217..217
hdisk11           active            1084        287         00..00..00..70..217
ibm1:/> lslv -l usr1
usr1:/usr1
PV                COPIES        IN BAND       DISTRIBUTION
hdisk3            1084:000:000  20%           217:217:216:217:217
hdisk9            1083:000:000  19%           217:216:216:217:217
hdisk7            542:000:000   19%           109:108:108:108:109
hdisk2            542:000:000   19%           109:108:108:108:109
hdisk8            541:000:000   19%           109:107:108:108:109
hdisk1            542:000:000   19%           109:108:108:108:109
hdisk5            1084:000:000  20%           217:217:216:217:217
hdisk4            1084:000:000  20%           217:217:216:217:217
hdisk11           782:000:000   27%           217:217:216:132:000
hdisk10           602:000:000   36%           169:217:216:000:000
ibm1:/>


ASKER CERTIFIED SOLUTION (woolmilkporc) -- members-only content not shown.
Thanks! I will be attempting this between 1-2pm Eastern tomorrow. If you can respond quickly then -- all the better :)

BTW -- is this going to reduce the volume size in the interim?
No, the available space will not be reduced, but there will be no redundancy (mirroring) for the usr1 logical volume between "rmlvcopy" and "mklvcopy".

If everything works as expected, you should run "syncvg -v usr1vg" after "mklvcopy" to make sure that all copies (mirrors) are in sync again.
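
Judging from the commands referenced above (rmlvcopy, mklvcopy) and the session output that follows, the members-only procedure was presumably along these lines; treat this as a reconstruction, not the verbatim answer:

rmlvcopy usr1 1 hdisk4      # drop the mirror copies of usr1 that live on hdisk4
reducevg usr1vg hdisk4      # remove the now-empty disk from the volume group
rmdev -dl hdisk4            # delete the device definition so the disk can be pulled
# --- physical drive gets replaced ---
cfgmgr                      # discover the replacement disk
extendvg usr1vg hdisk4      # add it back into usr1vg (a new PVID is assigned)
mklvcopy usr1 2 hdisk4      # recreate the second copy of usr1
syncvg -v usr1vg            # resynchronize the mirrors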
Thanks!! This seemed to mostly work, except the last command failed.

Any suggestions?


ibm1:/> lspv
hdisk0          00015051814ca2c5                    rootvg
hdisk1          000150514226fc44                    usr1vg
hdisk2          000150519965a2bb                    usr1vg
hdisk3          0001505115c7dbce                    usr1vg
hdisk4          none                                None
hdisk5          000c925d822f5eda                    usr1vg
hdisk6          00011784d15410dc                    rootvg
hdisk7          000150512ffa6367                    usr1vg
hdisk8          000150519965a4eb                    usr1vg
hdisk9          000150512b7bdb40                    usr1vg
hdisk10         000c925d02a3a8c3                    usr1vg
hdisk11         000c924d87206941                    usr1vg
ibm1:/> extendvg usr1vg hdisk4
0516-1254 extendvg: Changing the PVID in the ODM.
ibm1:/> lspv
hdisk0          00015051814ca2c5                    rootvg
hdisk1          000150514226fc44                    usr1vg
hdisk2          000150519965a2bb                    usr1vg
hdisk3          0001505115c7dbce                    usr1vg
hdisk4          00015051e2e7d077                    usr1vg
hdisk5          000c925d822f5eda                    usr1vg
hdisk6          00011784d15410dc                    rootvg
hdisk7          000150512ffa6367                    usr1vg
hdisk8          000150519965a4eb                    usr1vg
hdisk9          000150512b7bdb40                    usr1vg
hdisk10         000c925d02a3a8c3                    usr1vg
hdisk11         000c924d87206941                    usr1vg
ibm1:/> mklvcopy usr1 2 hdisk4
0516-404 allocp: This system cannot fulfill the allocation request.
        There are not enough free partitions or not enough physical volumes
        to keep strictness and satisfy allocation requests.  The command
        should be retried with different allocation characteristics.
ibm1:/>


SOLUTION -- members-only content not shown.
I'm not sure if I have the described requirement....


When you say two different locations, do you mean like different servers?  Internal/external?

I can tell you that all the drives are internal to a single server.

Does that mean I don't have the described requirement?


Just for good measure, here are those outputs anyway.


THANKS SO MUCH FOR YOUR HELP!!


ibm1:/> lsvg -p usr1vg
usr1vg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk2            active            542         0           00..00..00..00..00
hdisk8            active            542         0           00..00..00..00..00
hdisk1            active            542         482         109..108..108..108..49
hdisk9            active            1084        1083        217..216..216..217..217
hdisk3            active            1084        0           00..00..00..00..00
hdisk7            active            542         542         109..108..108..108..109
hdisk4            active            1084        1084        217..217..216..217..217
hdisk5            active            1084        120         00..120..00..00..00
hdisk10           active            1084        927         202..75..216..217..217
hdisk11           active            1084        459         00..67..105..70..217
ibm1:/> lscfg |grep hdisk
+ hdisk0            10-60-00-8,0      16 Bit LVD SCSI Disk Drive (4500 MB)
+ hdisk1            10-60-00-9,0      16 Bit LVD SCSI Disk Drive (9100 MB)
+ hdisk2            10-60-00-10,0     16 Bit LVD SCSI Disk Drive (9100 MB)
+ hdisk3            10-60-00-11,0     16 Bit LVD SCSI Disk Drive (18200
+ hdisk4            10-60-00-12,0     16 Bit LVD SCSI Disk Drive (18200
+ hdisk5            10-60-00-13,0     16 Bit LVD SCSI Disk Drive (18200
+ hdisk6            10-88-00-8,0      16 Bit SCSI Disk Drive (4500 MB)
+ hdisk7            10-88-00-9,0      16 Bit LVD SCSI Disk Drive (9100 MB)
+ hdisk8            10-88-00-10,0     16 Bit LVD SCSI Disk Drive (9100 MB)
+ hdisk9            10-88-00-11,0     16 Bit LVD SCSI Disk Drive (18200
+ hdisk10           10-88-00-12,0     16 Bit LVD SCSI Disk Drive (18200
+ hdisk11           10-88-00-13,0     16 Bit LVD SCSI Disk Drive (18200
ibm1:/>


Your drives are behind two different SCSI controllers, that's all.

Regarding the lsvg output, your data placement policy doesn't seem to be strict in any way, so you obviously don't have the mentioned requirement.

mklvcopy usr1 2
syncvg -v usr1vg

should work just fine for you then.
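
If those two succeed, a couple of standard LVM queries (not from the thread) will confirm the result:

lslv usr1 | grep COPIES     # should report COPIES: 2
lsvg usr1vg | grep STALE    # STALE PPs should be 0 after syncvg completes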
Thanks!

I got this...

ibm1:/> mklvcopy usr1 2
0516-404 allocp: This system cannot fulfill the allocation request.
        There are not enough free partitions or not enough physical volumes
        to keep strictness and satisfy allocation requests.  The command
        should be retried with different allocation characteristics.


Strange.

I'll need some more output:

lsvg usr1vg
lslv usr1
lslv -l usr1
Here you go.

Thanks!!

ibm1:/> lsvg usr1vg
VOLUME GROUP:   usr1vg                   VG IDENTIFIER:  00015051a5299fdf
VG STATE:       active                   PP SIZE:        16 megabyte(s)
VG PERMISSION:  read/write               TOTAL PPs:      8672 (138752 megabytes)
MAX LVs:        256                      FREE PPs:       4697 (75152 megabytes)
LVs:            2                        USED PPs:       3975 (63600 megabytes)
OPEN LVs:       2                        QUORUM:         6
TOTAL PVs:      10                       VG DESCRIPTORS: 10
STALE PVs:      0                        STALE PPs:      0
ACTIVE PVs:     10                       AUTO ON:        yes
MAX PPs per PV: 2032                     MAX PVs:        16
LTG size:       128 kilobyte(s)          AUTO SYNC:      no
HOT SPARE:      no



ibm1:/> lslv usr1
LOGICAL VOLUME:     usr1                   VOLUME GROUP:   usr1vg
LV IDENTIFIER:      00015051a5299fdf.2     PERMISSION:     read/write
VG STATE:           active/complete        LV STATE:       opened/syncd
TYPE:               jfs                    WRITE VERIFY:   off
MAX LPs:            7000                   PP SIZE:        16 megabyte(s)
COPIES:             1                      SCHED POLICY:   parallel
LPs:                3943                   PPs:            3943
STALE PPs:          0                      BB POLICY:      relocatable
INTER-POLICY:       minimum                RELOCATABLE:    yes
INTRA-POLICY:       middle                 UPPER BOUND:    32
MOUNT POINT:        /usr1                  LABEL:          /usr1
MIRROR WRITE CONSISTENCY: on/ACTIVE
EACH LP COPY ON A SEPARATE PV ?: yes


ibm1:/> lslv -l usr1
usr1:/usr1
PV                COPIES        IN BAND       DISTRIBUTION
hdisk3            1084:000:000  20%           217:217:216:217:217
hdisk2            542:000:000   19%           109:108:108:108:109
hdisk8            541:000:000   19%           109:107:108:108:109
hdisk1            060:000:000   0%            000:000:000:000:060
hdisk5            964:000:000   10%           217:097:216:217:217
hdisk11           610:000:000   24%           217:150:111:132:000
hdisk10           142:000:000   100%          000:142:000:000:000
ibm1:/>


OK, I'll have to analyze this. Please give me some time until tomorrow.
Sure -- thanks so much for your help!! Don't know what we would do without your guru input!
SOLUTION -- members-only content not shown.
The second option... did you mean to say it CAN be done without disrupting?  Or it CAN'T?

Either way -- I think we want to go the safe route... We will probably only use the server for six more months (half of our stuff is already moved off), but it is crucial.

Will this reduce the size of usr1?

THANKS!
SOLUTION -- members-only content not shown.
To your last questions:

- Everything I wrote can be done without disruption, except for the last thing - the varyoffvg/varyonvg or reboot required to make the new quorum checking option effective.

- Volume sizes are never affected by such operations. We still have all primary copies intact and complete, and this will not change.
Thanks!! You are the best!! Almost ready to run these commands...

mklvcopy -s s usr1 2 hdisk7 hdisk8 hdisk9 hdisk10 hdisk11
syncvg -v usr1vg # This can take quite a long time!

On that last one, do I include the # or do I just run:
syncvg -v usr1vg
The # and all that follows is just a comment of mine.

Since the "#" indicates the start of a comment, you could have included it; it wouldn't have done any harm...

The same is true here:

chvg -Q n usr1vg # Turn quorum checking off
savebase # Save changes to boot image
Thanks!  So lslv did in fact just show disks 1-5, so I am running those next two commands now.

mklvcopy -s s usr1 2 hdisk7 hdisk8 hdisk9 hdisk10 hdisk11
syncvg -v usr1vg # This can take quite a long time!

Then after that I just need to run...
chvg -Q n usr1vg # Turn quorum checking off
savebase # Save changes to boot image

and then I can just reboot -- right?

(Are there any checks I can do before the reboot to make sure it will come back up OK? I am paranoid about reboots since we had a big problem a long time ago when it got stuck during boot and IBM support couldn't even help us... this was back when they still supported 5.1. We ended up having to restore from a tape.)
Shoot... I just got this error:


ibm1:/> lslv -l usr1
usr1:/usr1
PV                COPIES        IN BAND       DISTRIBUTION
hdisk3            1084:000:000  20%           217:217:216:217:217
hdisk2            542:000:000   19%           109:108:108:108:109
hdisk4            1084:000:000  20%           217:217:216:217:217
hdisk1            269:000:000   40%           000:108:101:000:060
hdisk5            964:000:000   10%           217:097:216:217:217
ibm1:/> mklvcopy -s s usr1 2 hdisk7 hdisk8 hdisk9 hdisk10 hdisk11
0516-404 allocp: This system cannot fulfill the allocation request.
        There are not enough free partitions or not enough physical volumes
        to keep strictness and satisfy allocation requests.  The command
        should be retried with different allocation characteristics.
ibm1:/>


Here are those outputs if you need them:


ibm1:/>
ibm1:/> lsvg usr1vg
VOLUME GROUP:   usr1vg                   VG IDENTIFIER:  00015051a5299fdf
VG STATE:       active                   PP SIZE:        16 megabyte(s)
VG PERMISSION:  read/write               TOTAL PPs:      8672 (138752 megabytes)
MAX LVs:        256                      FREE PPs:       4697 (75152 megabytes)
LVs:            2                        USED PPs:       3975 (63600 megabytes)
OPEN LVs:       2                        QUORUM:         6
TOTAL PVs:      10                       VG DESCRIPTORS: 10
STALE PVs:      0                        STALE PPs:      0
ACTIVE PVs:     10                       AUTO ON:        yes
MAX PPs per PV: 2032                     MAX PVs:        16
LTG size:       128 kilobyte(s)          AUTO SYNC:      no
HOT SPARE:      no
ibm1:/>
ibm1:/>
ibm1:/>
ibm1:/> lslv usr1
LOGICAL VOLUME:     usr1                   VOLUME GROUP:   usr1vg
LV IDENTIFIER:      00015051a5299fdf.2     PERMISSION:     read/write
VG STATE:           active/complete        LV STATE:       opened/syncd
TYPE:               jfs                    WRITE VERIFY:   off
MAX LPs:            7000                   PP SIZE:        16 megabyte(s)
COPIES:             1                      SCHED POLICY:   parallel
LPs:                3943                   PPs:            3943
STALE PPs:          0                      BB POLICY:      relocatable
INTER-POLICY:       minimum                RELOCATABLE:    yes
INTRA-POLICY:       middle                 UPPER BOUND:    32
MOUNT POINT:        /usr1                  LABEL:          /usr1
MIRROR WRITE CONSISTENCY: on/ACTIVE
EACH LP COPY ON A SEPARATE PV ?: yes
ibm1:/>
ibm1:/>
ibm1:/>


Seems that the partitions of loglv00 are also kind of misplaced, which inhibits our desired superstrict allocation.

So we will clean up this one as well.

Please post

lslv loglv00
lslv -l loglv00
Here you go:

ibm1:/>
ibm1:/>
ibm1:/> lslv loglv00
LOGICAL VOLUME:     loglv00                VOLUME GROUP:   usr1vg
LV IDENTIFIER:      00015051a5299fdf.1     PERMISSION:     read/write
VG STATE:           active/complete        LV STATE:       opened/syncd
TYPE:               jfslog                 WRITE VERIFY:   off
MAX LPs:            512                    PP SIZE:        16 megabyte(s)
COPIES:             2                      SCHED POLICY:   parallel
LPs:                16                     PPs:            32
STALE PPs:          0                      BB POLICY:      relocatable
INTER-POLICY:       minimum                RELOCATABLE:    yes
INTRA-POLICY:       middle                 UPPER BOUND:    32
MOUNT POINT:        N/A                    LABEL:          None
MIRROR WRITE CONSISTENCY: on/ACTIVE
EACH LP COPY ON A SEPARATE PV ?: yes
ibm1:/>
ibm1:/>
ibm1:/>
ibm1:/>
ibm1:/>
ibm1:/> lslv -l loglv00
loglv00:N/A
PV                COPIES        IN BAND       DISTRIBUTION
hdisk8            001:000:000   100%          000:001:000:000:000
hdisk9            001:000:000   100%          000:001:000:000:000
hdisk10           015:000:000   0%            015:000:000:000:000
hdisk11           015:000:000   0%            000:000:000:015:000
ibm1:/>
ibm1:/>
ibm1:/>


SOLUTION -- members-only content not shown.
Thanks!!

So sorry to be such a pain!! But I just got this...

ibm1:/> mklvcopy -s s loglv00 2 hdisk1
0516-404 allocp: This system cannot fulfill the allocation request.
        There are not enough free partitions or not enough physical volumes
        to keep strictness and satisfy allocation requests.  The command
        should be retried with different allocation characteristics.
ibm1:/>


In case you need it:
ibm1:/> lslv loglv00
LOGICAL VOLUME:     loglv00                VOLUME GROUP:   usr1vg
LV IDENTIFIER:      00015051a5299fdf.1     PERMISSION:     read/write
VG STATE:           active/complete        LV STATE:       opened/syncd
TYPE:               jfslog                 WRITE VERIFY:   off
MAX LPs:            512                    PP SIZE:        16 megabyte(s)
COPIES:             1                      SCHED POLICY:   parallel
LPs:                16                     PPs:            16
STALE PPs:          0                      BB POLICY:      relocatable
INTER-POLICY:       minimum                RELOCATABLE:    yes
INTRA-POLICY:       middle                 UPPER BOUND:    32
MOUNT POINT:        N/A                    LABEL:          None
MIRROR WRITE CONSISTENCY: on/ACTIVE
EACH LP COPY ON A SEPARATE PV ?: yes
ibm1:/>
ibm1:/>
ibm1:/>
ibm1:/> lslv -l loglv00
loglv00:N/A
PV                COPIES        IN BAND       DISTRIBUTION
hdisk10           016:000:000   6%            015:001:000:000:000
ibm1:/>


SOLUTION -- members-only content not shown.
Well, on the one hand I have your advice, and on the other hand I have... well... nothing. I'd just about be screwed if it weren't for you. :) I'll take your memory in any shape I can get it. :)


Here is the latest error... and some outputs in case you need them:

ibm1:/> migratepv -l loglv00 hdisk10 hdisk1
ibm1:/> mklvcopy -s s loglv00 2 hdisk10
0516-404 allocp: This system cannot fulfill the allocation request.
        There are not enough free partitions or not enough physical volumes
        to keep strictness and satisfy allocation requests.  The command
        should be retried with different allocation characteristics.
ibm1:/>
ibm1:/>
ibm1:/> lslv loglv00
LOGICAL VOLUME:     loglv00                VOLUME GROUP:   usr1vg
LV IDENTIFIER:      00015051a5299fdf.1     PERMISSION:     read/write
VG STATE:           active/complete        LV STATE:       opened/syncd
TYPE:               jfslog                 WRITE VERIFY:   off
MAX LPs:            512                    PP SIZE:        16 megabyte(s)
COPIES:             1                      SCHED POLICY:   parallel
LPs:                16                     PPs:            16
STALE PPs:          0                      BB POLICY:      relocatable
INTER-POLICY:       minimum                RELOCATABLE:    yes
INTRA-POLICY:       middle                 UPPER BOUND:    32
MOUNT POINT:        N/A                    LABEL:          None
MIRROR WRITE CONSISTENCY: on/ACTIVE
EACH LP COPY ON A SEPARATE PV ?: yes
ibm1:/>
ibm1:/>
ibm1:/>
ibm1:/>
ibm1:/>
ibm1:/> lslv -l loglv00
loglv00:N/A
PV                COPIES        IN BAND       DISTRIBUTION
hdisk1            016:000:000   0%            016:000:000:000:000
ibm1:/>


I'm not really sure what's going on here.

Anyway, since we're going to specify the target disks individually, we can kind of "mimic" superstrictness without explicitly requesting it.

So let's try this:

mklvcopy loglv00 2 hdisk10
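
If that works, the same lslv/lsvg queries used throughout this thread should show the new copy sitting on a separate disk:

lslv -l loglv00             # should now list hdisk1 and hdisk10
lsvg usr1vg | grep STALE    # stale partitions should clear once syncvg has run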
That returned without error.
SOLUTION -- members-only content not shown.
Thanks!! No need to increase size later! :) We are half moved off and should be getting the other half of our stuff moved off by January... Can't wait to be off of this ancient thing!!


BTW -- are there any checks I can do before the reboot to make sure it will come back up OK? I am paranoid about reboots since we had a big problem a long time ago when it got stuck during boot and IBM support couldn't even help us... (this was back when they still supported 5.1 too). We ended up having to restore from a tape.
SOLUTION -- members-only content not shown.
THANKS!  This all seems to look good.

Should I run the same checks for rootvg?


ibm1:/> mklvcopy usr1 2 hdisk7 hdisk8 hdisk9 hdisk10 hdisk11
ibm1:/> syncvg -v usr1vg
ibm1:/> chvg -Q n usr1vg
ibm1:/> savebase
ibm1:/>
ibm1:/>
ibm1:/>
ibm1:/>
ibm1:/> lsvg usr1vg
VOLUME GROUP:   usr1vg                   VG IDENTIFIER:  00015051a5299fdf
VG STATE:       active                   PP SIZE:        16 megabyte(s)
VG PERMISSION:  read/write               TOTAL PPs:      8672 (138752 megabytes)
MAX LVs:        256                      FREE PPs:       754 (12064 megabytes)
LVs:            2                        USED PPs:       7918 (126688 megabytes)
OPEN LVs:       2                        QUORUM:         1
TOTAL PVs:      10                       VG DESCRIPTORS: 10
STALE PVs:      0                        STALE PPs:      0
ACTIVE PVs:     10                       AUTO ON:        yes
MAX PPs per PV: 2032                     MAX PVs:        16
LTG size:       128 kilobyte(s)          AUTO SYNC:      no
HOT SPARE:      no
ibm1:/>
ibm1:/>
ibm1:/>
ibm1:/>
ibm1:/>
ibm1:/> lsvg -p usr1vg
usr1vg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk2            active            542         0           00..00..00..00..00
hdisk8            active            542         377         109..00..51..108..109
hdisk1            active            542         257         93..00..07..108..49
hdisk9            active            1084        0           00..00..00..00..00
hdisk3            active            1084        0           00..00..00..00..00
hdisk7            active            542         0           00..00..00..00..00
hdisk4            active            1084        0           00..00..00..00..00
hdisk5            active            1084        120         00..120..00..00..00
hdisk10           active            1084        0           00..00..00..00..00
hdisk11           active            1084        0           00..00..00..00..00
ibm1:/>
ibm1:/>
ibm1:/>
ibm1:/>
ibm1:/>
ibm1:/> lsvg -l usr1vg
usr1vg:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
loglv00             jfslog     16    32    2    open/syncd    N/A
usr1                jfs        3943  7886  10   open/syncd    /usr1
ibm1:/>
ibm1:/>
ibm1:/>
ibm1:/>
ibm1:/>
ibm1:/> synclvodm usr1vg
ibm1:/>
ibm1:/>
ibm1:/> varyonvg usr1vg
ibm1:/>


Perfect, congrats!

Run these checks for rootvg if you wish. I'm rather sure that you'll find no trouble there, because rootvg seems to consist of just 2 cleanly mirrored disks.

But please have a look at errpt!
If you're indeed planning to reboot, recreate the boot records beforehand and set up the bootlist - just to be sure.

bosboot -ad hdisk0
bosboot -ad hdisk6


bootlist -m normal hdisk0 hdisk6
savebase
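
To double-check before actually rebooting, two standard AIX commands (not spelled out in the thread; I believe the -o display flag exists on 5.1, but verify):

bootlist -m normal -o     # display the boot list just written; should show hdisk0, hdisk6
errpt -d H | more         # review hardware errors, per the errpt advice above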
Only weird thing on rootvg...

ibm1:/> varyonvg rootvg
PV Status:      hdisk6  00011784d15410dc        PVACTIVE
                hdisk0  00015051814ca2c5        PVACTIVE
0516-1437 varyonvg: Varyonvg should not be used to force open or relock the drives of the volume group containing a dump device.




errpt looks good.

ibm1:/> bosboot -ad hdisk0

bosboot: Boot image is 13884 512 byte blocks.
ibm1:/> bosboot -ad hdisk6

bosboot: Boot image is 13884 512 byte blocks.
ibm1:/> bootlist -m normal hdisk0 hdisk6
ibm1:/> savebase




So the rebooting is to make sure quorum checking is on? How important is that? Should I reboot within a week? A month? A couple months?
The rootvg message is irrelevant. AIX 5.3 and later don't issue it anymore. You can ignore it.

As for the quorum:

Every disk of a VG contains at least one VGDA (Volume Group Descriptor Area).
A 1-disk VG has 2 VGDAs, a 2-disk VG has three (2 on first disk, 1 on second),
VGs with 3 disks and up have one VGDA per hdisk.

Quorum checking means that more than 50% of the VGDAs must be available to keep the VG running; with 50% or less of the VGDAs available, the VG will be forcibly varied off.

Now to your case: you have 10 disks in your VG, 5 containing original partitions, the other 5 containing the copies.
With quorum checking on, the loss of 5 disks will make the VG go down, despite the fact that all data might still be available if only "copy" disks or only "original" disks are lost.
This can happen, e.g., when a SCSI adapter fails.

That's why we usually turn off quorum checking. Without this checking, 1 VGDA is sufficient to keep the VG running.
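
The numbers for this VG line up with the lsvg output posted earlier: VG DESCRIPTORS: 10 means one VGDA per disk, and QUORUM: 6 is the more-than-half threshold (the field drops to 1 once quorum checking is turned off). A quick way to check at any time:

lsvg usr1vg | grep -E 'QUORUM|DESCRIPTORS'    # VGDA count and current quorum threshold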

I can't estimate the probability of a SCSI adapter failure in your machine.

It's up to you to decide how important this machine is and how reliable your hardware might be. But in any case, there is no reason to rush.

How about avoiding the reboot by following the instructions I gave you in comment #38252955?
I don't know how reliable the hardware is....  (In the last three years we have just had one tape drive and one disk drive fail... but that is no indication of the future I suppose)


I can say that even if we were completely without any mirroring -- we would almost definitely still want to be able to run production while we waited for replacement parts.

Assuming that the only risk is that we lose the data that is not backed up if a drive fails (i.e. there is no risk of making the whole system inoperable in a way that, once the hardware was all back up and running, we could not just restore from tape and go).

It sounds like, if I understand you correctly, we might not want quorum checking on... Does that sound right?


P.S. I plan to check errpt weekly for warnings.
You definitely don't want quorum checking on, you understood correctly.

All this has nothing to do with the ability to restore a failed volume group from tape.

You already turned off quorum checking in the ODM, so even if the VG goes down you will be able to bring it up again, the new setting being in effect then.

But please, why don't you just take down your application for a minute, umount /usr1, vary the VG off and on, mount /usr1 and start the application again?

Leaving aside the time your application takes to stop and start, this is a matter of less than one minute.
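
Spelled out, the short-outage sequence would presumably be the following (a sketch; stopping and restarting the application is site-specific and not shown):

umount /usr1          # fails with "resource busy" if anything still has /usr1 open
varyoffvg usr1vg      # deactivate the volume group
varyonvg usr1vg       # reactivate it; LVM re-reads the quorum setting from the ODM
mount /usr1           # remount, then restart the application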
I am just crazy paranoid...  Our entire business would be unbelievably screwed if anything happened that prevented this machine from running production - even for just a couple days.

You are absolutely awesome!!  But you are basically our ONLY software support for this thing...  Which despite your awesomeness is a bit scary  :)

If there is a reason that it is important to do this -- then I guess we will do it...
But it sounded as if the reason for doing it was turning quorum checking on -- which we don't even want anyway... Is that right? Is there a need for doing the varyoff/on?
No, just the other way - the reason for doing it is turning quorum checking off, and that's what you want to do in order to take precautions against the loss of a whole SCSI adapter (not against the loss of a disk or two behind a single adapter - your system will survive that even in its current state).
varyoff/on is needed to make the setting effective which you already configured in the ODM by means of "chvg -Q n usr1vg".

varyoff/on will make LVM read the new setting from ODM and apply it to the VG.
I see.  I have it all backwards.

So I guess I will try to vary off/on soon...

If I run into any problems, would you by chance be around Sunday around 11 PM?


P.S. Just curious, what does ODM stand for?
Object Data Manager. It's the internal configuration database on AIX, similar to the Windows Registry.
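
As an aside, ODM object classes can be queried directly with odmget; for example, the customized-devices class that holds the hdisk definitions (standard AIX usage, not a command used in this thread):

odmget -q "name=hdisk4" CuDv     # dump hdisk4's ODM object (status, location codes, etc.)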

Well, I don't know what timezone you're in.

I'm in Europe here, so talking CUT/UTC I will be available until around 11 PM on Sunday, but not much longer. I'll be back around 8 AM CUT/UTC on Monday.
I am on Eastern US time. I guess I will try tonight and post here if there are any problems... Hopefully you can reply early :)
I tried to shut everything down, but I got:

ibm1:/> umount /usr1
umount: 0506-349 Cannot unmount /dev/usr1: The requested resource is busy.



So I rebooted -- things seem to be fine.  Any way I can check to make sure quorum checking is turned off and all is good "under the hood"?
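
Aside: the failed umount could have been diagnosed with the stock AIX fuser command (not something tried in this thread):

fuser -cux /usr1     # list the processes, with owners, holding files open on /usr1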
SOLUTION -- members-only content not shown.
Sweet!  I see "QUORUM:         1"

Thanks a million!!