LVM problem
Dan Coutu
coutu at snowy-owl.com
Tue Dec 18 09:36:39 EST 2007
Ben Scott wrote:
> On Dec 14, 2007 5:51 PM, Dan Coutu <coutu at snowy-owl.com> wrote:
>
>>> The second message, about the VG being successfully extended,
>>> indicates the PV was successfully added to the VG. Then you blew it
>>> away with the mkfs. ;-)
>>>
>> Exactly. I had never seen any message like that when doing LVM things
>> and it threw me off. I figured it was an error of some sort and that I
>> had to take some other action. Feh.
>>
>
> Well, it was an error of some sort. LVM has no way of knowing what
> you have on your devices, and maybe you *did* expect that device to
> have something LVMish on it, so it's warning you that it couldn't read
> it.
>
> If you want, there's some place in some config file you can use to
> explicitly declare what devices you want LVM to pay attention to. But
> then you'll have to remember to update the config file when you add
> new devices. Pick your poison. :)
>
>
>>> You probably need to first do a vgreduce to remove the (wrecked) PV
>>> from the VG.
>>>
>> Yeah, the Red Hat support person suggested that. Only it doesn't work.
>> The vgreduce that is. It gripes about the unknown uuid.
>>
>
> Hmmm. I wonder if you got as far as extending an LV (i.e., having
> an LV claim space on the troubled PV) before it got wrecked. That
> would likely mean you've corrupted the LV, which I'm sure will make
> you sad.
>
> Check the output of "pvdisplay -v", "vgdisplay -v", and "lvdisplay
> -v" to see detailed information about all known PVs, VGs, and LVs,
> respectively. See if you can figure out what LVM thinks is going on.
> In particular, you should be interested in what PVs make up your
> LV(s), and where the wrecked PV is being referenced (if anywhere). If
> that doesn't shed any immediate light on things, post the output of
> those commands so we can take a look.
>
> -- Ben
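(A side note on the config file Ben mentions: the device list lives in /etc/lvm/lvm.conf, in the "filter" setting. A sketch, with device patterns that would need adjusting for a given system:

```
# /etc/lvm/lvm.conf -- restrict which block devices LVM scans
devices {
    # accept sda and sdb, reject everything else (e.g. the cdrom)
    filter = [ "a|^/dev/sda|", "a|^/dev/sdb|", "r|.*|" ]
}
```

After changing it, run vgscan so LVM rebuilds its device cache.)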
Well, I have my response from Red Hat and it is a lot like what I'd
expect to hear from the Redmond gang: make sure you have full backups of
everything, reinstall Linux, and restore your files from backups.
Sigh. I've migrated this system from SCO Unix, to Red Hat Linux 8, to
RHL 9, to RHEL 4, and the process is NOT simple. It takes a month or so
of planning each time because the system is large, has unusual software
installed on it, and has a large number of custom commands. So being
able to rebuild the LVM metadata would take less time, I believe.
Note: when I originally ran the vgextend command, besides giving me the
misleading/confusing message about the cdrom, it did say that the extend
was successful. I then made a mistake in the confusion caused by that
and ran mkfs on the partition that I had just done a vgextend against.
So at this point I've done a pvremove to get rid of the bogus stuff on
/dev/sdb
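From what I've read, even after the pvremove the VG metadata on sda2 still references the missing PV by uuid, which would be why the tools keep griping. One thing I have not tried yet (so take this as a guess, not experience) is telling LVM to drop the missing PV outright:

```
# remove references to PVs that can no longer be found
vgreduce --removemissing VolGroup00
# then re-scan and see if the VG is consistent again
vgscan
vgdisplay -v VolGroup00
```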
The primary physical storage device is /dev/sda2 with /dev/sda1 as /boot
(which is outside of LVM space.)
/dev/sdb is a second RAID set that is brand new. I want to use the
entire disk for additional storage.
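For the record, once the VG is healthy again, my understanding of the right sequence for bringing /dev/sdb in (and not repeating my mkfs mistake) is that the filesystem goes on a new logical volume, never on the PV itself. Roughly (LogVol02 is just a name I'm making up):

```
pvcreate /dev/sdb              # label the whole disk as a PV
vgextend VolGroup00 /dev/sdb   # add it to the volume group
# size from the "Free PE" count in vgdisplay
lvcreate -l <free_extents> -n LogVol02 VolGroup00
mkfs.ext3 /dev/VolGroup00/LogVol02   # filesystem goes on the LV
```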
Here is the output of various display commands:
# pvdisplay -v
Scanning for physical volume names
Wiping cache of LVM-capable devices
Couldn't find device with uuid 'oACqnk-YQTQ-IiGy-F6Pj-UoBB-kUqM-g6Yu3D'.
Couldn't find device with uuid 'oACqnk-YQTQ-IiGy-F6Pj-UoBB-kUqM-g6Yu3D'.
Couldn't find device with uuid 'oACqnk-YQTQ-IiGy-F6Pj-UoBB-kUqM-g6Yu3D'.
Couldn't find all physical volumes for volume group VolGroup00.
Couldn't find device with uuid 'oACqnk-YQTQ-IiGy-F6Pj-UoBB-kUqM-g6Yu3D'.
Couldn't find all physical volumes for volume group VolGroup00.
Can't read VolGroup00: skipping
Couldn't find device with uuid 'oACqnk-YQTQ-IiGy-F6Pj-UoBB-kUqM-g6Yu3D'.
Couldn't find all physical volumes for volume group VolGroup00.
Couldn't find device with uuid 'oACqnk-YQTQ-IiGy-F6Pj-UoBB-kUqM-g6Yu3D'.
Couldn't find all physical volumes for volume group VolGroup00.
Can't read VolGroup00: skipping
# vgdisplay -v
Finding all volume groups
Finding volume group "VolGroup00"
Wiping cache of LVM-capable devices
Couldn't find device with uuid 'oACqnk-YQTQ-IiGy-F6Pj-UoBB-kUqM-g6Yu3D'.
Couldn't find all physical volumes for volume group VolGroup00.
Couldn't find device with uuid 'oACqnk-YQTQ-IiGy-F6Pj-UoBB-kUqM-g6Yu3D'.
Couldn't find all physical volumes for volume group VolGroup00.
Volume group "VolGroup00" not found
# lvdisplay -v
Finding all logical volumes
Wiping cache of LVM-capable devices
Couldn't find device with uuid 'oACqnk-YQTQ-IiGy-F6Pj-UoBB-kUqM-g6Yu3D'.
Couldn't find all physical volumes for volume group VolGroup00.
Couldn't find device with uuid 'oACqnk-YQTQ-IiGy-F6Pj-UoBB-kUqM-g6Yu3D'.
Couldn't find all physical volumes for volume group VolGroup00.
Volume group "VolGroup00" not found
Note: Red Hat does have some documentation that talks about recovering
LVM metadata although not in exactly the same kind of situation. The
command it mentions is something like this:
pvcreate --uuid "FmGRh3-zhok-iVI8-7qTD-S5BI-MAEN-NYM5Sk" --restorefile
/etc/lvm/archive/VG_00050.vg /dev/sdh1
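If I follow the doc correctly, that pvcreate is only step one; the sequence continues by restoring the VG metadata from the same archive file and reactivating the group. Sketched with the example's uuid, paths, and VG name (not my system's):

```
# recreate the PV label with the uuid the metadata expects
pvcreate --uuid "FmGRh3-zhok-iVI8-7qTD-S5BI-MAEN-NYM5Sk" \
    --restorefile /etc/lvm/archive/VG_00050.vg /dev/sdh1
# write the archived VG metadata back out
vgcfgrestore -f /etc/lvm/archive/VG_00050.vg VolGroup00
# reactivate the volume group
vgchange -ay VolGroup00
```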
Now it occurs to me that there is some chance that this may in fact work
exactly right by changing the uuid of the /dev/sdb volume to match the
old uuid that the display commands are griping about. The only problem
is that the file that would be used as the value for --restorefile
contains this:
# Generated by LVM2: Mon Oct 23 13:23:55 2006
contents = "Text Format Volume Group"
version = 1
description = "Created *before* executing 'vgscan --mknodes --ignorelockingfailure'"
creation_host = "culverco.culverco.com" # Linux culverco.culverco.com 2.6.9-42.0.3.ELsmp #1 SMP Mon Sep 25 17:28:02 EDT 2006 i686
creation_time = 1161624235 # Mon Oct 23 13:23:55 2006
VolGroup00 {
    id = "AuDV2N-7nfH-7OpL-KjCN-LWVD-ArpI-7AkTBy"
    seqno = 3
    status = ["RESIZEABLE", "READ", "WRITE"]
    extent_size = 65536 # 32 Megabytes
    max_lv = 0
    max_pv = 0

    physical_volumes {

        pv0 {
            id = "LdCKsY-xEZF-4koe-hnE1-LjVX-eSCi-bz1Asx"
            device = "/dev/sda2" # Hint only
            status = ["ALLOCATABLE"]
            pe_start = 384
            pe_count = 4372 # 136.625 Gigabytes
        }
    }

    logical_volumes {

        LogVol00 {
            id = "0ChzON-UBNj-xEdx-jrir-f5T1-nDKq-Wx4WUP"
            status = ["READ", "WRITE", "VISIBLE"]
            segment_count = 1

            segment1 {
                start_extent = 0
                extent_count = 4307 # 134.594 Gigabytes
                type = "striped"
                stripe_count = 1 # linear
                stripes = [
                    "pv0", 0
                ]
            }
        }

        LogVol01 {
            id = "bI5vdI-uYbl-1ME1-8LvS-VLJ8-SOyn-0tgxVZ"
            status = ["READ", "WRITE", "VISIBLE"]
            segment_count = 1

            segment1 {
                start_extent = 0
                extent_count = 62 # 1.9375 Gigabytes
                type = "striped"
                stripe_count = 1 # linear
                stripes = [
                    "pv0", 4307
                ]
            }
        }
    }
}
You will note that nowhere in there is any mention of the problematic
uuid. Also there is no mention of the physical volume sdb, only of sda2.
I'm sure that I can manage to edit the file to add the proper
information, if only I can figure out what the proper information is.
Maybe then it would reset the uuid for sdb and I'll be happy again. I hope.
Any ideas, suggestions, comments, etc.?
Thanks,
Dan