Erik Whiteway

clone - CentOS 6.4 failing - grub error

I have a CentOS 6.4 machine. It's dedicated hardware that runs a printing press. I need to replace the failing hard drive, but I can't get the drive to clone (I'm trying from a different source machine).

When I try Clonezilla, I get an error that grub can't fix the boot, and the machine just hangs on boot at a blank screen.
I tried backing up with Acronis and restoring. I also get a grub error. It boots to a grub command line, but the setup (hd0) command just hangs on the 2nd detection.

Looking for the easy way to fix the booting and make a stable clone.   I am not a Linux guy, so I'll need pretty exact commands.
Dr. Klahn

Are you trying to clone to an identical drive, or something closely related but larger?  If so, that can be done with dd.

dd if=/dev/sda of=/dev/sdb bs=32M


where sda is the system drive, sdb is the target drive.

Limitations:

a) If the system drive is booted, anything written to during the copy will be corrupt.  Quiesce the system as much as possible.
b) The target drive must be error-free.
c) If the target drive is larger than the source drive, the extra space will be unavailable.
d) If the source drive has bad sectors producing read errors that succeed on retry, the target drive may be unusable.
If the source disk already produces errors, try ddrescue. It copies all non-problematic areas as fast as possible (in large chunks) and ultimately skips failed I/Os.
It will first skip all problematic areas, later revisiting them in smaller parts until it is reading single blocks. Ultimately you get a copy with only the bad blocks missing.
FYI: boot the system in recovery mode or, better yet, single-user mode. Then no services get started.
Best would be to boot from some rescue USB stick or CD-ROM/DVD-ROM and run dd / ddrescue from there.
Try to prevent any normal use (writing) to the problematic system disk.
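
To make the "large chunks first, bad areas later" idea concrete, here is a minimal sketch of a two-pass GNU ddrescue run. It assumes /dev/sda is the failing disk, /dev/sdb is the new disk, and the mapfile lives on a third medium; all three paths are placeholders to check before use. The function only prints the commands so they can be reviewed first, because swapping source and target would overwrite the good data.

```shell
#!/bin/sh
# Print (do not run) a two-pass GNU ddrescue plan for review.
# The device names and mapfile path used below are placeholders.
ddrescue_plan() {
    src=$1   # failing source disk, e.g. /dev/sda
    dst=$2   # new target disk, e.g. /dev/sdb
    map=$3   # mapfile that records which areas were copied
    # Pass 1: copy everything that reads cleanly, skip the slow scraping phase (-n).
    echo "ddrescue -f -n $src $dst $map"
    # Pass 2: revisit the bad areas with direct disc access (-d), up to 3 retry passes (-r3).
    echo "ddrescue -f -d -r3 $src $dst $map"
}

ddrescue_plan /dev/sda /dev/sdb /mnt/usb/rescue.map
```

Because both passes share the same mapfile, the second run only works on what the first one could not read. Keep the mapfile somewhere that survives a reboot of the rescue system.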
ASKER

I can copy the data no problem, it's the boot that is the issue. How do I make a rescue USB and run ddrescue?
Rescue CD: https://www.system-rescue-cd.org/   It also has instructions on how to put it on a USB stick.
ddrescue has its manual page: man ddrescue   (  https://linux.die.net/man/1/ddrescue )

You should copy the COMPLETE disk, not individual partitions. Bootloaders are stored outside of filesystems (track 0; blocks 1-14). Block 0 holds the first-stage boot code and the partition table.
That is because the 440 or so bytes of boot code in block 0 are not sufficient for loaders that have to handle multiple technologies: filesystems (LVM, FAT, ext2/3/4, etc.), encryption, menus, further loaders and so on.
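
As a small illustration of why block 0 matters, the sketch below saves the first 512-byte sector of a disk to a file and checks the 55 aa boot signature in its last two bytes. The function names and file paths are made up for this example. A valid signature only shows that the sector is marked bootable; it does not prove that the rest of grub is intact.

```shell
#!/bin/sh
# Save the first 512-byte sector of a disk (or disk image) to a file.
save_mbr() {
    # $1 = disk or image to read, $2 = file that receives the sector copy
    dd if="$1" of="$2" bs=512 count=1 2>/dev/null
}

# Succeed only if bytes 510-511 of the given file are 55 aa.
has_boot_signature() {
    sig=$(dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "55aa" ]
}

# Example use from a rescue system (needs root, paths are placeholders):
#   save_mbr /dev/sda /mnt/usb/sda-mbr.bin
#   has_boot_signature /mnt/usb/sda-mbr.bin && echo "boot signature present"
```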

Another approach can be to boot from a CD of the OS that is installed on the system disk and reinstall grub in the right places (grub-install).

Can the system be taken offline for a short period of time?
Are you using another HDD or replacing it with an SSD? An SSD will speed up the disk cloning process.

Can you post the output of the following, run in elevated mode
(sudo -s)

parted -l to see the drives and partitions
df -k
mount
pvdisplay
vgdisplay
lvdisplay

What is the error with the new drive on boot?
Acronis shows this for the drive mapping.
vg_system1-lv_home → vg_system1-lv_home
vg_system1-lv_root (_CentOS-6.4-x86_) → _CentOS-6.4-x86_ (vg_system1-lv_root)
vg_system1-lv_swap → vg_system1-lv_swap
sda1 → sda1

I tried the rescue disk, but don't see anything on there that can help. I'm trying a download of CentOS 6.10 and I'll try its rescue disk.
I still can't get anything past the grub error.
I can boot from the rescue disk, but can't get it to fix the problem.

When I boot I get the following screen (it hangs at this point): (screenshot not included)
Can you take the system offline?
Are you planning on using a new SSD versus the existing HDD?
You effectively boot the existing system using a live CD, from which you initiate a copy of sda to sdb using dd, which is a media-level copy.

These are the options available to you:
You could attach a new drive/SSD as sdb.
You could clone partition by partition:
cd destination
dump source | restore
with /var as the last partition.

It would help to know how your existing system is set up and partitioned.

Not sure I understand your Acronis output.
What is the context for the display?

Are both drives attached to the same system?
I don't have the source machine with me (it's on site).

I do have a Clonezilla image of the source machine (on usb HD), also I have an acronis cloud backup.

I have both a spinning disk and an SSD that I am trying to restore to.

When I restore with clonezilla - the machine just boots to a black screen.
When I restore with acronis - I get that grub screen.

Here are the images of the Clonezilla restore: (three screenshots, not included)

In the Acronis restore I can see the following partitions:
MBR - Disk1
Basic   - sda1 (pri/act)                         500GB? no: 500MB  EXT4
Dynamic - vg_system1-lv_home                     877GB  EXT4
        - vg_system1-lv_root (_CentOS-6.4-x86_)  60GB   EXT4
        - vg_system1-lv_swap                     4GB    LINUX SWAP

ASKER CERTIFIED SOLUTION
noci

This solution is only available to members.
The first mount command came back with "you must specify the filesystem type".
I found an Ubuntu install (16) that has a GRUB repair tool on the boot disk.
It fixed the boot, but now the machine seems to be hung on a flash screen. I have mouse movement and a power button, but don't see any login option etc.
OK, using the repair disk, I can boot the system to the CLI.
then this works:
  # /sbin/grub
  grub> root (hd0,0)
  grub> setup (hd0)
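
For anyone following along: GRUB legacy counts both disks and partitions from zero, so root (hd0,0) means the first partition on the first disk. The helper below is a hypothetical sketch of that translation. It assumes GRUB sees the disks in the same order as the sdX letters, which the BIOS boot order or a USB rescue stick can change.

```shell
#!/bin/sh
# Convert a Linux name such as sdb1 into GRUB legacy notation such as (hd1,0).
# Assumption: disks appear to GRUB in sdX letter order (a=hd0, b=hd1, ...).
grub_legacy_name() {
    dev=${1#/dev/}                            # accept "sdb1" or "/dev/sdb1"
    letter=$(printf '%s' "$dev" | cut -c3)    # third character is the disk letter
    part=$(printf '%s' "$dev" | cut -c4-)     # anything after it is the partition number
    disk_index=$(( $(printf '%d' "'$letter") - 97 ))   # ASCII code of the letter minus 'a'
    if [ -z "$part" ]; then
        echo "(hd${disk_index})"              # whole disk, e.g. sda -> (hd0)
    else
        echo "(hd${disk_index},$(( part - 1 )))"
    fi
}

grub_legacy_name sda1
grub_legacy_name sdb
```

If the machine was booted from a USB stick that took the sda name, the internal disk may show up one letter later than it will at normal boot, so the numbers need a second look in that case.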

The system then seems to boot normally, but I only get a blank-ish desktop. There is a mouse I can move, and a power button (reboot/shutdown) that does not work.

Sometimes I get a login (defaulting to user). When I select user or 'other', the screen just does nothing (the splash-like screen stays up) and there is no password prompt.

Is there some user file or something that I need to boot to?


mount should be able to auto-select the type. Otherwise, if you know it, use -t ext3 or -t ext4 or -t xfs, whatever it used to be.
Repairing from within the original filesystem will ensure you have the right grub tools and setup of that environment.
It won't boot so you boot from other media and setup the original system as chroot. (that is what the earlier commands do, including the correct mounts for /sys & /dev).
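
The exact commands from the accepted solution are not shown in this thread, so the following is only a sketch of what a typical chroot setup looks like. It assumes the root filesystem is the LVM volume /dev/vg_system1/lv_root (taken from the Acronis listing above), that /boot is partition 1 of the restored disk, and that /mnt/sysimage is a free mount point; all of these are assumptions to verify. The function prints the steps instead of running them so the device names can be checked first.

```shell
#!/bin/sh
# Print (do not run) the steps for repairing grub from inside a chroot.
# $1 = restored disk as seen by the rescue system, e.g. /dev/sdb
# $2 = root logical volume (assumed default below, adjust if yours differs)
print_chroot_repair_steps() {
    disk=$1
    root_lv=${2:-/dev/vg_system1/lv_root}
    mnt=/mnt/sysimage    # assumed mount point, any empty directory works
    cat <<EOF
vgchange -ay
mkdir -p $mnt
mount $root_lv $mnt
mount ${disk}1 $mnt/boot
mount --bind /dev $mnt/dev
mount --bind /proc $mnt/proc
mount --bind /sys $mnt/sys
chroot $mnt grub-install $disk
EOF
}

print_chroot_repair_steps /dev/sdb
```

The bind mounts give the chrooted grub tools access to the real device nodes, which is why they come before the chroot line.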
(If you mount an ext3 filesystem as ext4, it may get upgraded to ext4, which might not be what you want on the final system.)
"flash" screen? ...
Does that repaired grub (your install) know it should look on sda1 (hd0, partition 1) for the grub directory?

The original system was CentOS, so why not use a CentOS ISO in repair mode? That at least knows more about the setup than Ubuntu does. (Yep, those systems ARE different, especially in the setup of boot etc.)

I couldn't get any of the centos files to boot to a repair tool.  It seemed to just go straight to wanting to do a full install.

I'll try again now that I know a lot more about the boot process and see if I can get any centos disks to help with the booting.
It shows a menu first where you can choose F1, F2, F3... one of those menu options should enter repair/recovery mode.

OK, major progress:
1. Took a new Clonezilla image from a loaner machine (from the vendor). Its CentOS is version 6.10.
2. Restore seems to work better.
3. Figured out that when booting recovery from USB, the USB stick becomes sda and the HD is sdb (important when reinstalling grub).

Current status:
1. System boots, to gui, but I can't log in - just loops when I pick the user or 'other user'
2. If I force boot to CLI, login then do startx - GUI works fine.

Now looking at articles - about files to delete / change permissions to fix it - if you have any ideas...

/var/log/Xorg.*.log is the log of the X server. grep it for (EE) and (WW) to find errors and warnings.
* = 0 for the first display, 1 for the second, etc.
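
A minimal sketch of that grep, wrapped in a small function so the log path can be swapped for display 1 and so on. The function name is made up for this example; the default path assumes the first display.

```shell
#!/bin/sh
# Show only the error (EE) and warning (WW) lines from an X server log.
xorg_problems() {
    log=${1:-/var/log/Xorg.0.log}   # default: log of the first display
    grep -E '\((EE|WW)\)' "$log"
}

# Example use:
#   xorg_problems                       # first display
#   xorg_problems /var/log/Xorg.1.log   # second display
```

The legend near the top of the log also mentions (WW) and (EE), so expect that line to show up in the output as well.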

That is fine if you created the backup the right way (one that included permissions). If you "just copied" the data, then you are in for some work.
rpm might be your friend: rpm -V   see http://ftp.rpm.org/max-rpm/ch-rpm-verify.html   for more info
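
Since a login loop is often a permissions or ownership problem, here is a sketch that narrows the rpm -V output down to files whose mode (M), owner (U) or group (G) differs from what the package expects. The function name is made up for this example. Note that rpm -V only reports differences, it does not repair anything.

```shell
#!/bin/sh
# Keep only rpm -V lines where mode (M), owner (U) or group (G) was changed.
# In the flag column, M is character 2, U is character 6 and G is character 7.
ownership_or_mode_changes() {
    awk 'substr($1,2,1)=="M" || substr($1,6,1)=="U" || substr($1,7,1)=="G"'
}

# Example use on the restored system (checks every package, can take a while):
#   rpm -Va | ownership_or_mode_changes
```

Config files that were edited on purpose show up in plain rpm -V output with S, 5 or T flags, which is why this filter ignores those.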
I checked the log - (just with nano)   - didn't see anything that made sense, but I'll try to grep it.

I did create a new user - and that user can't log in either, so it's system wide not user.

The clone was clonezilla, so permissions should be correct - I checked /tmp and .xauth and both were right (by articles I read).
When booted into the GUI, try pressing CTRL+ALT+F1 to get to console 1. (The graphics console mostly runs on console 7.)
CTRL+ALT+F7 returns you to the graphics console in that case. If console 7 isn't the graphics console, try CTRL+ALT+F6, CTRL+ALT+F8...

Verifying with rpm -V can help anyway.
 
ran 'rpm -V -a'    looks like it did a few things - but problem is still there.

going to grep the log now.
only thing that looks like a real error:
Option "xkb_variant" requires a string value (in the log 5 times)
xkb refers to a keyboard layout. If that is chosen wrongly, the keys typed might not produce the intended characters.
Can you verify whether that was set up correctly on the original system (check its log)?
When I go in via booting to CLI / login / startx - I do see a warning that ConsoleKit had issues.
I think this has changed so much, I'll start a new question to keep things cleaner.