Tar/Gzip backup locks up server...accidentally deletes .MYI file

I have a large MySQL master/slave database running on two Solaris 10 x86 servers.

Once a day, I stop the slave server to make a backup. Because of the sheer size of the tables (and the amount of time it takes), I chose NOT to use mysqldump but instead tar/gzip all of the DB directories into a new folder, where I can restore them if needed.

Twice in the last seven months I have run into a bizarre problem: at a point AFTER the slave has stopped and the tar/gzip has begun, the entire server stops responding (SSH, console login) - though it is still running. This then requires a hard reboot.

There are three smaller tables used to temporarily store data. These tables can be empty from time to time. After the reboot, restarting the slave fails because one of these three tables has an .MYI file of 0K. I have to recreate the table from a backup to get the slave server restarted.

/var/adm/messages and /var/log/syslog is empty. So I have very little to troubleshoot with.

Disk space is not an issue.

#!/usr/bin/perl

#disconnect MySQL
system("/etc/init.d/mysql stop");

my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);

#relocate and tar proper files...
my $timestamp= time;
system("cd /var/lib/mysql; tar cEf - ./* mysql-relay-log.info mysql-master.info | gzip > /BK/daily-${timestamp}.tar.gz");

#restart MySQL
system("/etc/init.d/mysql start");
shambrightAsked:
Who is Participating?
 
ahoffmannCommented:
my comment about "kill" wasn't clear enough, sorry
I agree with jlevie that you forst test manually with
   /etc/init.d/mysql stop
check the messages and the status
then try to start your
    tar whatever....

the important thing is if mysql stops properly, if not try with
   kill <--pid-->
if that does not help, try with
   kill -9 <--pid-->

also, I highly recommend not to rely on /var/run/mysql.pid to contain the correct pid,
you better get the real actual pid using ps like for example
   ps ax|awk '/mysqld /{print $1}'
0
 
jlevieCommented:
Since I can't think of a reason that the tar/gzip should lock up the server to the point that a reboot would be necessary (though it could make response really slow), my suspicion is that MySQL may still be running when the tar starts. Try manually running through the steps, but check (with ps) to make sure that all of the MySQL processes have exited before starting the backup.
0
 
ahoffmannCommented:
first ensure that your "mysql stop" call succedds and realy stops the server
second: is /BK a mounted filesystem (not on the same host)?
0
Cloud Class® Course: CompTIA Healthcare IT Tech

This course will help prep you to earn the CompTIA Healthcare IT Technician certification showing that you have the knowledge and skills needed to succeed in installing, managing, and troubleshooting IT systems in medical and clinical settings.

 
shambrightAuthor Commented:
jlevie - this is meant to be automated. I am not a fast enough typist to duplicate a script :)

ahoffman - /BK is a local directory

Added this:

system("/etc/init.d/mysql stop");
my ($exists,$tries);
$tries = 0;
PAUSE_FOR_EXIT:
sleep(3);
$exists = kill 0, `head -1 /var/run/mysql.pid`;
++$tries;
if (($tries > 120)&&($exists)) { exit; } #failed... Nagios will pick up failure
if ($exists) { goto PAUSE_FOR_EXIT; }

Open in new window


If it is that MySQL did not shut down before tar started, then this should solve it.

Any thoughts on why nothing showed up in the logs regarding being locked completely out?
0
 
ahoffmannCommented:
did you try also:
  kill -9 ...
0
 
shambrightAuthor Commented:
I would need access to the server for that :)
0
 
ahoffmannCommented:
somehow, but you already use kill in the php script you posted
0
 
jlevieCommented:
I understand that you need it to be automated. The point of doing it manually (and making sure that all MySql processes have exited) is to figure out if the MySql exit is the problem or if something else is the problem.
0
 
shambrightAuthor Commented:
It hasn't happened again since -- but EE wants this matter closed. Thanks for your input.
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.