Replication slave lag monitoring using heartbeat and windows batch scripts.

Published on
13,436 Points
2 Endorsements
Last Modified:
Community Pick
"Show Slave Status" command has a last column "Seconds_Behind_Master", which gives us idea about how much time slave is lagging behind master. It is an important to be considered parameter in monitoring and maintaining replication.

This article explains us a way to monitor replication slave lag time. It also includes a sample batch scripts to automate the monitoring process, makes it easy to understand.

Whats wrong with "Seconds_Behind_Master":

Show Slave Status command; does shows us Seconds_Behind_Master.

Now Documentation says: The field measures the time difference in seconds between the slave SQL thread and the slave I/O thread. If the network is slow, this is not a good approximation; the slave SQL thread may quite often be caught up with the slow-reading slave I/O thread, so Seconds_Behind_Master often shows a value of 0, even if the I/O thread is late compared to the master. In other words, this column is useful only for fast networks.

Now this has been discussed at a lot of places for a lot of times, the solution is to periodically insert a row into a "heartbeat" table on the master server. Let it get replicated on slave and check the "heartbeat"s on the slave. This will surely explain you replication time and the slave behind master.

So whats the idea?

Consider descriptions of following two mysql functions:

current_timestamp() : The function when used in a query replicates its output same across the slaves. Thus even if the query is executed at a later time on slave, the value stored would be same as what is there in the master.

sysdate() : The behavior of this function is different from the above. It stores the current output of the function at the slave when executed. Thus the output produced could be different from what was produced at the master.

Considering behavior, if we have both values inserted on master server; we will get it replicated on slave. But for sysdate() function, the time will be of slave's and that will help us calculating the slave lag comparing with current_timestamp() value.

Implementing Slave Behind Master Monitoring:

Continuing with the article: http://www.experts-exchange.com/articles/Database/Miscellaneous/Quick-Multi-MySQL-Server-Installation-with-Master-Master-Replication-on-Same-Windows-Box.html I've tried to implement replication slave lag monitoring through "heartbeat" method.

Here for making an example I've considered MySQL Server with port 3307 as Master and Server with port 3306 (Default) as Slave for monitoring Slave Lag.

Create table on Master MySQL Server:
create table heartbeat

Open in new window

Periodical value insertion - Generate Heartbeat:
insert into heartbeat(slave_time) values(SYSDATE());

Open in new window

One more case is, when slave is intentionally kept behind master. We can surely add up that time and make according insertion.

Insert Into heartbeat(slave_time) 
Values( from_unixtime((unix_timestamp(sysdate()) + time_to_sec('HH:MM:SS'))) ;

Open in new window

Here, HH:MM:SS is slave's known time of lagging behind master.

We have following queries to monitor slave-lag:

First query will tell us the time difference of slave and master query execution time. Which is actually caused by the functions explained above.

Select master_time, timediff(slave_time, master_time) 
From heartbeat 
Where DATE(master_time) = DATE(NOW()) 
Order By master_time;

Open in new window

This second query will tell us the time difference of last query execution time from replication to now.

select timediff(NOW(), max(master_time)) from heartbeat;

Open in new window

Batch Scripts For Windows:

To understand above setup and making it little automated, I wrote couple of batch scripts.

Following batch script will auto insert one row into heartbeat table on per minute basis on Master.

@echo off
set StartTime=%TIME%
set StartMin=%StartTime:~3,2%
set EndTime=%TIME%
set EndMin=%EndTime:~3,2%
set /a Hour_Diff=StartMin - EndMin >nul
if %Hour_Diff%==0 (
goto start )
echo Beating...
mysql -uroot -pXXXX -P3307 -e "use master;insert into heartbeat(slave_time) values(SYSDATE());"
goto begin

Open in new window

Similar batch created to monitor the lag time on Slave MySQL Server.

@echo off
echo Observing slave delay - per minute basis
echo SBM -  Slave Behind Master / Last Query Executed.
echo ETD - Execution Time Difference
echo ----------------------------------
echo SBM  -  ETD	- Slave Status
echo ----------------------------------
set StartTime=%TIME%
set StartMin=%StartTime:~3,2%
set EndTime=%TIME%
set EndMin=%EndTime:~3,2%

set /a Hour_Diff=StartMin - EndMin >nul
if %Hour_Diff%==0 (
	goto start )
FOR /F "skip=1 tokens=*" %%I IN ('mysql -uroot -pXXXX -e "use master;select TIME_TO_SEC(timediff(NOW(), max(master_time)) ) last_query_time, second(timediff(max(slave_time), max(master_time))) master_slave_difference from heartbeat where DATE(master_time) = DATE(NOW()) order by master_time desc;"') DO set var=%%I%
@echo %var%
FOR /F "tokens=11 delims='|	'" %%G IN ('mysql -uroot -pXXXX -e "show slave status;"') do @echo			%%G 
FOR /F "tokens=12 delims='|	'" %%G IN ('mysql -uroot -pXXXX -e "show slave status;"') do @echo			%%G 

goto begin

Open in new window

PS: Set passwords accordingly.

Using sample batch scripts we will be able to understand the replication lag.
ETD, the Execution Time Difference from master to slave shows us how long slave takes to replicate once master has executed the statement. By query we can understand it is difference between current_timestamp() and sysdate().
SBM, Slave Behind Master, points us the time in seconds last query executed on slave from master.
The monitoring script will also tell you the IO and SQL thread status after each minute.


Featured Post

Cloud Class® Course: CompTIA Healthcare IT Tech

This course will help prep you to earn the CompTIA Healthcare IT Technician certification showing that you have the knowledge and skills needed to succeed in installing, managing, and troubleshooting IT systems in medical and clinical settings.

Join & Write a Comment

In this video, Percona Solution Engineer Dimitri Vanoverbeke discusses why you want to use at least three nodes in a database cluster. To discuss how Percona Consulting can help with your design and architecture needs for your database and infras…
In this video, Percona Solution Engineer Rick Golba discuss how (and why) you implement high availability in a database environment. To discuss how Percona Consulting can help with your design and architecture needs for your database and infrastr…

Keep in touch with Experts Exchange

Tech news and trends delivered to your inbox every month