Our community of experts have been thoroughly vetted for their expertise and industry experience.
Published:
Updated:
Browse All Articles > Replication slave lag monitoring using heartbeat and windows batch scripts.
"Show Slave Status" command has a last column "Seconds_Behind_Master", which gives us idea about how much time slave is lagging behind master. It is an important to be considered parameter in monitoring and maintaining replication.
This article explains us a way to monitor replication slave lag time. It also includes a sample batch scripts to automate the monitoring process, makes it easy to understand.
Whats wrong with "Seconds_Behind_Master":
Show Slave Status command; does shows us Seconds_Behind_Master.
Now Documentation says: The field measures the time difference in seconds between the slave SQL thread and the slave I/O thread. If the network is slow, this is not a good approximation; the slave SQL thread may quite often be caught up with the slow-reading slave I/O thread, so Seconds_Behind_Master often shows a value of 0, even if the I/O thread is late compared to the master. In other words, this column is useful only for fast networks.
Now this has been discussed at a lot of places for a lot of times, the solution is to periodically insert a row into a "heartbeat" table on the master server. Let it get replicated on slave and check the "heartbeat"s on the slave. This will surely explain you replication time and the slave behind master.
So whats the idea?
Consider descriptions of following two mysql functions:
current_timestamp() : The function when used in a query replicates its output same across the slaves. Thus even if the query is executed at a later time on slave, the value stored would be same as what is there in the master.
sysdate() : The behavior of this function is different from the above. It stores the current output of the function at the slave when executed. Thus the output produced could be different from what was produced at the master.
Considering behavior, if we have both values inserted on master server; we will get it replicated on slave. But for sysdate() function, the time will be of slave's and that will help us calculating the slave lag comparing with current_timestamp() value.
Here for making an example I've considered MySQL Server with port 3307 as Master and Server with port 3306 (Default) as Slave for monitoring Slave Lag.
Using sample batch scripts we will be able to understand the replication lag.
ETD, the Execution Time Difference from master to slave shows us how long slave takes to replicate once master has executed the statement. By query we can understand it is difference between current_timestamp() and sysdate().
SBM, Slave Behind Master, points us the time in seconds last query executed on slave from master.
The monitoring script will also tell you the IO and SQL thread status after each minute.
Comments (0)