Monitoring Hyper-V and Hyper-V Replica

Lee W, MVP
A simple method for monitoring your Hyper-V servers and Replication Status

I use Hyper-V Replica to replicate VMs between datacenters.  Replication is fairly reliable, but it can fail because of connectivity problems, server failures, or other issues.  To monitor my VMs, I wrote a couple of scripts years ago that Task Scheduler runs automatically and that notify me of problems.

I'll get to the details shortly, but a few notes about how I set things up:

On each server I manage (and most workstations), I create two folders under C:\Windows: Utils and Scripts.  Third-party utilities go in C:\Windows\Utils and my scripts go in C:\Windows\Scripts.  I then add both folders to the Path environment variable (via the classic System Control Panel, under Advanced Settings, Environment Variables button).

With those folders in the path, I can run their contents from any folder and from any script. (One tip: put Utils before Scripts.  On occasion, I've accidentally given a script the same name as a utility.  The folders are searched in the order they appear in the path, and the first match found is the one that runs.  If you write a script named PSEXEC.CMD and have PSEXEC.EXE in the Utils folder, PSEXEC.EXE will run if the Utils folder is listed first, and you'll be more likely to catch your error!)
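
The same path change can also be scripted rather than made through the Control Panel. The following is only a sketch: `Add-PathFolder` is a hypothetical helper name, and it assumes the two folder names described above. It removes any existing copies of the folders and appends them in the given order, so Utils always ends up ahead of Scripts. It compares entries as plain strings, so variants such as a trailing backslash are not recognized as duplicates.

```powershell
# Hypothetical helper: returns a PATH string with the given folders moved
# to the end, in the order supplied. Matching is case-insensitive.
function Add-PathFolder {
    param(
        [string]$Path,
        [string[]]$Folder
    )
    # Drop empty entries and any existing copies of the folders being added.
    $entries = @($Path -split ';' | Where-Object { $_ -ne '' -and $Folder -notcontains $_ })
    ($entries + $Folder) -join ';'
}

# Utils is listed before Scripts so a utility wins over a same-named script.
$current = [Environment]::GetEnvironmentVariable('Path', 'Machine')
$updated = Add-PathFolder -Path $current -Folder 'C:\Windows\Utils', 'C:\Windows\Scripts'
$updated

# To persist the change (requires an elevated session), uncomment:
# [Environment]::SetEnvironmentVariable('Path', $updated, 'Machine')
```

The last line is commented out on purpose so the result can be reviewed before the machine-wide path is changed.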

Script 1: DailyReplicaStatusUpdate.ps1

This script runs as a scheduled task once per day (for me, at about 5:45 am).  It provides me with a summary that indicates the state of each VM, the VM's health status, what mode it's in, the frequency of replication (in seconds), which servers are Primary and which are Replica for that VM, the port used, the authentication type, and the relationship.

# Widen the console window and buffer so table output is not truncated.
$console = $host.ui.RawUI
$size = $console.WindowSize
$size.Width = 130
$buffer = $console.BufferSize
$buffer.Width = 130
$console.BufferSize = $buffer
$console.WindowSize = $size
# Collect everything in a temporary text file.
$OutputFile=$env:temp + "\~VMRepl-Status.txt"
# Replication configuration, then replication statistics, for each VM.
get-vmreplication | Out-File -FilePath $OutputFile
measure-vmreplication | Out-File -FilePath $OutputFile -Append
# Free space on local fixed disks. BlockSize is reused here to hold the percent-free value.
gwmi Win32_LogicalDisk -Filter "DriveType=3" | select VolumeName, FileSystem,FreeSpace,BlockSize,Size | % {$_.BlockSize=(($_.FreeSpace)/($_.Size))*100;$_.FreeSpace=($_.FreeSpace/1GB);$_.Size=($_.Size/1GB);$_} | Format-Table VolumeName, @{n='FS';e={$_.FileSystem}},@{n='Available';e={'{0:N2}'-f $_.FreeSpace}}, @{n='% Free';e={'{0:N2}'-f $_.BlockSize}},@{n='Size';e={'{0:N3}' -f $_.Size}} -AutoSize | Out-File -FilePath $OutputFile -Append
# Physical disk health, then temperature and wear counters.
Get-PhysicalDisk | Sort Size | FT FriendlyName, Size, MediaType, Manufacturer, Model, HealthStatus, OperationalStatus -AutoSize | Out-File -FilePath $OutputFile -Append
Get-PhysicalDisk | Get-StorageReliabilityCounter | ft deviceid, temperature, wear -AutoSize | Out-File -FilePath $OutputFile -Append
# Email the contents of the file.
$body=(Get-Content $OutputFile | out-string)
Send-MailMessage -To PersonGetting@TheseMessages.com -From AnAppropriate@SenderOfThisMessage.com -body $body -SmtpServer Your.SMTPServerOrIP.com -subject "Hyper-V Replica and Host Disk Status"

Each morning, I get the following summary to see how things are and if I need to be concerned with anything:

VMName       State       Health  Mode    FrequencySec PrimaryServer ReplicaServer ReplicaPort AuthType Relationship
------       -----       ------  ----    ------------ ------------- ------------- ----------- -------- ------------
SERVER01     Replicating Normal  Replica 300          ALAHV01       PHXCN01       80          Kerberos Simple      
SERVER02     Replicating Normal  Replica 300          ALAHV01       PHXCN01       80          Kerberos Simple      
SERVER03     Replicating Normal  Replica 300          BALHV01       PHXCN01       80          Kerberos Simple      
SERVER04     Replicating Normal  Replica 300          BALHV01       PHXCN01       80          Kerberos Simple      
SERVER05     Replicating Warning Replica 300          CALHV01       PHXCN01       80          Kerberos Simple      
SERVER06     Replicating Normal  Primary 300          PHXCN01       KCMCN01       80          Kerberos Simple      
SERVER07     Replicating Normal  Primary 300          PHXCN01       KCMCN01       80          Kerberos Simple      
SERVER08     Replicating Normal  Primary 900          PHXCN01       KCMCN01       80          Kerberos Simple      
SERVER09     Replicating Normal  Primary 900          PHXCN01       KCMCN01       80          Kerberos Simple      
SERVER10     Replicating Normal  Primary 900          PHXCN01       KCMCN01       80          Kerberos Simple      

VMName       State       Health  LReplTime            PReplSize(M) AvgLatency AvgReplSize(M) Relationship
------       -----       ------  ---------            ------------ ---------- -------------- ------------
SERVER01     Replicating Normal  1/31/2020 5:44:08 AM 0.00         00:00:08   50.47          Simple      
SERVER02     Replicating Normal  1/31/2020 5:43:46 AM 0.00         00:00:00   1.76           Simple      
SERVER03     Replicating Normal  1/31/2020 5:44:09 AM 0.00         00:00:02   18.42          Simple      
SERVER04     Replicating Normal  1/31/2020 5:43:08 AM 0.00         00:00:22   126.71         Simple      
SERVER05     Replicating Warning 1/31/2020 5:45:29 AM 0.00         00:00:03   21.68          Simple      
SERVER06     Replicating Normal  1/31/2020 5:43:38 AM 0.0117       00:00:01   5.42           Simple      
SERVER07     Replicating Normal  1/31/2020 5:45:30 AM 0.0078       00:00:07   36.02          Simple      
SERVER08     Replicating Normal  1/31/2020 5:35:00 AM 296.00       00:00:18   103.15         Simple      
SERVER09     Replicating Normal  1/31/2020 5:43:08 AM 32.00        00:00:16   67.32          Simple      
SERVER10     Replicating Normal  1/31/2020 5:41:45 AM 0.0117       00:00:12   33.01          Simple      

VolumeName      FS   Available % Free Size    
----------      --   --------- ------ ----    
                NTFS 31.51     24.82  126.953  
Backups         NTFS 1,862.71  99.99  1,862.891
OS              NTFS 137.04    21.01  652.334  
Data            NTFS 331.71    97.82  339.092  
Replica Storage NTFS 475.98    27.08  1,757.812
Storage         NTFS 418.44    52.04  804.037  

FriendlyName                 Size MediaType   Manufacturer Model          HealthStatus OperationalStatus
------------                 ---- ---------   ------------ -----          ------------ -----------------
Single Flash Reader   31914983424 Unspecified Single       Flash Reader   Healthy      OK              
HP LOGICAL VOLUME   1000171331584 Unspecified HP           LOGICAL VOLUME Healthy      OK              
Seagate Desktop     2000398934016 Unspecified Seagate      Desktop        Healthy      OK              
HP LOGICAL VOLUME   4000684662784 Unspecified HP           LOGICAL VOLUME Healthy      OK              

deviceid temperature wear
-------- ----------- ----
2                  0    0
0                  0    0
1                  0    0
3                 35    0

Some notes about Script 1:

When I was first coding this, I had some issues with data being cut off.  The default screen and buffer widths turned out to be the cause.  As a result, I adjusted them by adding the $size and $buffer variables and settings at the beginning.
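
Another way to avoid truncated columns, which may be worth trying if you would rather not resize the console, is the -Width parameter that both Out-String and Out-File accept. The sketch below uses made-up rows in place of real Get-VMReplication output, and the width of 200 is an arbitrary choice.

```powershell
# Sample rows standing in for Get-VMReplication output (values are made up).
$rows = 1..3 | ForEach-Object {
    [pscustomobject]@{
        VMName        = "SERVER0$_"
        State         = 'Replicating'
        Health        = 'Normal'
        PrimaryServer = 'HOST-A.example.com'
        ReplicaServer = 'HOST-B.example.com'
        Relationship  = 'Simple'
    }
}

# -Width sets the line width used when the table is rendered to text,
# independent of the console window or buffer size.
$text = $rows | Format-Table -AutoSize | Out-String -Width 200
$text

# Out-File accepts the same parameter, for example:
# $rows | Format-Table -AutoSize | Out-File -FilePath $OutputFile -Width 200
```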

In the report, you'll notice that only the VMs whose "Mode" is "Primary" show a nonzero "Pending Replication Size" (PReplSize). This is because the primary copy is the active VM, and only the active VM has data pending replication.

For readability, I recommend configuring your e-mail client to view plain text email messages with a monospace font like Courier New or Lucida Console.
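
If changing the mail client is not an option, one alternative is to send the report as HTML wrapped in a `<pre>` element, which most HTML-capable clients render in a monospace font. This is a sketch rather than part of the original script: `ConvertTo-PreHtml` is a hypothetical helper name and the font list is an assumption. Send-MailMessage's -BodyAsHtml switch tells it to treat the body as HTML.

```powershell
# Hypothetical helper: wraps plain text in <pre> so HTML-capable mail clients
# render it in a monospace font. Special characters are HTML-encoded first.
function ConvertTo-PreHtml {
    param([string]$Text)
    $encoded = [System.Net.WebUtility]::HtmlEncode($Text)
    "<html><body><pre style=`"font-family: Consolas, 'Courier New', monospace`">$encoded</pre></body></html>"
}

# Made-up sample text standing in for the report file contents.
$sample = "VMName   State`r`nSERVER01 Replicating <ok>"
$html = ConvertTo-PreHtml -Text $sample
$html

# In Script 1, the last two lines would then look something like:
# $body = ConvertTo-PreHtml -Text (Get-Content $OutputFile | Out-String)
# Send-MailMessage -To PersonGetting@TheseMessages.com -From AnAppropriate@SenderOfThisMessage.com -Body $body -BodyAsHtml -SmtpServer Your.SMTPServerOrIP.com -Subject "Hyper-V Replica and Host Disk Status"
```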

More information on the PowerShell commands used in this script can be found below:

MEASURE-VMREPLICATION - https://docs.microsoft.com/en-us/powershell/module/hyper-v/measure-vmreplication

GET-PHYSICALDISK - https://docs.microsoft.com/en-us/powershell/module/storage/get-physicaldisk

Script 2: ReplChk.ps1

This script runs as a scheduled task every 20 minutes throughout the day.  You can, of course, schedule it more or less frequently.  I find 20 minutes a good interval: it doesn't nag me too much over brief outages, but it does make me aware of - and nag me about - longer ones.

$ReplStatus = Measure-VMReplication
$VMStatus = Get-VMReplication
# With more than one VM, .State is an array, so each -ne test is true when at least one VM has a different state.
If (($ReplStatus.State -ne "Replicating") -and ($ReplStatus.State -ne "WaitingForInitialReplication") -and ($ReplStatus.State -ne "Suspended")) {
    Write-Host "Replication Alert!"
    $FullStatus = "      Server: " + $ReplStatus.Name + "`r`n  HostServer: " + $VMStatus.PrimaryServer + "`r`n   Frequency: " + $VMStatus.FrequencySec + "`r`n       State: " + $ReplStatus.State + "`r`n      Health: " + $ReplStatus.Health + "`r`n    LastRepl: " + $ReplStatus.LReplTime + "`r`n  AvgLatency: " + $ReplStatus.AvgLatency + "`r`n  AvgRplSize: " + $ReplStatus.AvgReplSize + "`r`nHealthDetail: " + $ReplStatus.ReplicationHealthDetails
    Write-Host $FullStatus
    Send-MailMessage -From AnAppropriate@SenderOfThisMessage.com -To PersonGetting@TheseMessages.com -Subject "REPLICATION ALERT!" -Body $FullStatus -SmtpServer Your.SMTPServerOrIP.com -Priority High
} Else {
    Write-Host "Replication OK."
}

# Second check: alert if any VM reports Critical health.
If ($VMStatus.Health -eq "Critical") {
    Write-Host "Replication Alert!"
    $FullStatus = "      Server: " + $ReplStatus.Name + "`r`n  HostServer: " + $VMStatus.PrimaryServer + "`r`n   Frequency: " + $VMStatus.FrequencySec + "`r`n       State: " + $ReplStatus.State + "`r`n      Health: " + $ReplStatus.Health + "`r`n    LastRepl: " + $ReplStatus.LReplTime + "`r`n  AvgLatency: " + $ReplStatus.AvgLatency + "`r`n  AvgRplSize: " + $ReplStatus.AvgReplSize + "`r`nHealthDetail: " + $ReplStatus.ReplicationHealthDetails
    Write-Host $FullStatus
    Send-MailMessage -From AnAppropriate@SenderOfThisMessage.com -To PersonGetting@TheseMessages.com -Subject "REPLICATION ALERT!" -Body $FullStatus -SmtpServer Your.SMTPServerOrIP.com -Priority High
} Else {
    Write-Host "Replication OK."
}

In the event there's a problem, I get emails every 20 minutes that look like this:

  HostServer: ALAHV01.mydomain.com ALAHV01.mydomain.com BALHV01.mydomain.com BALHV01.mydomain.com CALHV01.mydomain.com PHXCN01.mydomain.com PHXCN01.mydomain.com PHXCN01.mydomain.com PHXCN01.mydomain.com PHXCN01.mydomain.com
   Frequency: 300 300 300 300 300 300 300 900 900 900
       State: Replicating Replicating Replicating Replicating Replicating Replicating Error Replicating Replicating Replicating
      Health: Normal Normal Normal Normal Warning Normal Critical Normal Normal Normal
    LastRepl: 01/31/2020 02:49:08 01/31/2020 02:48:46 01/31/2020 02:48:48 01/31/2020 02:47:50 01/31/2020 02:49:38 01/31/2020 02:48:01 01/31/2020 02:45:08 01/31/2020 02:49:58 01/31/2020 02:42:49 01/31/2020 02:41:20
  AvgLatency: 00:00:09 00:00:00 00:00:02 00:00:24 00:00:03 00:00:02 00:00:07 00:00:19 00:00:14 00:00:14
  AvgRplSize: 54359035 1638191 20832600 138634366 21720927 6343976 38732968 112503025 56665225 39710164
HealthDetail: Time duration since the last successful application consistent checkpoint has exceeded the warning limit for the virtual machine 'SERVER05'. (Virtual machine ID CF4FBC76-7381-4F94-A00B-B0893B737F7E) Replication for virtual machine 'SERVER07' is in error. Fix the error(s) and resume replication.

Some notes about Script 2:

The script does NOT notify you of Warning status.  A warning status may be encountered when replication has missed some replication intervals but is still replicating. For me, this was not a big concern - script 1 notes warning status and I review that once per day.  If there's a warning state, I'll see it there.
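
If you would rather have the 20-minute check cover Warning as well, something along these lines might work. It is a sketch, not the script I run: `Get-ReplicationProblem` is a hypothetical function name, the sample rows are made up, and it assumes the objects expose Name, State, and Health the way Script 2 uses them. It evaluates each VM individually and returns only the ones that need attention.

```powershell
function Get-ReplicationProblem {
    param(
        [Parameter(ValueFromPipeline = $true)]
        $InputObject,
        [switch]$IncludeWarning
    )
    begin {
        # States that Script 2 treats as acceptable.
        $okStates = 'Replicating', 'WaitingForInitialReplication', 'Suspended'
        $badHealth = @('Critical')
        if ($IncludeWarning) { $badHealth += 'Warning' }
    }
    process {
        # "$(...)" converts enum values to their names before comparing.
        $state = "$($InputObject.State)"
        $health = "$($InputObject.Health)"
        if (($okStates -notcontains $state) -or ($badHealth -contains $health)) {
            $InputObject
        }
    }
}

# Made-up sample data standing in for Measure-VMReplication output.
$sample = @(
    [pscustomobject]@{ Name = 'SERVER01'; State = 'Replicating'; Health = 'Normal' }
    [pscustomobject]@{ Name = 'SERVER05'; State = 'Replicating'; Health = 'Warning' }
    [pscustomobject]@{ Name = 'SERVER07'; State = 'Error'; Health = 'Critical' }
)

$sample | Get-ReplicationProblem                  # SERVER07 only
$sample | Get-ReplicationProblem -IncludeWarning  # SERVER05 and SERVER07
```

On a real host the input would presumably come from `Measure-VMReplication | Get-ReplicationProblem`, with an email sent only when the result is not empty.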

The emails are not pretty, even when viewed in a monospace font. They do provide the key information about what's wrong, however, and the alert itself tells you that something is not right and needs to be looked into.
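
The emails look the way they do because Script 2 builds the message from the whole collection of VMs, so each field prints every VM's value on one line. A possible way to get one block per VM is to loop over the collection. This is a sketch only: `Format-ReplicationAlert` is a hypothetical function name, the sample rows are made up, and the property names are assumed to match the ones Script 2 uses.

```powershell
function Format-ReplicationAlert {
    param([object[]]$Status)
    $blocks = foreach ($vm in $Status) {
        # One aligned block per VM instead of one line per field.
        @(
            '      Server: {0}' -f $vm.Name
            '       State: {0}' -f $vm.State
            '      Health: {0}' -f $vm.Health
            '    LastRepl: {0}' -f $vm.LReplTime
            '  AvgLatency: {0}' -f $vm.AvgLatency
        ) -join "`r`n"
    }
    # Blank line between VMs.
    $blocks -join "`r`n`r`n"
}

# Made-up sample data standing in for Measure-VMReplication output.
$sample = @(
    [pscustomobject]@{ Name = 'SERVER05'; State = 'Replicating'; Health = 'Warning'; LReplTime = '01/31/2020 02:49:38'; AvgLatency = '00:00:03' }
    [pscustomobject]@{ Name = 'SERVER07'; State = 'Error'; Health = 'Critical'; LReplTime = '01/31/2020 02:45:08'; AvgLatency = '00:00:07' }
)

$body = Format-ReplicationAlert -Status $sample
$body
```

The resulting string could be passed to Send-MailMessage as the -Body value in place of $FullStatus.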

Running these scripts:

To keep things simple, I created similarly named batch files. The Scheduled Tasks run those batch files, which in turn execute the PowerShell scripts.


Powershell.exe -executionpolicy remotesigned -File c:\Windows\scripts\DailyReplicaStatusUpdate.ps1


Powershell -file c:\windows\scripts\ReplChk.ps1
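
The tasks themselves can also be created from PowerShell with the ScheduledTasks module cmdlets instead of through the Task Scheduler console. The following is a sketch under several assumptions: the task names are made up, the tasks call powershell.exe directly rather than going through the batch files, and they run as SYSTEM, which may or may not suit your environment. It needs an elevated session on the Hyper-V host.

```powershell
# Assumed script locations, matching the folder layout described earlier.
$dailyScript = 'C:\Windows\Scripts\DailyReplicaStatusUpdate.ps1'
$checkScript = 'C:\Windows\Scripts\ReplChk.ps1'

# Daily summary at 5:45 am.
$dailyAction  = New-ScheduledTaskAction -Execute 'powershell.exe' -Argument "-ExecutionPolicy RemoteSigned -File $dailyScript"
$dailyTrigger = New-ScheduledTaskTrigger -Daily -At '05:45'
Register-ScheduledTask -TaskName 'Hyper-V Daily Replica Status' -Action $dailyAction -Trigger $dailyTrigger -User 'SYSTEM' -RunLevel Highest

# Replication check every 20 minutes, starting now.
# Note: some older Windows versions may also require -RepetitionDuration here.
$checkAction  = New-ScheduledTaskAction -Execute 'powershell.exe' -Argument "-ExecutionPolicy RemoteSigned -File $checkScript"
$checkTrigger = New-ScheduledTaskTrigger -Once -At (Get-Date) -RepetitionInterval (New-TimeSpan -Minutes 20)
Register-ScheduledTask -TaskName 'Hyper-V Replica Check' -Action $checkAction -Trigger $checkTrigger -User 'SYSTEM' -RunLevel Highest
```

Whether the tasks can send mail when run as SYSTEM depends on your SMTP server's rules, so test with a manual run of each task first.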

More information on the PowerShell command used in both scripts can be found below:

GET-VMREPLICATION - https://docs.microsoft.com/en-us/powershell/module/hyper-v/Get-VMReplication

A brief disclaimer: I'm not a PowerShell expert so there may be better ways to code these - I invite anyone who has an interest to enhance these and post in the comments below!

Thanks for reading - I hope this helps you stay on top of your replication and host status!
