justin-jurgens

Hyper-V Replica failing

Hello All,

I am attempting to configure Hyper-V Replica and am coming to a dead end here it seems. Error message appears at the end when attempting to finish the configuration:

Hyper-V failed to enable replication.
Hyper-V failed to establish a connection with the Replica server.
Hyper-V Failed to enable replication for the virtual machines "TestVM": The connection with the server was terminated abnormally (0x00002EFE)
Hyper-V failed to establish a connection with the Replica server 'replica.company.com' on port '443.' Error: The connection with the server was terminated abnormally (0x00002EFE).


No logs showing any connection on the Destination Replica.

Logs on the Source Replica are as follows:
Within Hyper-V-VMMS Admin:
Error - Hyper-V-VMMS - 29310 - Hyper-V failed to establish a connection with the Replica server 'replica.company.com' on port '443.' Error: The connection with the server was terminated abnormally (0x00002EFE).
Error - Hyper-V-VMMS - 32000 - Hyper-V failed to enable replication for virtual machine 'TestVM': The connection with the server was terminated abnormally (0x00002EFE).

Within System Logs:
Error - DistributedCOM - 10028 - DCOM was unable to communicate with the computer replica.company.com using any of the configured protocols; requested by PID      e3c (C:\Windows\system32\mmc.exe).

These servers are on different domains and at different sites. There is no trust relationship, so I assumed I needed to use certificate-based authentication to make it work. Following the link below, I used self-signed certificates:
http://technet.microsoft.com/en-us/library/jj134153.aspx#BKMK_1_5
On the destination server's certificate I put two FQDN CNs in case the external address was expected, so it has both replica.company.com and servername.company.com.
On the source server's certificate I put the proper FQDN.

Configured proper ports on our ASA on the destination servers' side for 443 (I also completely opened it on the local Windows firewall and in our ASA for troubleshooting).

Am I missing something here? I've seen a couple forums where people are having the same issue but either no resolution on it or the solution did not resolve it for me.
http://community.spiceworks.com/topic/259848-server-2012-replication-certificate-internal-domain
http://blog.greypuddles.net/?p=179
http://social.technet.microsoft.com/Forums/en-US/winservercore/thread/1979ecbb-3efd-47bc-9322-b509c369a0ed/
http://teety4.rssing.com/browser.php?indx=3899442&item=701
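Before digging into certificates, it can help to confirm that TCP 443 on the Replica server is reachable at all from the source side. The following is a minimal sketch, not part of the original post: the function name `tcp_port_open` is made up for illustration, and a successful TCP connect only shows that the NAT and firewall path is open, not that Hyper-V Replica will accept the session.

```python
import socket


def tcp_port_open(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a plain TCP connection to host:port can be established.

    This checks reachability only; it does not perform a TLS handshake or
    present a certificate.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Run from the source server, `tcp_port_open("replica.company.com")` (the placeholder name used in this thread) returning False would point at DNS, NAT or firewall rather than at the certificates.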
ArneLovius

I'm guessing from your description that both hyper-v servers are behind NAT.

You need to use static NAT at both sites to allow the communication to be two way.

You can of course restrict the source address to be only from the remote exit IP address.

I would usually use dynamic policy NAT to keep the outbound traffic on the same exit IP address as the static NAT.

The certificates need to be configured on both servers.
ASKER

I configured self-signed certificates on both servers and copied each server's CA certificate to the other's Trusted Root store.

Both servers are behind NAT, yes. When you say static NAT, do you mean a 1-to-1 NAT to an external IP? I have that on the destination side, just not the source side. What is the requirement for a static NAT entry on the source location's firewall, besides failing back the VMs? I'm having an issue just getting replication to go to the destination currently.
SOLUTION
ArneLovius
This solution is only available to members.
Ok I will complete that tonight. I was unaware of that requirement, didn't see that in anything I read with Microsoft whitepapers nor online forums/articles.

I will update with my progress later.
It's not as explicit as it should be.

From here: http://technet.microsoft.com/en-us/library/jj134240.aspx

If you use network address translation (NAT), ensure that the inbound and outbound ports are configured to use the same port number. Replica only listens on one port.
So I created a static NAT on the source side as well. Just for giggles, I also completely opened the firewall to both the source and destination servers. Nothing changed, same error message.

However, as a fluke maybe, I did get one new error on the destination server under Hyper-V-VMMS but it was only one time in the couple hours I was messing around last night:

Error - Hyper-V-VMMS - 29218
Hyper-V received a digital certificate that is not valid from primary server 'source.server.local'. Error: A certificate chain processed, but terminated in a root certificate which is not trusted by the trust provider. (0x800B0109).

It was only one time, so not sure if that will tell us much. I followed the Microsoft instructions for the self-signed certs pretty well and I even deleted the ones I had previously made and went through the instructions again.
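The two hex codes quoted in this thread can be decoded with a little arithmetic, which sometimes makes them easier to search for. This is a small sketch; the helper names are made up for illustration. The first code, 0x00002EFE, is decimal 12030, which falls in the WinHTTP error range where 12030 is defined as ERROR_WINHTTP_CONNECTION_ERROR. The second, 0x800B0109, is an HRESULT whose facility field is 11 (the certificate facility); its symbolic name is CERT_E_UNTRUSTEDROOT.

```python
def win32_code(code_hex: str) -> int:
    """Convert a hex error string such as '0x00002EFE' to its decimal value."""
    return int(code_hex, 16)


def split_hresult(code_hex: str) -> dict:
    """Split a 32-bit HRESULT into its severity, facility and code fields."""
    value = int(code_hex, 16)
    return {
        "failure": bool(value >> 31),       # bit 31: 1 means failure
        "facility": (value >> 16) & 0x7FF,  # bits 16-26: facility number
        "code": value & 0xFFFF,             # bits 0-15: facility-specific code
    }


print(win32_code("0x00002EFE"))     # 12030
print(split_hresult("0x800B0109"))  # failure=True, facility=11, code=265
```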
Any progress?  I have the same basic problem here, with one twist - no firewall.

Here's what we did for testing.  First, proof of concept internally.  Both servers are on the same subnet, firewall is turned off on the machines.  No hardware firewall.

Used self-signed certs with the internal (.local) name for the servers.  No problem, worked fine.  I then built new certs with the external (.com) name for the servers (Again, remember, same subnet at this point - so I just made A records in DNS for now.)  No go - Hyper-V wouldn't take the certs since they didn't have the FQDN for the internal name.

The third time I created the certs, I used two CN names - internal and external.  It took that cert with no problem.  I went to enable replication on one of our guests and got the same error you have - right when I hit Finish.

So I have to think the problem is more cert related versus firewall/routing related.  Unfortunately, I haven't been able to find anything on my end either with lots of Googling, etc.  I'm seeing people that are creating host files, but I have to assume these are more workgroup machines.  Our machines are on the domain, and creating host files to redirect .local addresses scares me a little.  One of those things that will come and bite you three months from now!
I'm heading down the path of UCC/SAN certificates.  Unfortunately, it does not appear we can do this with makecert.  And I'm not finding anywhere I can get a free trial for a UCC cert.  Not really interested in dishing out money only to find it wasn't the issue though. . .
The certificate has to match the local server name.

If you have your own internal CA, you could create internal SAN certificates.
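When planning a SAN certificate, one way to keep track of this is to list every name a host will be addressed by and check it against the names on the certificate. The sketch below is illustrative only: `missing_names` is a hypothetical helper, it does a plain case-insensitive comparison, and it says nothing about how Hyper-V itself validates names.

```python
def missing_names(cert_names, required_names):
    """Return the required host names that do not appear on the certificate.

    Comparison is case-insensitive and ignores a trailing dot.
    """
    have = {name.lower().rstrip(".") for name in cert_names}
    return [name for name in required_names
            if name.lower().rstrip(".") not in have]


# Example with placeholder names: the certificate carries only the external name.
print(missing_names(
    ["source.domain.com"],
    ["source.domain.local", "source.domain.com"],
))  # ['source.domain.local']
```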
Thanks Arne - I know the cert has to match.  I have no problem getting the cert in; it's a matter of pointing to the .COM instead of the .LOCAL address.

So I spent some time standing up an internal CA for testing.  I created a SAN cert with all of the names I needed -

source.domain.local
source.domain.com
destination.domain.local
destination.domain.com

Put the CA cert in the trusted root, put the SAN cert in the personal store of each Hyper-V host.  Went in fine, was able to turn on replication on each host fine.  Can always get past this step.

Then I went to enable replication on a guest machine; pointed it to destination.domain.com for the replica server, pointed it to the cert, all good there.  Hit finish on the last step, and error.  

What gives?!
[Attached screenshot: ErrorCapture.jpg]
I would have two SAN Certs, one for just the source and one for just the destination. Having the source and destination might cause a different problem.
Well, I actually just eliminated the certificate issue altogether.  Currently the two (test) machines are on the same domain and same subnet.  We were testing certs to make sure this would work once they're out over the Internet and not on the same subnet.

So for the time being, I tested with Kerberos.  It also fails if I use the .com FQDN.  If I use the internal/.LOCAL FQDN, it works perfectly fine, just like we experienced with the certs.

I'm at a loss.  Not sure what else to try/test.  We can ping both machines using the .com FQDN (set this up in a HOSTS file this time to be sure we weren't having DNS issues).

I have to assume the machines are rejecting this because they aren't known to be .com and they're seeing it as a security thing maybe?  Is this a case for an alternative UPN suffix maybe?
Possibly you should open a new question for your issues.
If my troubleshooting and methods help the original author, who had the same issues. . .
If you had answers rather than questions...
Netminder,

My apologies.  I felt the troubleshooting steps I was also going through on the same problem might be helpful to others in narrowing down the issue.  Since no one has found a solution yet, figured it wouldn't hurt to give additional ideas out and see if something worked for someone else.
I am still having the same issue unfortunately. Loads of research and attempts but still not getting past the error...
SOLUTION
This solution is only available to members.
I will attempt creating a site-to-site VPN and use a SAN certificate with servername and FQDN to see if that does anything.

Will update tomorrow.
Wow...so ya that worked. Site-to-site VPN with the servername & FQDN SAN certificate.

What gives? I may not be able to do this with every client, so what do you think a better solution will be? Is there something I did wrong?
I assume you used the internal FQDN?
Previously, on the destination server certificate, I used multiple CNs:
destinationserver
destinationserver.destinationcompany.local
replica.destinationcompany.com

On the source server certificate:
sourceserver
sourceserver.sourcecompany.local
replica.sourcecompany.com

Nothing worked.

As soon as I put a site-to-site VPN and placed manual entries into the corresponding HOSTS files it worked perfectly.
Now you know that it works, you can try to "break" it :-)

I would take down the VPN and use the hosts file on each server, pointing at the public address that is NATted to the other server. I would use the internal short name and the internal FQDN.

When creating the NAT rules, make sure that you are allowing all outbound traffic, and use dynamic policy NAT to ensure that the outbound traffic comes "from" the same address that the inbound traffic is "to".
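The hosts-file suggestion above can be sketched as a line that maps the other server's internal FQDN and short name to its public (NATted) address. This is only an illustration: `hosts_line` is a hypothetical helper, 203.0.113.10 is a documentation-range address standing in for the real public IP, and the server names are the placeholders used elsewhere in this thread. On Windows the file is C:\Windows\System32\drivers\etc\hosts.

```python
import ipaddress


def hosts_line(public_ip: str, short_name: str, fqdn: str) -> str:
    """Build one hosts-file line: the address, then the FQDN and short name."""
    ipaddress.ip_address(public_ip)  # raises ValueError if the address is malformed
    return f"{public_ip}\t{fqdn} {short_name}"


# Placeholder values; substitute the real public address and internal names.
print(hosts_line("203.0.113.10",
                 "destinationserver",
                 "destinationserver.destinationcompany.local"))
```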
This post suggests setting a certificate authentication port, which I read as being separate from the replication traffic port:

http://blogs.technet.com/b/virtualization/archive/2012/07/16/hyper-v-replica-certificate-based-authentication-in-windows-server-2012-rc.aspx

I would be tempted to set up a packet capture somewhere along the path between the two servers (with the traffic going over the VPN) and leave it capturing everything that includes both host addresses but excludes port 443, to see if any other ports are being used.
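The capture described above can be expressed as a pcap-style capture filter, the syntax accepted by tcpdump and by Wireshark's capture filter box. The sketch below just builds that filter string; `capture_filter` is a hypothetical helper and the addresses are documentation-range placeholders, not values from this thread.

```python
def capture_filter(host_a: str, host_b: str, excluded_port: int = 443) -> str:
    """Build a pcap filter for traffic involving both hosts, minus one port."""
    return f"host {host_a} and host {host_b} and not port {excluded_port}"


# Placeholder addresses; use the two Hyper-V hosts' addresses as seen on the wire.
print(capture_filter("192.0.2.10", "198.51.100.20"))
# host 192.0.2.10 and host 198.51.100.20 and not port 443
```

Anything this filter catches during a replication attempt is traffic between the two hosts on a port other than 443.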
Afterward, for testing purposes, I tried disabling the VPN and doing the HOSTS file edits for the internal short name and FQDN. Didn't work. So I know it's definitely not a certificate issue but a network issue somewhere along the line.

As for that second part, I'm not sure what you mean by "allow all outbound traffic on the NAT entry." Could you explain that? I am using ASA 5505's and 5510's.

I'll set up a packet capture to check on what you raised in your second post. All I've read is that Replica only listens/replies on a single port, but for some reason that doesn't seem to be true...
Use a dynamic NAT policy for traffic from the hyper-v server so that the egress IP is the same as the ingress IP, rather than just using the interface address.

If you are just using PAT for port 443, I might also try a full NAT for the address, using an ACL to restrict traffic to just the remote side.
Ya I just have static NAT entries (1 to 1) for the Replica servers with an ACL to allow TCP 443.

I'll attempt that here today. Thanks for sticking with me here...
ASKER CERTIFIED SOLUTION
This solution is only available to members.
Sorry for the lack of updates, been incredibly busy. So some success and some failure:

Editing the ACL to say "From this address -> Pass all IP" without the VPN activated works great. But only with servername.company.com and not replica.company.com which honestly doesn't matter.

I can say this is resolved. Thank you sir!! If I could give you more points I would!!
I presume that servername.company.com is the internal FQDN, not the public address?

If you want to lock it down more, you could run a packet sniffer to grab traffic between the two hosts, excluding port 443, and create a more restrictive ACL.

I would be tempted to do this anyway, just to make sure that nothing is being sent in plaintext between the two boxes.
Yes, servername.company.com is the internal FQDN. Replica.company.com was what I was going to use externally. Even using a SAN certificate with both names, the ACL you recommended and the external FQDN it would fail with the original message.

But as soon as I changed it to the internal FQDN with the same process it worked fine.

I'll probably lock it down, but I'm just happy I at least know how to get it to work and slowly lock it down from there.
Error - Hyper-V-VMMS - 29218
Hyper-V received a digital certificate that is not valid from primary server 'source.server.local'. Error: A certificate chain processed, but terminated in a root certificate which is not trusted by the trust provider. (0x800B0109).

Guys, I just had the same problem. Banged my head against a wall for about an hour making new self-signed certs over and over and always getting the same error.

I was using the command line based "certutil -addstore -f Root FirstRootCA.cer" command (which I always use and it always works fine) but in desperation, I used the GUI on the primary server to export the Cert to a PFX and then imported it, manually on the other end. The issue's now solved.

Weird but I wanted to mention it in case it helps someone else.
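The post above does not say why the GUI import worked where the command line did not. One check that could help when chasing an untrusted-root error is to compare the thumbprint of the root certificate file on each server, since the Windows thumbprint is the SHA-1 hash of the DER-encoded certificate. This is a sketch under that assumption only; `cert_thumbprint` is a hypothetical helper, and it accepts either a binary (DER) .cer file or a Base64 (PEM) one.

```python
import hashlib
import ssl


def cert_thumbprint(data: bytes) -> str:
    """Return the SHA-1 thumbprint (uppercase hex) of a certificate file's bytes."""
    text = data.strip()
    if text.startswith(b"-----BEGIN CERTIFICATE-----"):
        # Base64 (PEM) export: convert to DER before hashing.
        der = ssl.PEM_cert_to_DER_cert(text.decode("ascii"))
    else:
        # Assume the file is already DER-encoded.
        der = data
    return hashlib.sha1(der).hexdigest().upper()
```

Reading FirstRootCA.cer on each side with `open(path, "rb").read()` and comparing the results would show whether both servers hold the same root certificate.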