I am designing a SCOM solution for my environment and would like it to be highly available and site resilient. I have a primary datacenter and a DR site at a colocation. The sites will be connected by a dedicated 1 Gbps link, so I am not concerned with bandwidth or latency for the most part. The link will also carry Hyper-V Replica and storage replication traffic, but I don't think the additional management traffic will be an issue. I have read a number of articles on this topic, including TechNet and an article by Paul Keeley. Here are the options I have considered.
A separate SCOM management server and database in each site, each with a gateway to the other management server, and all of my clients multihomed to both. Unfortunately this seems like a lot of extra administration, and I suspect the multihoming would generate some strange issues. Also, if I ever fail over, I will not be able to fail back without losing data.
A single SCOM management server with a SQL mirror and witness. The primary SQL server would be in the primary site, and the secondary SQL server and the witness would be in the DR site. I could use Hyper-V Replica to replicate the SCOM management server at a 30-second interval. The issue I see here is that failing over is a completely manual process, and I could presumably lose the monitoring data that would help me figure out what happened in an unplanned outage. I also don't really like having only one SCOM management server.
I cannot figure out whether this would actually work, but I thought about using the same SQL mirror and witness setup as in Solution #2 (the single management server option) while building the SCOM management group with two management servers, one in each site. In theory this would give me the HA I need. I am just not sure what would happen if the DR site were cut off unexpectedly.
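
To make the cut-off scenario concrete, here is a rough Python sketch of how I understand SQL mirroring quorum to behave. It assumes high-safety mode with automatic failover and a mirror that is already synchronized. The site names and function names are made up for illustration and are not SCOM or SQL Server APIs.

```python
# Hypothetical placement matching the SQL layout described above.
PLACEMENT = {"principal": "primary", "mirror": "dr", "witness": "dr"}

SERVING = "principal keeps serving"
FAILOVER = "automatic failover to mirror"
OFFLINE = "database unavailable until manual intervention"


def partitions(placement, sites_up, link_up):
    """Group the running roles by which ones can still reach each other."""
    alive = {role: site for role, site in placement.items() if site in sites_up}
    if not alive:
        return []
    if link_up:
        # Every running role can see every other running role.
        return [set(alive)]
    # Link is down: roles can only see others in their own site.
    groups = {}
    for role, site in alive.items():
        groups.setdefault(site, set()).add(role)
    return list(groups.values())


def database_state(placement, sites_up, link_up):
    """Apply the quorum rule: a role needs at least one other role in view."""
    groups = partitions(placement, sites_up, link_up)
    # The principal keeps serving if it can see the mirror or the witness.
    if any("principal" in g and len(g) >= 2 for g in groups):
        return SERVING
    # The mirror takes over only if it and the witness agree the principal is gone.
    if any({"mirror", "witness"} <= g for g in groups):
        return FAILOVER
    # An isolated principal stops serving; an isolated mirror will not promote itself.
    return OFFLINE


if __name__ == "__main__":
    both = {"primary", "dr"}
    print("inter-site link drops:", database_state(PLACEMENT, both, link_up=False))
    print("DR site lost:         ", database_state(PLACEMENT, {"primary"}, link_up=True))
    print("primary site lost:    ", database_state(PLACEMENT, {"dr"}, link_up=True))
```

If this model is right, putting both the mirror and the witness in the DR site means that losing the DR site, or just the link, takes the database away from the primary site, which is the case I am worried about.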