

Bandwidth needs for SAN to SAN (Site to Site) replication

Posted on 2011-09-03
Medium Priority
Last Modified: 2012-06-27
I'm looking at replicating 2 SANs across the WAN, site to site. I want the replication to be as close to real time as possible, with less than a 15-minute difference between sites. From my calculations, my current SAN averages 1.25 megabytes per second written to it. What I need to know is the type and speed of connection I would need to achieve this. There would be nothing else on this link, just the SAN replication traffic. Assuming the connection runs at full speed, I think I would need at least 10 Mbit (an 8 Mbit connection is approximately 1 MByte per second), but I'm not sure that allows enough for overhead.
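A quick sanity check on that estimate, as a minimal Python sketch (the 20% overhead factor is an assumption for illustration, not a measured value):

```python
# Estimate the WAN link speed needed to keep up with a given average write rate.
# The overhead factor (protocol headers, replication metadata) is an assumed 20%.

def required_link_mbit(avg_write_mbyte_per_s, overhead_factor=1.2):
    """Return the minimum link speed in Mbit/s for a given write rate in MByte/s."""
    return avg_write_mbyte_per_s * 8 * overhead_factor

# 1.25 MByte/s average write rate, as measured on the SAN
print(required_link_mbit(1.25))  # 12.0 Mbit/s -> a 10 Mbit line is marginal
```

Even with a modest overhead assumption, the average rate alone lands above 10 Mbit/s, before considering write peaks.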
Question by:mikeewalton
LVL 37

Expert Comment

ID: 36479309
What method of replication are you planning on using?

Is there any overhead from it? Or compression?

What type of data is on the SAN? Would replication be better done at the application layer, e.g. an Exchange 2010 DAG, instead of block-level SAN replication?

Author Comment

ID: 36479384
2 Dell EqualLogic SANs. For the sake of this question, assume no compression.
LVL 37

Expert Comment

ID: 36479852
If you have an average of 1.25 MB/s, then what does it spike to? How long would it take to "recover" from a spike?

How much data are you replicating?

If you already have both SANs on-site, then I would set up replication via a managed switch and monitor bandwidth usage, ideally with sFlow/NetFlow, but SNMP would do in a pinch; alternatively, set up a monitor/SPAN port and use ntop.

You might want to weigh the cost of a 1000 Mbit line against the cost of a Riverbed or similar appliance to do block-based "compression".

If the link is within the same city, then the extra bandwidth is probably the lower-cost option; if it's between cities, then a Riverbed or similar might be the better choice.
LVL 25

Assisted Solution

madunix earned 400 total points
ID: 36480351
Basically, replication (LUN copying) comes in two varieties: synchronous and asynchronous. Synchronous replication is comparable to RAID 1, but over a larger distance. IBM has Metro Mirror (synchronous) and Global Mirror (asynchronous). The main challenge in SAN-to-SAN replication is having enough bandwidth, such as dark fiber, between the data centers.

I use IBM technology to replicate data in various projects. In SAN replication you have two methods (sync and async). IBM Metro Mirror (sync replica) is generally considered a campus-level solution, where the systems are located in fairly close proximity, such as within the same city. However, the distance supported will vary based on the write intensity of the application and the network being used. In general, with adequate resources, most customers find up to a 50-kilometer distance acceptable, with some implementing up to a 300-kilometer distance. With Global Mirror (async replica), the target site may trail the production site by a few seconds. The frequency of creating a consistency group is a tunable parameter, and you'll need to balance your recovery point objective against the performance impact of creating a consistency group. Many customers find a three- to five-second consistency group achievable (i.e., in a disaster, you'd lose the last three to five seconds of data).

I have set up SAN replication between two sites over a 2 Mbps E1 using Global Mirror (async replica).
A simple calculation for transferring data from A to B over a 2 Mbps E1:

diff = E1bandwidth - A2Btraffic   (in Mbit/s)
totalSizeInMegaBit = size of data
time ≈ totalSizeInMegaBit / diff

Example with 550 GB, taking A2Btraffic = 0 so the full E1 is available for the mirror:
(totalSizeInMegaBit / diff) = (4505600 / 2) = 2252800 seconds ≈ 26 days

In other words, it will take more than 26 days to mirror the data.
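The same calculation as a short Python sketch (numbers from the example above; the E1 is assumed fully available for the mirror):

```python
# Time to complete the initial copy of a 550 GB LUN over a 2 Mbit/s E1,
# assuming zero competing A-to-B traffic on the link.

lun_gb = 550
e1_mbit_per_s = 2
a2b_traffic = 0  # Mbit/s of other traffic sharing the link

total_mbit = lun_gb * 1024 * 8        # 4,505,600 Mbit
diff = e1_mbit_per_s - a2b_traffic    # usable bandwidth in Mbit/s
seconds = total_mbit / diff           # 2,252,800 s
print(seconds / 86400)                # ~26 days
```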

So bandwidth analysis is very important: analyse the write load of the disks at the primary site. For this purpose, performance data must be collected for all volumes that will participate in the Metro/Global Mirror. The data can be collected either with TPC-Disk (TotalStorage Productivity Center) or with any suitable performance monitor. Alternatively, the data can be collected on the disk subsystem itself, which requires a server that receives and collects the data from the box.

To get a correct bandwidth analysis, it is important that a realistic write-load profile is captured during the data-collection period. Especially for the Global Mirror between the intermediate site and the remote site, it is important to understand the distribution of writes and the relation between write peaks and the average write rate. For this reason, the data-collection period should cover at least 24 hours and, if possible, a period of high write activity.

Please note that both Metro Mirror (sync) and Global Copy (async) require an initial copy phase before regular replication can start. There is no other way to set up the bitmaps on each side of the relationship reliably, so there is no workaround for this process. This means that the time it takes to copy all tracks to the remote site must be factored into the schedule.


LVL 37

Expert Comment

ID: 36480535
I would usually suggest that the initial sync (especially on small arrays) is done locally.

By small I mean low tens of TB.

Author Comment

ID: 36480804
The way I figured the average data usage was to take the write data on the SAN listed by each iSCSI connection from the Xen server, divide the total data written by the number of hours connected, and then add those together to get the total write data per hour. (See attached.) I do see an issue with this, as it doesn't give me the peak.

I know the built-in EqualLogic SAN replication does its own compression, but I'm not sure what it actually amounts to.

Looking at the Riverbed device I could definitely see an added benefit of adding it in the mix.

The end result here would be to replicate my Xen storage pools to a colo, so that I could fail over without losing too much data. It wouldn't necessarily have to be real-time failover, but I would at least like to be able to fail over to the remote site in the event of a disaster in under an hour; 30 minutes would be ideal.

The initial amount of data will be approximately 4 TB.
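For that 4 TB initial copy, a quick Python sketch of seed times at a few candidate link speeds (ignoring compression, overhead, and link contention):

```python
# Initial-copy (seeding) time for ~4 TB at several link speeds,
# ignoring compression, replication overhead, and competing traffic.

data_tb = 4
total_mbit = data_tb * 1024 * 1024 * 8  # TB -> Mbit

for link_mbit in (10, 100, 1000):
    days = total_mbit / link_mbit / 86400
    print(f"{link_mbit:>5} Mbit/s: {days:6.1f} days")
```

At 10 Mbit/s the seed alone is over a month, which is why seeding locally before shipping the array (as suggested below the EqualLogic discussion) is attractive.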
LVL 37

Expert Comment

ID: 36481096
What will you have running in the guests?

Although SAN replication can be good, it is usually synchronous rather than the asynchronous approach you are planning.

You also need a process for failing back, and for preventing the main-site guest from starting up or replicating while the remote-site guest is running.

I would tend to look at moving towards HA rather than DR; this gives you the capability of switching over to the remote site for maintenance etc. with minimal downtime. For a clean move using snapshot-based replication, you would need to shut down the guest, wait for replication to complete, and then start the guest at the remote site, so your 15-minute replication window could be considerably longer in practice.


Author Comment

ID: 36481118
Inside the guests is the usual: a couple of DCs (with failover DNS, secondary DHCP scopes, etc.), Exchange 2007, SQL 2008, a file server, Terminal Services, SharePoint MOSS, and a couple of application servers that host applications with the SQL server as their back end. All are Server 2008 (most are Enterprise).

I would be open to a secondary HA site rather than failover; in fact, if I can get log shipping etc. working for Exchange and SQL, it would be preferred, for the reasons you state such as maintenance.

That being said, I would still need to figure out the bandwidth type and requirements to keep up with that.


Author Comment

ID: 36481128
I can spec out the servers (Exchange, SQL, file server, etc.) if needed, i.e. sizes, databases, etc.
LVL 37

Accepted Solution

ArneLovius earned 1600 total points
ID: 36481392
With DCs, I would just have additional DCs at the remote site.

For the file server, I'd use DFSR for replication, possibly with a DFS namespace as well.

A 10 Mbit line is cutting it a bit fine for an Exchange 2007 geographically split CCR cluster; it depends on your rate of mail flow and generated logs. You would also want a CAS/Hub server at each site, which, if you still use public folders, could also be a public folder server. Exchange CCR clusters need at least one cluster VLAN spanning your connection, but this shouldn't be an issue as long as it is Ethernet point-to-point; if you were planning MPLS it could be more involved.

SQL replication is good, but I've had issues with some applications in the past; Neverfail might be a better solution.

For SharePoint, you just need to set up a "farm".

If you already have the servers and storage, I would build it all locally on gigabit connections, then throttle the "inter-site link" down to 100 Mbit and see if it still functions, and then down to 10 Mbit. If you have issues at 10 Mbit, then I would speak to Riverbed; you can usually get a trial pair from them for at least a fortnight.

