Adventures in Exchange 2007 Clustering

Delphineous Silverwing, Good Ol' Geek
Nearly six years ago I was hired by a company to be their senior server engineer. One of my first projects was to implement Exchange Server 2007 on a Windows Server 2008 Single Copy Cluster for high availability. That was the easy part; read on to learn how life progressed through the years.

Hardware


The company purchased the server hardware before I was hired in late 2009. Fortunately, it was reasonable hardware that would work for the time being, though I would have liked something slightly better.

Supplied Hardware
Intel Xeon E5504 2.0 GHz
4 GB RAM
RAID 1 Disk for O/S and Applications
4 Gb Fiber to SAN
Redundant Power Supply

Initial User Environment
600 Users (400 on-site, 150 remote site, 50 Work from home [VPN])
Average Mailbox Size of 350 MB (Largest 1.8 GB)
No Mailbox Limit
Message size limit of 5 MB
 

The Intel Xeon E5504 2.0 GHz is a quad-core processor that would be sufficient for a smaller environment. Higher core count processors weren't available at the time, so two processors would have been a better choice for an environment of our size. Today, this server struggles to meet the demands of our user base and we frequently receive complaints.

Exchange, by nature of the beast, will consume every drop of available memory, just like SQL Server, leaving just enough to keep the operating system happy. 4 GB of memory was barely enough for Exchange 2007 and the operating system to behave. As time progressed and user demand became more intense, the memory was upgraded to 20 GB on each node of the cluster.

When it comes to disk arrays, more spindles (disks) in an array typically means better performance. Using RAID 1+0 would allow the use of multiple smaller, faster disks to help improve O/S performance and add resiliency.

Since the servers were being used as a cluster, someone decided that only a single SAN connection was necessary from each node. Using two connections, along with appropriate multipath driver software, would provide redundancy in case of fiber or HBA failure, plus the possibility of doubling the bandwidth. Like any database, Exchange is intensive on disk I/O; dual paths can raise throughput and improve the user experience.

Redundant power supplies are essential for critical systems. Under normal circumstances, if a power supply fails, the remaining power supply or supplies will continue powering the server without any downtime. Hot-pluggable redundant power supplies can even be replaced without shutting down the system.
 

Initial Exchange 2007 Cluster Deployment


Exchange Server 2007 would not install on Windows Server 2008; the original release was designed and shipped before Windows Server 2008 and therefore wanted Windows Server 2003. With Windows Server 2003 six years past its release, I wasn't willing to deploy on it.

I had to slipstream Service Pack 1 into the installer to install on Windows Server 2008. I had not slipstreamed an installer before, so that was my first adventure. Thank the geeks of the world for the internet, where I found the instructions for slipping SP1 into the Exchange install source.
 

Drive Configuration


Using Microsoft's TechNet guide for Exchange 2007 cluster storage design and the mailbox information in the Exchange 2003 server, I was able to do a little math and determine that four private information stores plus one public information store would work well. LUNs were created on our SAN so that each information store would have its own "Log" volume and "Store" volume.


LUN = Logical Unit Number ("Drive" space on a SAN [Storage Area Network])

Drive and Purpose
A - Reserved by OS
B - Reserved by OS
C - Local drive for OS
D - Local drive for Apps
E - SAN LUN for Private Store 1
F - SAN LUN for Private Store 1 Logs
G - <unused>
H - Reserved for Home Drive mapping
I - SAN LUN for Private Store 2
J - SAN LUN for Private Store 2 Logs
K - SAN LUN for Private Store 3
L - SAN LUN for Private Store 3 Logs
M - SAN LUN for Private Store 4
N - SAN LUN for Private Store 4 Logs
O - <unused>
P - <unused>
Q - SAN LUN for Cluster Quorum
R - <unused>
S - <unused>
T - <unused>
U - <unused>
V - <unused>
W - <unused>
X - <unused>
Y - <unused>
Z - Local Optical Drive

As you see, we used more than half the letters of the alphabet right off the bat.
Note: Windows Cluster will not allow the use of letter A and B for clustered drives.
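The letter budget implied by the table can be sketched in a few lines of Python. This is only an illustration: the set of used letters is copied from the table above, and the assumption that every new store takes exactly two letters (one for the database, one for its logs) follows the layout of stores 1 through 4. The function names are made up for this sketch.

```python
import string

# Letters already reserved or assigned in the table above.
USED = set("ABCDEFHIJKLMNQZ")

def free_letters(used=USED):
    """Return the drive letters still unassigned, in alphabetical order."""
    return [c for c in string.ascii_uppercase if c not in used]

def additional_stores(used=USED, letters_per_store=2):
    """Assumes each information store needs one letter for data and one for logs."""
    return len(free_letters(used)) // letters_per_store

if __name__ == "__main__":
    print("Free letters:", "".join(free_letters()))
    print("Room for additional stores:", additional_stores())
```

Under those assumptions only a handful of additional store and log pairs fit before the alphabet runs out, unless a reserved letter is reclaimed or mount points are used instead.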

Using SAN resources for drive space instead of locally attached disk allows sharing disk between systems for SCC (Single Copy Cluster) and also expansion of drives as space requirements change.
 

Mailbox Limits


Our initial four information stores kept growing, and less than one year later it was necessary to add more drives to hold the out-of-control mailboxes. As two new drives were mounted on the server for store number five (and its logs), I pleaded with company management to allow mailbox limits to be enabled. Users were not managing their mail and the mailboxes kept growing unchecked.

Microsoft recommends that an Exchange Server 2007 Information Store not exceed 100 GB to ensure reasonable backup and recovery ability when not using continuous replication.[1]

Using Microsoft's guidelines on maximum database size as a starting point, I did a thorough analysis of the number of mailboxes and current mailbox sizes, planning for 5% growth and 10% exceptions. I recommended a 500 MB mailbox limit with "Manager Approved" exceptions of 750 MB; anything higher would require executive approval. This would allow our five private information stores to serve the user base for several years while maintaining high performance and availability.

Culture shock is the biggest barrier when implementing mailbox and/or message size limits. It shows up as the "We've always done it this way" attitude of people who simply aren't able to compromise for the good of the organization.
By Spring 2012 I had added another private information store, and when I brought up the mailbox restriction proposal again I was finally approved to implement a 5 GB mailbox limit. That is a far cry from the 500 MB I proposed, and there are not enough letters in the alphabet to provide enough information stores given the 100 GB database recommendation.
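The sizing arithmetic behind the two limits can be roughed out as follows. This is a minimal sketch, not the analysis actually performed: it assumes the initial 600 users, treats "10% exceptions" as 10% of users at the 750 MB tier, applies the 5% growth to total data, assumes every mailbox fills its limit, and ignores database overhead such as white space and deleted item retention. The function and parameter names are hypothetical.

```python
import math

MAX_STORE_GB = 100  # recommended ceiling per information store cited above

def stores_needed(users, limit_mb, exception_share=0.0, exception_limit_mb=0,
                  growth=0.0, max_store_gb=MAX_STORE_GB):
    """Worst-case store count if every mailbox grows to its limit."""
    exception_users = users * exception_share
    standard_users = users - exception_users
    total_mb = standard_users * limit_mb + exception_users * exception_limit_mb
    total_gb = total_mb * (1 + growth) / 1024
    return math.ceil(total_gb / max_store_gb)

if __name__ == "__main__":
    proposed = stores_needed(600, 500, exception_share=0.10,
                             exception_limit_mb=750, growth=0.05)
    approved = stores_needed(600, 5 * 1024)
    print("Stores at 500 MB / 750 MB limits:", proposed)
    print("Stores at a 5 GB limit:", approved)
    print("Drive letters at a 5 GB limit:", approved * 2)
```

Because overhead is left out, the first figure is a floor rather than a plan. The second shows why a 5 GB limit cannot be reconciled with the 100 GB recommendation when every store and log volume needs its own drive letter.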

Today, five years after initial deployment, we have ten private information stores, with the largest store at 170 GB. There are no letters left in the alphabet. I am considering using mount points for our future Exchange 2013 environment, but that's a completely different conversation for another day.
In the meantime, I continue to push management to allow a smaller mailbox limit.
 

Redundancy for Disaster Recovery


Single Copy Cluster is as the name suggests: there is only one copy of the data, on disks shared between the two systems in the cluster. If the SAN should fail or some other catastrophic event occurs, e-mail functionality needs to resume promptly. Replication is needed to copy the mailbox data safely somewhere else.

We have a secondary data center located about one thousand miles away with a 100 Mbps connection to our main data center. Since it is on a different subnet and in a different AD site, it was decided to deploy Standby Continuous Replication (SCR).

Initial testing showed that the 100 Mbps connection to the standby server would be too slow for our growing information stores. The offsite synchronization project was put on the back burner.
After a couple of years' delay, the company finally allowed us to upgrade the pipe to a 500 Mbps connection. We had much better bandwidth and tolerable latency (26 ms); I was ready to begin setting up the standby server. However, the database team had started log shipping over the connection and saturated it at first, causing enormous network issues between sites. Out of sensitivity to business needs, Exchange mailbox replication was delayed further until the SQL sync and the network stabilized.
Over the next year or so there were significant bumps in performance, mainly caused by our connection provider and careless contractors with digging equipment. Management was still over-sensitive and the standby Exchange server continued to twiddle its thumbs.

Mind you, this project appeared on my annual performance review every year. Typical that a project delayed by things outside my control would be included.
In mid-2014 our company was purchased by a competitor. With the change in management, over the next several months I was able to convince management to allow me to move forward with the Exchange DR project.
Getting the final go-ahead in Spring 2015, I built the remote data center CAS and HUB servers and rebuilt the mailbox server. I set up SCR from the cluster to the standby server and received the typical notice that the database must be seeded before synchronization can occur. Great, the initial setup is complete; now I just need to seed the database and we're golden.
Bear in mind that all the while I was trying to accomplish this, the project was still being interrupted by integration work and keeping the lights on, among other things.

Seeding the database wasn't as easy as one would expect. When I tried to start the process, I'd get an error saying the target machine is invalid. Ah - I'm supposed to initiate the seeding from the target server; okay. I connect to the target server and initiate the process from there. The seeding couldn't occur because the standby (target) server was not part of the mailbox cluster (source).

I hadn't set up a cross-site cluster before. This would be a good learning opportunity. Exchange was uninstalled from the standby server and the server was added into the cluster as a third node. This was looking great: I had a three-node cross-site cluster. Well, until I went to reinstall Exchange Server. Attempting to install Exchange as a passive node in the cluster failed the prerequisite checks. One error kept me from proceeding.

"This cluster spans multiple Active Directory sites. Exchange Server 2007 cannot be installed."

It is unfortunate that clustering is not a common enough Windows technology to have sufficient online references; they would have saved me all this time. I was giving up and had started looking into third-party software when I suddenly remembered one of the things I had read. The standby server isn't supposed to be in the production cluster, but in a standby cluster instead.


A passive node that is designated as an SCR target must be a member of a failover cluster that does not have any clustered mailbox servers. This is referred to as a standby cluster.[2]

Knowing that clustered mailbox servers cannot contain the HUB and CAS roles, I spun up a virtual machine for these roles in the secondary site. My next step was to make the SCR target server a single node standby cluster. I have been working with Windows Clusters for more than a decade; this part was easy. I made sure to install Exchange as a "Passive" mailbox role.

I started with my smallest information store: I enabled replication to the standby server, then went to the standby server and issued an update command to seed the database. Lo and behold, forty minutes later this 25 GB information store was replicated to the standby server. Because of their large size, the remaining stores were handled off hours to ensure replication traffic didn't impact production.
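The seeding numbers above give a rough throughput figure that can be used to estimate the off-hours window for the larger stores. The sketch below assumes decimal units (1 GB = 8,000 megabits), that the forty minutes were spent almost entirely on data transfer, and that the same rate holds for bigger stores; none of that was measured. The function names are hypothetical.

```python
def effective_mbps(size_gb, minutes):
    """Average throughput in megabits per second, using decimal units."""
    megabits = size_gb * 8 * 1000
    return megabits / (minutes * 60)

def seed_hours(size_gb, mbps):
    """Hours needed to copy size_gb at a sustained rate of mbps."""
    return size_gb * 8 * 1000 / mbps / 3600

if __name__ == "__main__":
    rate = effective_mbps(25, 40)  # the 25 GB store seeded in forty minutes
    print(f"Observed rate: {rate:.1f} Mbps")
    # Largest store mentioned earlier in the article: 170 GB
    print(f"Estimated seed time for 170 GB: {seed_hours(170, rate):.1f} hours")
```

At the observed rate the largest store would need several hours, which is consistent with scheduling the remaining seeds outside business hours.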
 

Lessons Learned


There are definitely a number of key items learned over the years with this environment.
 

Don't Go Cheap


Performance hardware is expensive, but e-mail is critical to most organizations and downtime costs the company more money than a proper deployment. Hot-swappable RAID drives and redundant power supplies are essential to uptime.

After determining the kind of hardware required for your implementation, enhance it dramatically. You aren't just building the server for today's business, but that of several years from now. Make sure that the hardware being used for your Exchange Server exceeds your current needs and can survive growth of the business and updates to the system software.
 

Research for Proper Architecture


With a slow processor and barely enough memory, Exchange Server will keep running; slowly, but running. When the information store outgrows the storage, Exchange stops and you have a serious problem. Review Microsoft's documentation for the version of Exchange Server being implemented for guidance on storage sizing and configuration.

Exchange Server 2007
Exchange Server 2010
Exchange Server 2013 

Establish and Enforce Controls


Storage management involves accounting for mailbox data, search indexes, log files and other essential data for Exchange Server to survive. Establishing a fair and appropriate mailbox (and message) size limit will help keep the storage under control without crippling the enterprise. Executive buy-in is critical for success.


Have a Good DR Plan


For a DR plan to be a good one, it needs to be set up and it needs to work. That means setting everything up and testing it. Learn from the test and fix any problems before a live situation occurs. Search the internet and hopefully someone has a fix, but sometimes you just have to roll up your sleeves and figure it out yourself.


SCR Works on Single Copy Cluster


The key to making SCR work with a Single Copy Cluster source is to make the target server a single-node standby cluster. If the standby cluster is in another AD site, then a HUB and CAS server must be created in that site as a prerequisite.
 

Final Word


We now have just over 800 user mailboxes with the average mailbox being 1.2 GB in size. Total space across all stores is about 1.4 TB.
Although Exchange Server 2007 is a couple of versions behind and is nearing the end of its support life[3] with Microsoft, it is the last version to support SCC, and a lot of the information in this post can be extrapolated and used for other Exchange versions and environments. Learn from our suffering to better plan your messaging deployment.
 


Footnotes


  1. Recommendations for Configuring Storage Groups and Databases
  2. Exchange 2007 Help: Planning for Standby Continuous Replication
  3. Extended support ends April 2017

Comments (2)

Albert Widjaja, IT Professional

Commented:
Many thanks for sharing the story here.

So with the new Exchange Server 2013, are you going to use a DAG with a lagged copy or just a two-node DAG?

In my previous company I was using a two-node stretched CCR cluster across two different VLANs/subnets in separate geographical locations. It wasn't ideal since the AD sites were also different; hence, in DR I had to change the Mailbox server's AD site membership to match the HUB/CAS server to be able to send email after the failover.

Microsoft suggests a two-node CCR cluster within the same AD site and then one more node to act as the SCR (standby) target in the other AD site.
Delphineous Silverwing, Good Ol' Geek (Author)

Commented:
I am still evaluating our options with Exchange 2013 being drastically different than Exchange 2007.

The project is being significantly delayed by the existence of legacy messaging systems (like Exchange 2003) and the probability of a Windows Domain name change. [Which isn't exactly supported]

May the adventures continue.  :^)
