Service provider takeaway: Exchange Server 2007 offers four high-availability options that service providers can use to improve customers' email client uptime.
Email is typically one of the most mission-critical applications for a company. So it makes sense that your customers are striving for high availability within their messaging infrastructure. To get them there, you need to fully comprehend the high-availability options in Microsoft Exchange Server 2007.
Because high-availability needs for messaging infrastructures vary based on a customer's requirements, there are four high-availability options in Exchange 2007 to solve most organizations' needs. Each of the four alternatives -- local continuous replication, single copy clusters, cluster continuous replication and standby continuous replication -- provides a different level of protection for the Exchange server, storage and site. We'll break down the four options to help you determine which makes the most sense for a particular customer.
Local continuous replication (LCR)
New to Exchange Server 2007, LCR is a single-server solution in which Exchange provides improved resiliency to disk failures. By using built-in asynchronous log shipping technology, a second copy of the database is created and maintained on the same server on a different set of disks. In the event that the database or initial set of disks becomes corrupt or unavailable, an administrator can rapidly cut over to the secondary copy of the database and logs. This solution is great for protecting the disks associated with the databases where end users' mailboxes reside. Unfortunately, LCR doesn't protect against outages caused by the Exchange mailbox server crashing.
Single copy clusters (SCC)
SCC is built on top of a traditional Windows Failover Clustering configuration. An Exchange-clustered mailbox server is formed by having two independent servers, also known as nodes, access shared storage. The Exchange databases and logs reside on the shared storage, and either node within the cluster can access it. However, because Windows Failover Clustering uses the "shared-nothing" model, only one node within the cluster can access the Exchange data at any one time.
With SCC implemented, end users connect to an Exchange server, which has a virtual identity independent of any nodes within the cluster. For example, if the Exchange Virtual Server is running on the first node and the first node fails, the Exchange Virtual Server will automatically fail over to Node 2. The failover process is automatic and does not require administrative intervention, and the transition is seamless to the end users. SCC provides maximum uptime for an Exchange mailbox server, but as the name suggests, only a single copy of the data resides on shared storage. Many messaging experts feel that maintaining only a single copy of data is a single point of failure that could bring the whole Exchange server cluster down. Exchange failover clusters require shared storage and hardware approved by Microsoft.
Cluster continuous replication (CCR)
CCR is a new high-availability mechanism introduced with Exchange Server 2007. It provides both high availability and site resiliency and has characteristics of both LCR and SCC -- it leverages Windows Failover Clustering and built-in asynchronous log shipping technology to create and maintain a second copy of the database on a passive node within a cluster. However, unlike traditional clustering with SCC, two independent copies of the databases and logs exist. By maintaining duplicate copies of the data, maximum availability and automatic failover can be achieved. There's no single point of failure. Also, when using Windows Server 2008, it's possible to place the nodes of the cluster in multiple data centers. This will satisfy both site resilience and disaster recovery requirements for many customers.
Standby continuous replication (SCR)
SCR is a new high-availability option introduced with the release of Exchange Server 2007 Service Pack 1. It's intended for use in scenarios in which Exchange data needs to be replicated from a source server to a standby recovery server. By using the same continuous replication features found in LCR and CCR, you can replicate a copy of a source Exchange server's database to a target Exchange server database. The target recovery server can reside in another data center, providing for a great disaster recovery solution. SCR doesn't require special hardware, shared storage or Windows clustering technology. However, the failover process is not automatic and customers need Microsoft Outlook 2007 if they want their email clients to be automatically redirected from the source Exchange server to the target Exchange server in the event of a disaster.
Choosing a high-availability option
It's important to note that the four options don't require the same effort for implementation. LCR is relatively easy to configure and can be implemented within hours, unlike the others, which could take a few weeks depending on the customer's storage expertise and the amount of data within the databases. CCR and SCC require intimate knowledge of Windows Failover Clustering. And SCC requires knowledge of configuring shared storage systems (SAN and/or NAS) based on a Windows Failover Clustering model. SCR does not require Windows clustering technologies or a shared storage system; however, the implementation is done through PowerShell cmdlets, and the seeding of the databases from the source server to the target server takes time, especially if there is a tremendous amount of data and the target site resides in another data center reached over a high-latency or slow WAN link.
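To give a customer a feel for how long initial seeding might take, a back-of-the-envelope estimate can help. The Python sketch below is only an illustration and is not part of Exchange or its tooling: the function name `seeding_hours` and the `efficiency` factor (the share of nominal WAN bandwidth assumed to be usable) are hypothetical, and real seeding times also depend on latency, disk throughput and server load.

```python
def seeding_hours(data_gb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Rough number of hours needed to copy data_gb over a WAN link.

    data_gb    -- database size in decimal gigabytes
    link_mbps  -- nominal link speed in megabits per second
    efficiency -- assumed usable fraction of the link (hypothetical default)
    """
    if data_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, link speed > 0, efficiency in (0, 1]")
    megabits = data_gb * 1000 * 8              # gigabytes -> megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Example: 450 GB over a 100 Mbps link at the assumed default efficiency
    print(f"{seeding_hours(450, 100):.1f} hours")
```

An estimate like this is mainly useful for deciding whether seeding over the WAN is realistic at all for a given customer's data volume and link.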
To choose the right high-availability option, you should first establish your customer's Exchange availability goals and service-level agreement (SLA) requirements. Clearly, CCR and SCC are best suited for situations when service providers are trying to achieve maximum availability, automatic server failover and transparent client redirect for their customers. While CCR and SCC are very similar, CCR does not require shared storage such as a storage-area network (SAN) or network-attached storage (NAS) device, whereas SCC does. But when using CCR, the amount of storage required doubles, resulting in increased storage costs. This could be significant if, for instance, your customer has 5 TB of Exchange data, because CCR would then need roughly 10 TB. If the customer is trying to implement a disaster recovery solution with minimal costs, then SCR would be a perfect fit, as the standby recovery server can reside in another geographical data center. Finally, if your customer does not have a requirement or budget for protecting an Exchange server with high availability but still wants to protect the local copies of the Exchange databases from disk failure, LCR should be leveraged.
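The selection logic described above can be sketched as a small decision helper. This Python sketch is a simplification for discussion only, not an official sizing or design tool: the `Requirements` fields and the `recommend` function are hypothetical names, and a real engagement would weigh budget, SLA details and hardware support as well.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Requirements:
    needs_automatic_failover: bool   # server-level high availability required
    needs_site_resilience: bool      # must survive loss of a data center
    has_shared_storage: bool         # SAN/NAS suitable for failover clustering


def recommend(req: Requirements) -> List[str]:
    """Return the Exchange 2007 option(s) suggested by the article's guidance."""
    if req.needs_automatic_failover:
        # SCC depends on shared storage; CCR keeps two copies and needs none.
        options = ["SCC"] if req.has_shared_storage else ["CCR"]
        if req.needs_site_resilience:
            options.append("SCR")    # layer SCR on top for disaster recovery
        return options
    if req.needs_site_resilience:
        return ["SCR"]               # low-cost DR with manual failover
    return ["LCR"]                   # protects only against local disk failure


if __name__ == "__main__":
    customer = Requirements(needs_automatic_failover=True,
                            needs_site_resilience=True,
                            has_shared_storage=False)
    print(recommend(customer))
```

The helper deliberately ignores the option of stretching a CCR cluster across data centers on Windows Server 2008; it always layers SCR on top when site resilience is required, which keeps the logic easy to follow.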
It is possible to combine the Exchange 2007 high-availability options for maximum protection against single-mailbox loss, disk failure, storage failure, server failure and data center failure. A common scenario includes using either CCR or SCC within a site and then adding SCR on top to satisfy disaster recovery requirements. Clearly, there isn't one magic bullet solution that addresses every customer's Exchange server high-availability requirements. But by understanding the alternatives and combining the technologies, it is possible to meet your customer's requirements.
About the author
Ross Mistry is a partner and principal consultant at Convergent Computing, located in the San Francisco Bay area. He is a co-author of SQL Server 2005 Management & Administration and Windows Server 2008 Unleashed. Ross frequently speaks at international conferences such as SQL Server PASS and Dev Connections. He is currently working on his latest title, SQL Server 2008 Management & Administration, which is scheduled for release in Fall 2008.