Microsoft Exchange 2010 brings significant change to the Exchange Server environment, and storage is one of the most radically affected areas. There are two key changes: the removal of single-instance storage and a new emphasis on RAID-less direct-attached storage (DAS).
When single-instance storage was first implemented in Exchange 4.0 back in 1996, it was designed to reduce the amount of disk capacity that the Exchange database required (the database size was limited to 16 GB). It did this by identifying redundant messages and attachments and storing only one copy of them.
Back then, Exchange was more of a departmental solution than an enterprise platform. There was only one mailbox database per Exchange server, so the chance of finding a duplicate and eliminating it was high. Also, a gigabyte of hard disk space cost much more then than it does today. This meant that single instancing was worth its development effort and performance penalty because of the money it saved IT shops.
Single-instance storage loses impact
With Exchange 2000, Microsoft introduced the concept of storage groups and increased the number of databases you could have per server, and each release since then has raised the number of supported databases again. Because single-instance storage works only within a single database, not across databases, its effectiveness has steadily decreased. According to Microsoft in early 2010, most customers were seeing no more than about a 20% gain in storage capacity from single instancing. That modest gain, combined with the continued decline in the cost of a gigabyte of storage and the performance cost of single instancing, led to the demise of the technology in Exchange 2010.
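To see why spreading mailboxes across more databases erodes single instancing, here is a toy model in Python. It assumes a single message sent to 100 recipients whose mailboxes are spread evenly across some number of databases, and it assumes per-database single instancing stores exactly one copy in each database that holds at least one recipient. The function names are hypothetical, and the model is not meant to reproduce Microsoft's 20% figure, only the trend.

```python
# Toy model (assumption): single-instance storage deduplicates only within one
# mailbox database, so a message is stored once per database that holds at
# least one of its recipients' mailboxes.

def stored_copies(recipient_dbs: list[int]) -> int:
    """Copies stored under per-database single instancing: one per distinct database."""
    return len(set(recipient_dbs))


def sis_savings(recipient_dbs: list[int]) -> float:
    """Fraction of copies avoided compared with storing one copy per recipient."""
    return 1 - stored_copies(recipient_dbs) / len(recipient_dbs)


# One message, 100 recipients, mailboxes spread round-robin over N databases.
for db_count in (1, 5, 20, 100):
    recipients = [i % db_count for i in range(100)]
    print(db_count, stored_copies(recipients), round(sis_savings(recipients), 2))
```

With everything in one database, 99 of the 100 copies are avoided; with each recipient in a different database, single instancing saves nothing at all.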
Direct-attached storage as an option
Beyond the removal of single-instance storage, Microsoft has changed its guidance on the use of direct-attached storage in Exchange 2010. While Exchange has always had the option to use DAS, most customers, for either availability or scalability reasons, looked to a SAN to host their Exchange 2003 and 2007 message stores. Exchange 2010 goes beyond DAS with RAID, promoting RAID-less direct-attached storage, which is essentially individual drives installed in a server (often called JBOD, "just a bunch of disks").
Microsoft enabled this option by building more resilience into Exchange itself. Within a database availability group (DAG), Exchange 2010 maintains copies of each mailbox database on other servers and drives, and keeps them continuously updated by replicating transaction logs. This means that you can have two, three or more copies of a mailbox database that stay up to date with the active copy.
The operating principle here is cheap storage. The goal is a 1:1 disk-to-database ratio, with each database copy on its own drive. This provides very inexpensive storage for Exchange, allowing users to store more email than ever, and a very predictable performance model, since fewer variables are at work.
Adding value through efficiency and SAN
At first glance, these two changes to Microsoft Exchange storage support might leave you and your customer wondering whether a shared storage system is needed anymore to support Exchange. The truth is that shared storage, and the efficiency features it brings, are needed now more than ever.
Efficiency gains may be the easiest way to justify a SAN for Exchange 2010. If you deliver a solution that provides block-level deduplication, compression or both (Exchange 2010 is supported on a SAN but not on NAS), you have a strategic advantage over DAS, because those processes work across all the storage in the system. In Microsoft's 1:1 DAS design, Exchange 2010 has no built-in way to remove duplicates. SAN-based deduplication can offset the loss of single-instance storage, and because it operates across databases, it is not restricted to finding redundancies within a single database. Many deduplication tools are also sub-file-aware: they identify redundant segments between similar files, not just exact matches between whole files.
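The difference between per-database single instancing and array-wide, sub-file deduplication can be sketched in a few lines of Python. This is a minimal illustration, not how any particular storage product works: it assumes fixed 4 KiB blocks, SHA-256 fingerprints and one fingerprint index shared by every volume. The volume names, the `make_blocks` helper and the sample data are all hypothetical.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed block size; real arrays choose their own


def chunks(data: bytes, size: int = CHUNK_SIZE):
    """Split a volume's bytes into fixed-size blocks."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]


def dedup_counts(volumes: dict[str, bytes]) -> tuple[int, int]:
    """Return (logical_blocks, unique_blocks) across all volumes.

    One fingerprint index is shared by every volume, so duplicates are found
    across databases, not just within one.
    """
    index: set[bytes] = set()
    logical = 0
    for data in volumes.values():
        for block in chunks(data):
            logical += 1
            index.add(hashlib.sha256(block).digest())
    return logical, len(index)


def make_blocks(label: str, n: int) -> bytes:
    """Build n distinct 4 KiB blocks from a label (illustrative data only)."""
    return b"".join(
        hashlib.sha256(f"{label}-{i}".encode()).digest() * (CHUNK_SIZE // 32)
        for i in range(n)
    )


# The same 8-block attachment lands in two databases; a third database holds
# an edited version whose last block differs.
attachment = make_blocks("report", 8)
edited = attachment[:-CHUNK_SIZE] + make_blocks("edit", 1)
volumes = {
    "db1": make_blocks("db1-mail", 4) + attachment,
    "db2": make_blocks("db2-mail", 4) + attachment,
    "db3": make_blocks("db3-mail", 4) + edited,
}
logical, unique = dedup_counts(volumes)
print(f"{logical} logical blocks, {unique} unique blocks stored")
```

Here 36 logical blocks reduce to 21 stored blocks: the exact copy in `db2` costs nothing extra, and the edited copy in `db3` costs only its one changed block. Fixed-size blocks only match when duplicates line up on block boundaries, which is why many products use variable-size (content-defined) chunking instead.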
Another efficiency feature that many storage systems offer is thin provisioning, which enables higher utilization of disk capacity. While it’s true that disk is cheaper now than it used to be, a data center full of disk isn’t cheap: all those drives have to be purchased, powered and cooled. SAN-based thin provisioning promises more efficient use of disk space than DAS in an Exchange environment, especially in the 1:1 design, where Microsoft counts on plenty of free space per drive to comfortably accommodate growth in Exchange 2010 storage needs.
In a large Exchange environment, that free space can add up to a lot of idle disk bought purely to get predictable performance. Thin-provisioned volumes spread across fewer drives, each running at higher utilization, may cost little more to buy than a truckload of cheap off-the-shelf drives, and they will certainly draw less power.
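The utilization argument comes down to simple arithmetic, sketched below in Python. Every number is a hypothetical assumption chosen only for illustration (20 databases, 2,000 GB drives, 800 GB actually used per database, a 25% shared growth reserve for the thin pool), and the model ignores database copies, RAID overhead and power draw.

```python
def thick_physical_gb(databases: int, allocated_gb: float) -> float:
    """Fully allocated (e.g. one dedicated drive per database): disk bought up front."""
    return databases * allocated_gb


def thin_physical_gb(databases: int, used_gb: float, reserve: float) -> float:
    """Thin pool: disk sized to data actually written plus a shared growth reserve."""
    return databases * used_gb * (1 + reserve)


def utilization(used_total_gb: float, physical_gb: float) -> float:
    """Share of purchased capacity that actually holds data."""
    return used_total_gb / physical_gb


# Hypothetical environment: 20 databases on 2,000 GB drives, each holding 800 GB.
dbs, drive_gb, used_gb = 20, 2000, 800
thick = thick_physical_gb(dbs, drive_gb)              # 40,000 GB
thin = thin_physical_gb(dbs, used_gb, reserve=0.25)   # 20,000 GB
print(f"thick: {thick:,.0f} GB at {utilization(dbs * used_gb, thick):.0%} used")
print(f"thin:  {thin:,.0f} GB at {utilization(dbs * used_gb, thin):.0%} used")
```

Under these assumptions the thin pool holds the same 16,000 GB of mail on half the purchased capacity (80% versus 40% utilization); with different growth rates or reserves the gap shrinks or widens accordingly.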
SANs also hold an advantage over RAID-less DAS in data protection, because a SAN does not have to account for reseeding time. Reseeding is the process of repopulating a replacement drive with a copy of the Exchange mailbox database after a drive fails. In the 1:1 model, Microsoft requires a primary plus two copies of each database. If one drive fails, two copies remain while the replacement is reseeded, so the database stays available even if a second drive fails during that window.
The problem is the time the reseeding process takes. Although Microsoft claims that reseeding can run at 35 GB to 70 GB per hour, many field reports from real installations put the practical maximum at only 35 GB per hour. With the much larger database sizes that Exchange 2010 supports, a reseed could take as long as 60 hours (roughly 2 TB at 35 GB per hour). Even at Microsoft's upper figure of 70 GB per hour, a database that size would take about 30 hours, so a 30-hour reseed with RAID-less DAS is not out of the question. The chances of losing two or three drives in that window are admittedly low, but the possibility certainly exists, and the ramifications could be huge, affecting hundreds if not thousands of users.
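The reseed estimates follow from dividing database size by reseed rate. The short Python sketch below assumes a steady rate for the whole reseed and back-calculates the database size from the 60-hour, 35 GB-per-hour case; the function name is hypothetical.

```python
def reseed_hours(database_gb: float, rate_gb_per_hour: float) -> float:
    """Hours to repopulate one database copy, assuming a steady reseed rate."""
    return database_gb / rate_gb_per_hour


# Database size implied by the worst case above: 60 hours at 35 GB/hour.
database_gb = 60 * 35  # 2,100 GB, roughly 2 TB

for rate in (35, 70):  # field-reported rate and Microsoft's upper figure
    print(f"{rate} GB/h: {reseed_hours(database_gb, rate):.0f} hours")
```

This prints 60 hours at 35 GB per hour and 30 hours at 70 GB per hour, which is the exposure window during which the remaining copies must survive.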
Compared with direct-attached storage, a SAN also brings high availability, snapshots and replication, among other advantages, plus the ability to support server migration in a virtualized environment. Put all of this together, and SAN is still the go-to method for storing Exchange 2010 databases.
George Crump is president of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments.
This was first published in July 2011