Published: 07 Jul 2010
Solutions provider takeaway: The absence of single-instance storage in Microsoft Exchange Server 2010 affects how solutions providers should go about capacity planning for customers. Now, you must know how to calculate the amount of white space, dumpster size and content index size prior to mailbox migrations.
One of Microsoft's primary design goals in creating the Exchange 2010 architecture was to improve database I/O efficiency. To accomplish this goal, Microsoft made numerous changes to the Exchange Server 2010 database structure. As many solutions providers already know, the most significant of these changes is the removal of single-instance storage, which was a database design feature that existed in every previous version of Exchange.
Single-instance storage is based on the fact that an Exchange user often sends a message to multiple recipients, and because each recipient receives an identical copy of the message, there is no reason to store the same message in the database numerous times. Previous versions of Exchange only stored a single copy of each message and used pointers to link recipients' mailboxes to the message. If a customer deleted the message from his mailbox, all he was really deleting was the pointer to the message. The message would continue to be accessible to the remaining recipients.
Since single-instance storage does not exist in Exchange 2010, many solutions providers are wondering what effect its absence will have on their customers' databases once they begin the migration process. Unfortunately, there is no clear answer to this question, because the effect on the database varies dramatically depending on how your customers use Exchange.
Exchange 2010 allows messages to be saved in three basic formats: Rich Text Format (RTF), plain text and HTML. The Exchange 2010 architecture is designed to compress plain text and HTML data, including the text within message headers. Compression helps offset the loss of single-instance storage. However, Exchange 2010 does not compress RTF data. If all of the messages within an Exchange mailbox database were in RTF, the database would end up being about 20% larger in Exchange 2010 than in Exchange 2007, according to Microsoft.
Message attachments can also have a major effect on database sizes. For example, if your customer sent a message with a 10 MB attachment to 20 people, it would consume roughly 10 MB of space within the Exchange 2007 database (plus a negligible amount of space for overhead). In Exchange 2010, that same message would consume more than 200 MB of space.
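The attachment arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `attachment_storage_mb` and its parameters are hypothetical, and the sketch ignores database overhead, compression and the sender's own copy, which is why the real Exchange 2010 figure ends up above the number it returns.

```python
def attachment_storage_mb(attachment_mb: float, recipients: int,
                          single_instance: bool) -> float:
    """Approximate space one attachment consumes in the mailbox database.

    With single-instance storage, one copy is shared by every recipient.
    Without it, each recipient's mailbox holds its own copy.
    """
    copies = 1 if single_instance else recipients
    return attachment_mb * copies


# A 10 MB attachment sent to 20 people, as in the example above.
with_sis = attachment_storage_mb(10, 20, single_instance=True)
without_sis = attachment_storage_mb(10, 20, single_instance=False)
print(f"With single-instance storage:    {with_sis} MB")
print(f"Without single-instance storage: {without_sis} MB")
```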
Capacity planning without single-instance storage
Because a customer's mailbox data can potentially consume much more space in Exchange 2010 than in Exchange 2007, it is critical to do some capacity planning prior to mailbox data migration. Solutions providers need to have an idea of how much disk space is actually going to be used by the mailbox database so they can ensure that a customer's Exchange 2010 server is able to accommodate the database with room to spare.
I recently heard of an organization with a migration plan that involved figuring out how much disk space was available on Exchange Server 2010 and then dividing that space by the number of mailboxes on their old server. They then put mailbox quotas in place to ensure that users would not cause the server to run out of disk space.
This plan worked for the organization, but only because none of the mailboxes were close to the quota limit. Had the mailboxes been larger, the results would have been disastrous because this plan failed to account for database overhead.
Look at the size of each user's mailbox, the amount of white space in the database, the dumpster size and the size of the content index to determine the actual amount of space that will be consumed by the database.
Assuming that your customers' mailboxes are at their maximum quota size, you can define database white space as one day's worth of mail. One day's worth of mail is the right measure because each day, users will have to delete enough mail to receive a full day's mail while still remaining below their quota. Therefore, if you have 100 users, and each of them receives an average of 10 MB worth of mail per day, then the database white space will be approximately 1 GB. Keep in mind that each night, Exchange runs an automated maintenance cycle that helps prevent the accumulation of white space. If this maintenance cycle fails to run, then the amount of white space in the database can grow beyond these estimates.
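As a rough sketch of the white space estimate, the calculation is simply the number of users multiplied by the average daily mail per user. The function name `white_space_mb` is hypothetical, and the estimate assumes that every mailbox sits at its quota and that nightly maintenance runs normally.

```python
MB_PER_GB = 1024  # assumption: 1 GB is treated as 1,024 MB


def white_space_mb(users: int, avg_daily_mail_mb: float) -> float:
    """Estimate database white space as one day's worth of mail for all users."""
    return users * avg_daily_mail_mb


# 100 users who each receive an average of 10 MB of mail per day.
estimate = white_space_mb(100, 10)
print(f"White space: {estimate} MB (about {estimate / MB_PER_GB:.2f} GB)")
```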
Calculating database and dumpster size
When a user deletes an item, it is not actually permanently gone. Instead, the item goes into a dumpster and is considered to be soft deleted. Messages and calendar items remain in the dumpster for 14 and 120 days, respectively, and are recoverable during that time. Assuming that single-item recovery is enabled, the mailbox's size will increase by an additional 1.2%. There is also a 5.8% increase in mailbox size to account for calendar logging data.
To determine the size of the dumpster, solutions providers can use this formula: (Average daily volume of inbound and outbound messages in MB x deleted item retention window, which is 14 days by default) + (Mailbox Quota Size x 0.012) + (Mailbox Quota Size x 0.058).
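The dumpster formula translates directly into code. The sketch below is a minimal illustration, not an official sizing tool: the function name `dumpster_mb` and its parameter names are hypothetical, all sizes are in MB, and the daily volume and the quota must describe the same scope (both for one mailbox, or both summed across all mailboxes) for the three terms to add up meaningfully.

```python
def dumpster_mb(daily_volume_mb: float, quota_mb: float,
                retention_days: int = 14,
                single_item_recovery: float = 0.012,
                calendar_logging: float = 0.058) -> float:
    """Estimate dumpster size in MB from the three terms of the formula."""
    retained_mail = daily_volume_mb * retention_days   # deleted item retention
    recovery = quota_mb * single_item_recovery         # single-item recovery
    calendar = quota_mb * calendar_logging             # calendar logging data
    return retained_mail + recovery + calendar


# One mailbox with 10 MB of daily mail and a 1 GB (1,024 MB) quota.
print(f"Dumpster per mailbox: {dumpster_mb(10, 1024):.2f} MB")
```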
Content indexes make searches for content within mailboxes faster and are usually about 10% of the total size of the database.
Given these factors, let's imagine that you have 100 users, each of whom sends and receives an average of 10 MB of data per day and is at the maximum quota of 1 GB. If you calculate the database size by multiplying the number of mailboxes by the mailbox quota, then the database would be 100 GB in size. Once you take white space, indexes and dumpsters into account, the numbers look like this:
- Combined size of all mailboxes: 100 GB
- White space: 1 GB
- Dumpster: per mailbox, 140 MB (10 MB of daily mail multiplied by 14 days) + 12.29 MB (1,024 MB mailbox quota x 0.012) + 59.39 MB (1,024 MB mailbox quota x 0.058) = 211.68 MB; across 100 mailboxes, that is 21,168 MB, or roughly 20.67 GB
- Indexes: approximately 10 GB
You can see that the total space used by the mailbox database is about 132 GB, which represents an increase of roughly 32% over what we would have expected had we simply multiplied the mailbox quota by the total number of mailboxes.
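The tally above can be reproduced with a short Python sketch. It is an illustration under stated assumptions rather than a full sizing exercise: the function name `estimate_database_gb` and its parameters are hypothetical, 1 GB is taken as 1,024 MB, the index is taken as 10% of the combined mailbox size, and all three dumpster terms are computed per mailbox and then multiplied by the mailbox count. That last choice matters, because applying the quota-based terms only once for the whole database gives a smaller total.

```python
MB_PER_GB = 1024  # assumption: 1 GB is treated as 1,024 MB


def estimate_database_gb(mailboxes: int, quota_gb: float, daily_mail_mb: float,
                         retention_days: int = 14,
                         index_fraction: float = 0.10) -> dict:
    """Estimate mailbox database size in GB, including overhead."""
    quota_mb = quota_gb * MB_PER_GB
    mailbox_data = mailboxes * quota_gb
    # White space: one day's worth of mail for every user.
    white_space = mailboxes * daily_mail_mb / MB_PER_GB
    # Dumpster per mailbox: retained mail + single-item recovery (1.2%)
    # + calendar logging (5.8%), then scaled to all mailboxes.
    dumpster_per_mailbox_mb = (daily_mail_mb * retention_days
                               + quota_mb * 0.012
                               + quota_mb * 0.058)
    dumpster = mailboxes * dumpster_per_mailbox_mb / MB_PER_GB
    index = mailbox_data * index_fraction
    total = mailbox_data + white_space + dumpster + index
    return {
        "mailboxes_gb": mailbox_data,
        "white_space_gb": white_space,
        "dumpster_gb": dumpster,
        "index_gb": index,
        "total_gb": total,
        "overhead_pct": (total / mailbox_data - 1) * 100,
    }


# 100 users, 1 GB quota each, 10 MB of mail per user per day.
for name, value in estimate_database_gb(100, 1, 10).items():
    print(f"{name}: {value:.2f}")
```

Changing the inputs shows how quickly overhead grows as daily mail volume rises, even when the quota stays the same.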
Because single-instance storage doesn't exist in Exchange 2010, it has become increasingly important to implement mailbox quotas as a way of controlling database growth. But quotas alone cannot be used to predict the total size of a mailbox database. Solutions providers must take other factors into account, such as database white space, the dumpster and the content indexes.
About the expert
Brien M. Posey, MCSE, is a Microsoft Most Valuable Professional for his work with Exchange Server and has previously received Microsoft's MVP award for Windows Server and Internet Information Server (IIS). He has served as CIO for a nationwide chain of hospitals and was once responsible for the Department of Information Management at Fort Knox. As a freelance technical writer, Posey has written for Microsoft, TechTarget, CNET, ZDNet, MSD2D, Relevant Technologies and other technology companies. You can visit his personal website at www.brienposey.com.