Email archiving services
Date: Jun 17, 2008
In the world of litigation, email has become prosecuting lawyers' favorite smoking gun, and being able to respond to email discovery requests in a timely manner is critical for businesses that want to avoid stiff penalties from the SEC. Yet 70 percent of organizations still don't have an email archiving strategy in place, according to a recent study by AIIM, a nonprofit group focused on enterprise content management. In the absence of an IT-controlled email archiving system, the manual discovery process is demoralizing and time-consuming, involving lots of guesswork and data retrieval from tape.
For storage solution providers, this scenario means there's huge growth potential around email archiving systems.
While it's clear that companies - especially midsize and large ones - need email archiving systems, many companies have a hard time finding the right combination of hardware and software for an efficient and cost-effective email archiving system. This webcast, presented by Storage Switzerland founder and senior analyst George Crump, will show you how to drive email archive sales with your customers by helping them find the right mix of tools - toward the goal of reducing ediscovery-related costs and producing more reliable results.
Get details on the dynamics at play around email archiving: SEC regulations, backup considerations and best practices for retention. Finally, find out which hardware and software products do a good job at email archiving.
Read the full transcript from this video below:
Email archiving services
Sue Troy: Hello and welcome to the SearchStorageChannel.com webcast on email archiving. My name is Sue Troy, and I’ll be your moderator. Joining me is our guest speaker, George Crump.
George Crump: Hi Sue.
Troy: Hi George. George is president and founder of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. George has 25 years of experience designing storage solutions for data centers across the United States. George has seen the birth of technologies such as RAID, NAS and SAN.
Prior to founding Storage Switzerland, he was CTO at one of the nation’s largest storage integrators, where he was in charge of technology testing, integration and product selection. Thanks for joining us George.
Crump: Thank you, Sue.
Troy: George is ready to begin his presentation on email archiving. Whenever you’re ready, George, go right ahead.
Crump: Thank you, Sue. We are going to talk today about email archiving, and why that’s a significant storage opportunity for the channel. First let me give you some background on Storage Switzerland. We are an independent analyst group. We focus primarily on the data storage and IT virtualization market places.
We provide services that include strategic consulting to manufacturers of those technologies, resellers and integrators of those technologies, and, of course, end users of those technologies.
We also have a full-blown testing facility that we use to do independent validation of the products and solutions that we recommend. We write a fair amount of articles and white papers that appear in various publications, of which SearchStorageChannel.com is one.
Our website, Storage.Switzerland.com, is updated weekly and is a resource for a lot of those articles and white papers, as well as some product reviews directly out of our lab.
Moving on to the agenda, we are going to go over why email archiving is still an important subject for you to bring up with your customers. We will review what the problem is, some potential solutions and lastly [will give] some advice from an integrator perspective on how you should be approaching this market.
Why should we still be paying attention to the email archiving market? I think there is a common misconception today that everybody has already bought email archiving. That is certainly not the case. It is not a very heavily penetrated market. There has been a lot of publicity and marketing around trying to get the market going.
Most of the installed base today tends to be in the SEC-regulated field, where essentially there is a government mandate for them to get into compliance, put email archiving in and things of that nature.
What’s really driving the market today is [that] we are moving beyond regulation. Clearly, that is a concern for many organizations and will remain so, but we are moving beyond that and into retention and management for litigation readiness.
That is the key driver today in email archiving and management -- to be prepared for the possible litigation that can result.
Lastly, email continues to be a storage management and data protection problem. It is one that we will continue to have for the foreseeable future. An email archive can help address some of those issues and concerns.
The downside to email archiving is the perception that the low-hanging fruit has already been picked. The companies that really had to make an email archiving decision have already done so. To some extent, I think that is a fair assessment.
As I said earlier, we think the big opportunity is in the litigation readiness and email management spaces. A bigger concern from an integrator perspective is that there is a lot of confusion over what should be retained exactly and for how long it should be retained.
There [were] often a lot of very long conversations that weren’t always very productive discussing the problems with retention, and those sorts of things. However, once you are educated on it, it really allows you to add a lot of value to your customers and become the expert in that field.
Storage management looks like it can be addressed by the modernization of email packages or by restricting users. The most common thing we see today is restriction of the amount of email that a user can get. We have seen sites [allowing] as little as 25 MB to 50 MB in a mailbox. I don’t know about you, but for me that might be a day’s worth of email. That can be very challenging.
On the opposite end of the equation, the email systems are getting better at retaining information for a very long period of time, and having very large email stores. That of course creates problems in data protection, which ... we will address during our conversation today.
Let’s talk about the email archive problem. On this slide you are going to see a graph of the compounded annual growth rate. You will see that we have a very fast-growing market still today.
It is estimated to grow at about a 40% compounded annual growth rate year-over-year all the way out through 2012. The other thing that makes this very interesting is that … these aren’t just upgrades to existing email systems.
Over the next three years, we expect sites that have implemented email archiving to grow from about 22% of available organizations to about 61% of available organizations. Both of these are from Austin Research … indicating significant growth and significant opportunity in the market today.
On the next slide, shifting from compliance to litigation is really the driving force behind this growth. The companies that really needed to be compliant jumped on email archiving early, not because they wanted to, but more likely because they had to.
Litigation affects a much broader section of companies. As the first bullet there indicates, 60% of companies with 10 or more employees have faced more than one lawsuit in their first 10 years of existence.
Tied closely to that, 95% of all discovery requests last year involved some form of email delivery. They specifically wanted a range of emails and things like that. You may not even be one of the people being litigated against.
For example, in the organization I used to work for, we were asked to deliver all emails from Supplier A to Supplier B that we sent. Even though we were not a participant in the litigation, we got involved in it and had to deliver a pretty sophisticated discovery request.
Lastly, 50% of all litigation costs come from pretrial fulfillment of discovery requests. To put those two together, 50% of the cost of litigation comes from pretrial, and 95% of that pretrial effort involves some form of email.
There are hard dollars being spent to fulfill these requests. Companies are being pulled into it that aren’t part of the suit, but need to comply with some sort of a court order.
At the bottom [are] headlines on various fines levied. One of these appears at least every week. Certainly you can Google and find the latest. Those are clearly an ongoing thing that we will continue to see.
The big challenge with storage management is that system administrators still set mailbox restrictions. It causes users to have to export data to .pst files. Often, if not always, this is an endorsed practice by that same system administrator.
This creates two problems: Number 1, these files are stored on the network. We have a customer that we worked with in the past that had a 50 MB restriction on their mailboxes. They had 14,000 email users and, as a result, had well over 30 TB of .pst files. Those were actually being stored on dedicated file servers, or NAS devices.
They became a major backup problem to the environment because one addition to a .pst file created a scenario where the backup application had to actually back up the entire file. It was a major problem and a big storage issue as well.
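Editor's note: the backup effect described here can be approximated with a short calculation. The following is a minimal sketch, assuming a file-level backup that copies any changed file in full. The 14,000 users and 30 TB come from the talk (30 TB is used as a floor for "well over 30 TB"); the share of users who touch their .pst on a given day is an assumed figure, and the function name is hypothetical.

```python
# Illustrative model only: a file-level backup copies a whole file whenever
# any part of it changes, so one new message re-copies the entire .pst.

def nightly_backup_gb(pst_sizes_gb, changed):
    """Return the GB copied by a file-level incremental backup.

    pst_sizes_gb: size of each user's .pst file in GB
    changed:      parallel list of booleans, True if the file was touched today
    """
    return sum(size for size, touched in zip(pst_sizes_gb, changed) if touched)


# Figures from the talk: 14,000 users and roughly 30 TB of .pst files.
users = 14_000
total_gb = 30 * 1024
avg_pst_gb = total_gb / users  # a little under 2.2 GB per user

# Assumption (not from the talk): a quarter of users file at least one
# message into their .pst on a given day.
active = users // 4
sizes = [avg_pst_gb] * users
changed = [i < active for i in range(users)]

copied = nightly_backup_gb(sizes, changed)
print(f"average .pst size: {avg_pst_gb:.2f} GB")
print(f"copied per night:  {copied / 1024:.1f} TB")
```

Under those assumptions the nightly backup copies about 7.5 TB, even if each of those users added only a single message.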
Secondly, the customer is really not protecting themselves at all by storing data in .pst files. It is very easy for these files to be requested and pulled, and imported into some sort of a discovery application, content indexed and searched for specific information.
Obviously, if you’re being sued and you try to delete these files, that is against the law. You have to deliver them. This is clearly a storage management problem. What you are doing is trading storage management problems in your email application for storage management problems in the storage side of the equation.
The other big issue here is that this is a major productivity waster. That same Austin [Research] study indicated that the average user spends about 1.5 to 2 hours a day managing email. That’s not sending, composing and replying to email. It’s deciding whether to put this email into a .pst file or into this folder. That sort of thing.
I am sure that is 1.5 to 2 hours that they would rather have back. The ramification is if you approach a limitation on your mailbox size, for all intents and purposes, you actually lose the ability to send and receive email until you get back under your threshold. That becomes Priority 1 for many users to fix.
The other option is to let the mail store grow. Modern email applications allow for significantly larger mail stores than we have seen in the past. The temptation to do that exists.
The challenge is that this will affect overall email system performance [and it's] a challenge from a data protection standpoint.
Roughly 60% of decision makers in that same study cited the growth in email storage as a serious or very serious problem. This is very high in the mind of the CIO: What are they going to do with email storage?
From a backup and recovery perspective, the good news is that backup and recovery in the Exchange environment has improved significantly over the past three to four years. Most backup applications, if not all, can do some sort of message-level recovery and message-level backup today.
In most cases, although there are some exceptions, it is still a very slow process. We have seen an explosion in point solutions designed to specifically protect or enhance the Exchange environment. Many times these can be add-ons to existing applications that give it some Exchange intelligence.
For example, in snapshots, it might be an Exchange module for a snapshot that quiesces the Exchange database, puts it into a backup mode and gets a good clean snapshot of the message store.
What we are seeing as a result of all of this is that there are so many point solutions available today that enhance data protection that there are almost too many of them, and all the extra data protection processes are causing challenges as well.
Email archiving can address and, in many cases, eliminate the need for that sort of an issue as well.
One of the things that will come up when we talk to customers is that they think they can count on their tape backups or their disk backups as their mechanism for email archiving. I like to make sure everybody understands the typical process.
Depending on the email application, this may vary, but in general is pretty accurate. The first thing you have to do is stand up a target Exchange server, which is a demo server. The next thing you have to do is guess the date range of emails that the discovery request requires. Sometimes the discovery request specifically cites that.
In many cases, it is every email you have ever sent over a very long period of time that relates to this case. It is very hard to identify. An important thing to note from a backup application perspective, if this is an old request, [is that] all backup applications keep an index that tells what files are on what tapes.
These indexes in the backup world can get quite large, so it is very common to purge them after a period of time. In that scenario, those tapes have to be read back in. You have to guess at a date range, and if it has fallen out of the backup index, before you even can know if you guessed right on the tapes, you have to rescan all of these tapes back into the backup database.
Then check the tapes to see if you were right. If you guessed correctly, then you can use the email application’s search capabilities to find the data that you are looking for.
In our experience, this tends to be very, very slow. It is something you want to be careful with and if you can, avoid. If you guessed wrong, the bad news is you get to start this whole process over again. You guess on another set of tapes, rescan those into the backup application and then start the search.
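Editor's note: the guess-and-rescan loop just described can be sketched in a few lines. This is a hypothetical model, not any backup product's API: tapes are a plain dictionary, and the function name and arguments are made up for illustration.

```python
# Hypothetical sketch of the manual discovery loop. No real backup product's
# API is used; tapes are modeled as a dict of tape label -> message ids.

def manual_discovery(tapes, guesses, wanted):
    """Try each guessed tape set in turn until the wanted messages turn up.

    tapes:   dict mapping tape label -> set of message ids on that tape
    guesses: list of tape-label lists, in the order an admin would try them
    wanted:  set of message ids the discovery request calls for
    Returns (found_ids, tapes_rescanned).
    """
    rescanned = 0
    for guess in guesses:
        restored = set()
        for label in guess:
            # Each tape is read back in to rebuild the purged backup index.
            restored |= tapes[label]
            rescanned += 1
        found = restored & wanted
        if found == wanted:
            return found, rescanned
        # Wrong guess: start over with the next set of tapes.
    return set(), rescanned


tapes = {"2006-Q1": {1, 2}, "2006-Q2": {3, 4}, "2006-Q3": {5, 6}}
found, count = manual_discovery(
    tapes, [["2006-Q1"], ["2006-Q2", "2006-Q3"]], wanted={4, 5}
)
print(found, count)
```

In this toy run the first guess misses, so three tapes are rescanned before the two requested messages are found. The sketch only succeeds when everything requested is on the guessed tapes, so it does not model the partial-match risk raised next.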
The biggest challenge -- and we have seen this happen several times -- is where the user thinks they guessed correctly and they think they got the right data. Instead, they missed something or didn’t get all of it.
They appear at discovery, and the opposing counsel has the emails that they missed. The organization looks like it was hiding something. That can also result in fines of some nature. It is important to understand that. Most users understand at some level what it takes to do this.
It is extremely helpful to walk people back through this process, and help them understand step-by-step what is involved in a manual recovery of email data.
From a solutions standpoint for email archiving, let’s first have a quick overview of what the common email archive architecture would tend to be. Depending on the email application, this will vary a little bit, but in most cases we capture the basic essence.
An email comes in and it’s either copied in real time or on a scheduled basis to some sort of an email archive server. That server stores the email on a database, often to a container file. More often … it is stored on a relatively high-speed, very reliable disk.
I point that out because it’s not the type of disk where you just go out and buy the cheapest SATA drive you can buy on the Internet. It’s your archive, it has to work, and it has to be reliable.
Typically, these databases are containers. The containers limit the number of emails in each individual container. We have separate files to deal with, and then they are linked together when you go to do the search.
The reason that is important is that as the containers age, meaning you filled up this container with 5,000 messages and you think your likelihood of needing it lessens, you might move it to a less expensive storage medium.
Traditionally that has been tape or optical. It certainly can be the focused archive disk technologies that exist today. We will talk about some of those solutions in detail.
The important thing about containers is they allow you to continue to move data and put it to less and less expensive storage as time goes on.
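Editor's note: the container architecture described above can be sketched as follows. This is a minimal illustration under stated assumptions, not how any particular archive product works: the class names, tier labels and the `age_out` policy are hypothetical, and only the 5,000-message container limit comes from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """A fixed-capacity file of archived messages on one storage tier."""
    capacity: int
    tier: str = "primary-disk"
    messages: list = field(default_factory=list)

    @property
    def full(self):
        return len(self.messages) >= self.capacity


class Archive:
    """Stores messages in containers and searches across all of them."""

    def __init__(self, capacity=5000):
        self.capacity = capacity
        self.containers = [Container(capacity)]

    def store(self, message):
        # Open a new container once the current one reaches its limit.
        if self.containers[-1].full:
            self.containers.append(Container(self.capacity))
        self.containers[-1].messages.append(message)

    def age_out(self, keep_recent=1, tier="archive-disk"):
        # Move every full container except the newest `keep_recent` ones
        # to a cheaper tier; the open container always stays put.
        full = [c for c in self.containers if c.full]
        cutoff = max(len(full) - keep_recent, 0)
        for container in full[:cutoff]:
            container.tier = tier

    def search(self, term):
        # Containers are linked at search time, whatever tier they are on.
        return [m for c in self.containers for m in c.messages if term in m]


archive = Archive(capacity=3)  # tiny capacity so the example stays readable
for i in range(7):
    archive.store(f"msg {i} about contract" if i % 2 == 0 else f"msg {i}")
archive.age_out(keep_recent=1)
print([c.tier for c in archive.containers])
print(archive.search("contract"))
```

Here the oldest full container moves to the cheaper tier while a search still returns matches from every container.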
From a solution standpoint, there tend to be two types of products. From a macro level, the debate is, Should I host or should I do this internally? Hosted solutions, especially for smaller businesses, are interesting. The email is sent to both the customer servers and the provider servers simultaneously.
The provider’s responsibility is to store, organize and index these emails as they come in. In the case of a discovery, the user would make a request to the provider for data. The pros of this solution [are] that you expect lower upfront costs and less use of internal staff at the end user to deploy and manage.
The cons of this solution are that long-term costs get expensive because you are storing more and more data over time. It becomes an ever-growing bill to the corporation. Most end users have resistance to handing over corporate assets, especially one like email, to a third party.
There are also legitimate technology concerns over confirmation that the email arrives at the hosted site as well as the bandwidth requirements between the sites. There is clearly some resistance from that standpoint.
Third, it is difficult for the channels to participate unless they are a provider. If there is any participation, it is mostly in the form of some sort of an agent’s fee.
Finally, there is anecdotal evidence of slow search and retrieval response times. I have never seen a formal study of it, but we have seen repeated complaints that there have been slow search and retrieval response times. That is a continuing concern.
On the next slide, [looking at internal email archiving] the advantage is that all the components are purchased and owned by the customer. Also important is that it is sold by the channel and integrated by the channel. The selection process can be confusing. There are a lot of email archive solutions available.
I am always surprised at how many new ones seem to come out almost every year and how many upgrades there are to current ones. There is a long selection process for the user to go through.
I was speaking to a customer just yesterday who actually reviewed 15 different email archive solutions. That is a lot of review work. It is something you have to help the customer work their way through.
There is also this concept of a standalone solution versus integrated. We will talk about some of those in detail. There is a higher upfront cost because you are physically buying the equipment, physically owning the software and you have implementation costs.
Generally over the long term, the curve should go down, and you should see this become a less expensive solution long term than a hosted solution.
[There is a broad selection of options for internal solutions.] There are a lot of software solutions, and a lot of hardware solutions. Now we are starting to see the emergence of turnkey appliance solutions that integrate the hardware and the software into one black box.
Here’s some advice to the integrators on how to approach this market, and some recommendations on types of solutions you can consider when making a recommendation. In this market, you still generally have to make the case for email archive. Help the customer understand why they really want to do this.
Litigation seems to be a very viable reason to do that. What we tend to hear a lot, still today, is what I call "the short-term retention phenomenon." Typically you will hear something along the lines of "We only keep email for 30 days." There’s a big challenge with that.
I use this graph as an example of why that doesn’t tend to work well. We are going to look at the life of a typical email. What you see on the screen here is a tree of the life of an email. It starts off emailed to one person on your server. That person forwards it to a bunch of different people.
They either copy it to a local .pst file, or they send it to a private email account or they copy it to a laptop. It could be any number of different things. Now you execute a delete on your server. But this email still exists in a lot of different places. To two of them, where you see the two blue dots, you probably don’t have any access at all.
The problem with this short-term deletion policy is that you are deleting the very evidence that you were going to use to defend yourself.
I have a good example of that. We had a customer that had a lawsuit come about over an HR-related issue. It was an inappropriate email. The offended person took the customer to court. The person had forwarded it to their private Hotmail account so they had a copy of it. The customer had a very [bad] retention policy.
They had no email archive in place and had deleted any evidence that the file existed. By luck, they stumbled onto that user’s .pst file that was on a laptop that they were using that had not been reformatted yet.
They found in that .pst file copies of where they had forwarded it to their friend saying, "Ha ha; isn’t this funny?" Obviously, they weren’t all that offended if they forwarded it to their friends.
They got extremely lucky in finding this email. The way they found it was that they were getting ready to reformat the laptop, and the IT administrator was rather sharp and knew the file.
That was a one-in-a-million chance. Had they had an email archive application, instead of sweating it out and getting very lucky, they could have very easily done a search and seen every copy of that email and every instance that it had been forwarded and what the forwarded text had said.
That is one of the strongest cases we have seen against running with no archiving. An archive really helps you defend yourself. In these retention policies, you can get into the legal "whys" and "why nots" of retaining data for that short period of time.
In many cases, it’s just illegal. In most cases, you are removing your ability to defend yourself. The offending email, I can almost guarantee you, gets somewhere else.
Moving onto the other two issues with retention, long-term retention is another one we hear almost as often as we hear short-term retention. Long-term retention is essentially the customer giving up and saying, "We will just keep email forever, and that way we are always protected."
This one is a little harder to argue against. I can absolutely see a case where that would make sense. The challenge, though, in keeping an email forever is [that] there may be a very viable time that you want to delete and remove emails or files of any nature. If you’re not removing them, then the entire history of the company is exposed.
We tend to recommend against that, but not as strongly as we recommend against the short-term stuff. If customers do want to keep emails for an indefinite period of time, much more emphasis should be placed on the ability of the archiving application to search and index files. That should be your key point. It keeps those files small and very manageable.
The last one is probably the hardest to overcome. I always call it "Our legal counsel said." This is IT moving the problem out of their court and into the legal court and washing their hands of it. The problem is that the corporation is still at risk.
You have a couple of approaches here. You can try to convince the IT guy to raise that issue up. You can talk directly to legal counsel. Or, you can move on and talk about an email as a storage management issue. We’ll talk about those on the next slide.
The next slide is addressing email as a storage management issue. My brief [inaudible 0:30:26]. For storage channel partners, this is probably a much more comfortable area to talk about, and something you have probably spent more time in. It is something worth bringing up to a customer.
There is a huge concern over .pst files still today and what their impact is on the network and also the impact on backup and things of that nature. Also, consider the impacts of building very large email stores.
The email applications tend to have the ability to do this today. But that means more and more expensive primary storage, and it also consumes more and more server bandwidth as we go through.
The second area is addressing email as a data protection and recovery issue. This should be a very comfortable area for storage channel partners. We are shifting the focus to the continuous data protection solutions and application-aware snapshots.
This may be an opportunity to talk to the customer about a new storage platform, or a software application that does a real-time or near-CDP type of backup that can then be leveraged.
There are utilities that can access snapshots of copies, or replicated copies of Exchange stores for search and retrieval information. Those exist and are pretty popular today.
Moving into the software recommendation, everybody will argue at some level where they belong on this. In general, there is an integrated approach and then there are archive-specific tools.
The integrated approach would be something like what Commvault Software did on the software side. What Commvault has done in their approach to the market is that they have made this very large central application, and to that connected backup, email archiving, file archiving, etc.
The value is that you can leverage that same metadata database for backup that you are using for email archiving and file archiving. There should be some impact as a result of doing that.
The challenge that Commvault would face is that to see the benefits of the entire solution, you have got to be prepared initially to buy into the whole solution. Many customers don’t want to do that.
The other solution is ideal for storage channel partners that are new to email archiving or don’t have a large professional services staff. A company like Intradyn makes an email archive appliance that is a black box you plug into the network. There is some configuration, and then it starts archiving data.
This allows you to participate in this space, especially in the small-to-medium-size business area, very, very effectively without becoming the next great email archive expert. There is a lot of validity to that type of solution as well.
Then there is what I would describe as archive-specific tools. Some might be surprised that I put EMC and Symantec into this area. That’s because, while I am well aware that they have other applications, certainly, backup and things of that nature, there is not a great deal of integration between these applications and servers. They were all acquired.
You need to look at [servers] as standalone utilities specifically to solve an email archive problem. That’s not to say they don’t do that well, but just look at it as that. There's also Mimosa, a relatively new player on the market. Their value is that they are focused solely on email archiving.
They have built a file archive engine on top of it that leverages that same database so they can instance out multiple copies of similar files. There are probably 14 other archive applications. These are the ones that we are seeing quite commonly in the field and that are getting pretty good reception from end users as they interact with them.
From a hardware perspective, a very key thing here is this is not an area where you want to go out and put the cheapest possible SATA disk array into it that you can. This is a company’s archive. Theoretically it should last a long period of time.
There are disk archive solutions specifically designed for this type of environment, two of which are EMC Centera and Permabit. There are others. These are what are called CAS-based systems, or Content Addressable Storage. They fingerprint the file. They can leverage that fingerprint to also do some data reduction so you don’t see the same file twice.
In archives you don’t see as much duplicate data as you would in backups. That said, the fingerprinting also gives them the ability to do data integrity checking over time. If you are going to have an archive that you want to last for 10, 15, 20 years, the ability to constantly rescan that data has a lot of value.
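Editor's note: the fingerprinting idea behind content-addressable storage can be sketched briefly. This is a generic illustration, not the internal design of EMC Centera or Permabit. The choice of SHA-256 and the `ContentStore` class with its method names are assumptions made for the example.

```python
import hashlib


class ContentStore:
    """Toy content-addressable store: the fingerprint is the address."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # Assumption: SHA-256 as the fingerprint; real products may differ.
        address = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same address, so it is stored once.
        self._blobs.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        return self._blobs[address]

    def verify(self):
        """Rescan everything; return addresses whose content has changed."""
        return [
            address
            for address, data in self._blobs.items()
            if hashlib.sha256(data).hexdigest() != address
        ]


store = ContentStore()
first = store.put(b"Q3 contract attachment")
second = store.put(b"Q3 contract attachment")  # same file archived twice
print(first == second, len(store._blobs))

# Simulate silent corruption on disk, then rescan.
store._blobs[first] = b"Q3 contract attachmenX"
print(store.verify() == [first])
```

The same fingerprint serves both purposes mentioned in the talk: duplicate content collapses to one stored copy, and a periodic rescan flags any object whose bytes no longer match its address.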
They also allow for scaling out to very large systems. These are both grid-architected solutions. To add storage you just add an additional node to the grid cluster, and you can grow these into the high terabytes to petabyte range.
There are also tape and optical. I didn’t put any specific products in there because there is a wide range of those, and I think that they are pretty well known at this point. We are seeing a move in general away from that type of technology and specifically landing on these more disk-based archive solutions. For the right customer, tape can make a lot of sense. The only thing you have to be very careful with is that, over time, the media format changes. What they use today may not be commonly available or even readable in 10 years.
You need to design a plan for them with a stop point where they do a migration to a new technology when both technologies are available.
In closing, I can’t emphasize enough that email archiving remains a very good market for storage integrators to be involved with. There are easier ways to get into the market today with some of the integrated appliances. The software applications are more mature.
You don’t have to get wrapped up in as much of the regulatory side of the business, which can get confusing and a bit daunting. Undeniably, there are perceptions and challenges to overcome.
Most people want to pretend email doesn’t exist; working around that is important. There are definitely solid solutions for integrators to recommend. This is a maturing market with relatively reliable solutions.