Cloud gateways for primary storage: Benefits and challenges

Most IT shops that have ventured down the cloud storage road have restricted their use to backup or archiving. Now cloud gateways are making primary storage use cases feasible.

Using the cloud for storing primary data is different from using the cloud for backup or even archiving. Primary storage must be fast enough to support users and applications in real time, which requires getting files back from the cloud, not just sending files to the cloud within a set backup window. Using the cloud for primary storage also means providing software services that have become "standard equipment" in most traditional storage systems: features like replication, snapshots, virtualization, thin provisioning, tiering, etc. Cloud gateway appliances tackle these requirements, bringing the benefits of the cloud to primary storage applications while compensating for its shortcomings.

Cloud storage gateways are hardware- or software-based appliances that provide basic protocol translation and simple connectivity to cloud storage to facilitate its use in an IT environment. As primary storage, they need to make the cloud look like a local storage system or NAS and address the latency inherent in data transfers to support users' and applications' real-time needs.

Cloud storage infrastructures are accessed with RESTful APIs over HTTP rather than the block- or file-based protocols familiar to users and applications. Cloud gateways perform this protocol translation, making cloud storage resources available to the environment and familiar to users. On the front end, cloud storage gateways provide NAS services, block storage protocols or both to the data center. On the back end, they connect with cloud storage infrastructures and transfer data over HTTP using each provider's APIs. Gateway manufacturers must do the API integration work, and their products must be tested and certified, for each cloud storage provider they support.
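
To make the translation concrete, here is a minimal Python sketch using only the standard library's urllib.request. It shows how a file-level write or read might map onto an HTTP PUT or GET of an object. The endpoint, bucket name and path-to-key mapping are hypothetical, and the sketch only builds the requests: a real gateway would also sign each request using the provider's own authentication scheme, which differs from provider to provider.

```python
import urllib.request
from urllib.parse import quote

# Hypothetical endpoint and bucket; a real gateway uses the provider's
# documented endpoint and adds request signing/authentication.
ENDPOINT = "https://storage.example.com"
BUCKET = "gateway-volume-01"


def path_to_key(path: str) -> str:
    """Map a NAS file path to an object key (assumed mapping)."""
    # Drop the leading slash, keep "/" as a key delimiter, percent-encode the rest.
    return quote(path.lstrip("/"), safe="/")


def file_write_to_put(path: str, data: bytes) -> urllib.request.Request:
    """Translate a file write into an HTTP PUT of an object."""
    return urllib.request.Request(
        url=f"{ENDPOINT}/{BUCKET}/{path_to_key(path)}",
        data=data,
        method="PUT",
        headers={"Content-Type": "application/octet-stream"},
    )


def file_read_to_get(path: str) -> urllib.request.Request:
    """Translate a file read into an HTTP GET of the same object."""
    return urllib.request.Request(
        url=f"{ENDPOINT}/{BUCKET}/{path_to_key(path)}", method="GET"
    )


req = file_write_to_put("/projects/q3 report.docx", b"...file contents...")
print(req.get_method(), req.full_url)
# PUT https://storage.example.com/gateway-volume-01/projects/q3%20report.docx
```

The file system hierarchy survives only as a naming convention inside the key, which is one reason gateways keep their own metadata locally rather than listing objects in the cloud for every directory lookup.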

Cloud gateway appliances, whether implemented as hardware or software, employ a caching or tiering process that prioritizes data sets to keep the most active data on local storage and manages the movement of data to and from the cloud. Most have developed their intellectual property around how these data sets are prioritized and cached or tiered to reduce data transfers and minimize latency, essential for primary storage. Some have also developed an understanding of data objects and processes used by specific applications, such as SharePoint and Exchange. This application awareness can enable further data reduction and efficient tiering.
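
Vendors' actual prioritization schemes are proprietary, but the basic caching idea can be sketched with a simple least-recently-used (LRU) policy. In this Python sketch, a plain dict stands in for the cloud object store (an assumption for illustration): writes land on local storage, the least recently used objects are tiered to the cloud when local capacity runs out, and a read miss pays a cloud fetch and promotes the object back to local storage.

```python
from collections import OrderedDict


class TieredCache:
    """LRU sketch: keep recently used objects local, tier the rest to the cloud."""

    def __init__(self, capacity_bytes: int, cloud: dict):
        self.capacity = capacity_bytes
        self.used = 0
        self.local = OrderedDict()  # key -> bytes, least recently used first
        self.cloud = cloud          # stand-in for a cloud object store (hypothetical)
        self.cloud_fetches = 0

    def _evict(self):
        # Move least recently used objects to the cloud until local data fits.
        while self.used > self.capacity and self.local:
            key, data = self.local.popitem(last=False)
            self.cloud[key] = data
            self.used -= len(data)

    def write(self, key, data: bytes):
        if key in self.local:
            self.used -= len(self.local.pop(key))
        self.local[key] = data
        self.used += len(data)
        self._evict()

    def read(self, key) -> bytes:
        if key in self.local:
            self.local.move_to_end(key)  # hit: mark as most recently used
            return self.local[key]
        self.cloud_fetches += 1          # miss: pay the WAN round trip
        data = self.cloud[key]
        self.write(key, data)            # promote back to local storage
        return data


cloud = {}
cache = TieredCache(capacity_bytes=10, cloud=cloud)
cache.write("a", b"aaaa")
cache.write("b", b"bbbb")
cache.read("a")            # hit: "a" becomes most recently used
cache.write("c", b"cccc")  # over capacity: "b" (least recently used) goes to the cloud
print(sorted(cache.local), sorted(cloud))  # ['a', 'c'] ['b']
```

LRU only looks at recency; the application-aware approaches described in this article would replace the eviction order with one informed by what each object is.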

Benefits of cloud gateways

As primary storage, most cloud gateway appliances have a collection of storage features comparable to those of traditional storage systems, including virtualization, thin provisioning, replication and snapshots. Most also leverage compression, deduplication and WAN optimization technologies to reduce storage consumption and bandwidth requirements.

Gateways can reduce the cost of cloud storage. By shrinking data sets with deduplication, compression and single-instance storage (SIS), a gateway reduces the capacity consumed in the cloud by a similar amount. Most cloud providers also charge based on the bandwidth consumed, so the WAN optimization and intelligent scheduling of data transfers that most gateways provide can cut the cloud storage bill further. Cloud backup products mostly focused on providing a destination for backup applications to write data. In contrast, cloud gateways need to implement management functions like snapshots, virtualization, etc., in order to act more like the local primary storage systems they're replacing.
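
The capacity effect of combining single-instance storage with compression can be sketched in a few lines of Python. This is an illustration under simple assumptions, not any vendor's method: data arrives as fixed-size chunks, each chunk is identified by its SHA-256 hash, and only the first copy of each chunk is compressed with zlib and kept. Production gateways typically use more elaborate chunking, but the principle of storing and sending each unique chunk once is the same.

```python
import hashlib
import zlib


def reduce_for_upload(chunks):
    """Single-instance storage plus compression sketch.

    Returns (store, logical_bytes, stored_bytes), where `store` maps a SHA-256
    digest to the compressed chunk; identical chunks are stored only once.
    """
    store = {}
    logical = 0
    for chunk in chunks:
        logical += len(chunk)
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:  # duplicate chunks are skipped entirely
            store[digest] = zlib.compress(chunk)
    stored = sum(len(c) for c in store.values())
    return store, logical, stored


# Ten copies of the same 64 KiB attachment plus one different 64 KiB chunk.
attachment = b"0123456789abcdef" * 4096
other = bytes(range(256)) * 256
store, logical, stored = reduce_for_upload([attachment] * 10 + [other])
print(len(store), logical, stored)
```

Only the bytes in `store` would need to cross the WAN and occupy cloud capacity, which is how the same data reduction lowers both the capacity and the bandwidth portions of the bill.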

Gateways also address security concerns, usually by encrypting data as it is transferred to the cloud. Because most cloud providers don't keep the encryption keys, gateways provide a mechanism for users to supply them, for example when setting up a new gateway to receive files in a recovery scenario.

Gateways also make it possible to send data to multiple cloud providers, either splitting data between providers for business reasons or sending copies to multiple locations as part of a DR strategy.
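
The two placement strategies can be illustrated with a small Python sketch. The provider names and both policies are hypothetical: a "split" policy hashes each object key to pick one provider deterministically, and a "DR" policy places a requested number of copies on distinct providers.

```python
import hashlib

# Hypothetical provider names; a real gateway would hold a client per provider.
PROVIDERS = ["provider-a", "provider-b", "provider-c"]


def split_target(key: str, providers=PROVIDERS) -> str:
    """Split policy: spread objects across providers by a hash of the key."""
    h = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")
    return providers[h % len(providers)]


def dr_targets(key: str, copies: int = 2, providers=PROVIDERS) -> list:
    """DR policy: place `copies` replicas on distinct providers."""
    if copies > len(providers):
        raise ValueError("not enough providers for the requested number of copies")
    # Start at the split target so the primary copy matches the split policy.
    start = providers.index(split_target(key, providers))
    return [providers[(start + i) % len(providers)] for i in range(copies)]


print(split_target("projects/q3.docx"), dr_targets("projects/q3.docx", copies=2))
```

Hashing the key keeps placement deterministic, so the gateway can locate an object again without storing an extra lookup table; a business-driven split would more likely use explicit rules per volume or data type.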

The challenge of using the cloud for primary storage

Cloud storage is physically remote, and moving data consumes two things, time and bandwidth, in inverse proportion. The more bandwidth you have, the less time your data transfer will take, and vice versa. For applications like backup, this bandwidth/transfer time tradeoff is less of a problem, since getting backups to the cloud is what most concerns users and incremental backups reduce the amount of data transferred at one time. But with primary storage use cases, where getting data (sometimes large amounts of data) back from the cloud is all-important, it can be a deal breaker.
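
The inverse relationship is simple arithmetic: transfer time equals data size divided by usable bandwidth. This Python sketch computes it; the `efficiency` parameter, representing the fraction of the link actually available after protocol overhead and competing traffic, is an assumption added for illustration.

```python
def transfer_hours(size_gb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Hours to move `size_gb` gigabytes (10**9 bytes) over a `link_mbps` Mbit/s link.

    `efficiency` in (0, 1] is an assumed fraction of the link that is usable.
    """
    bits = size_gb * 1e9 * 8
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 3600


for mbps in (10, 100, 1000):
    print(f"{mbps:>5} Mbit/s: {transfer_hours(100, mbps):6.2f} h for 100 GB")
```

At a fully used 100 Mbit/s link, pulling back 100 GB takes 8,000 seconds, or about 2.2 hours; a 10 Mbit/s link stretches that to more than 22 hours, which is acceptable for a nightly incremental backup but not for a user waiting on a file.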

To address this cost/latency issue, gateways must use the intelligence of software rather than the brute force of big pipes. In contrast with simple tiering algorithms often used with local storage systems, cloud gateways must do more to reduce the amount of data sent to and from the cloud. Some employ algorithms that prioritize data sets based on user behavior, similar to the way search engines rank and cache pages for faster retrieval. Others have an understanding of the applications that are at work and can identify their most frequently used data objects, keeping them in local storage longer and restoring them first from the cloud.
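
As a rough illustration of restoring the most important objects first, the Python sketch below orders objects by two assumed signals: a flag set by (hypothetical) application awareness marking an object as critical, and a recent access count standing in for user behavior. The field names, object keys and counts are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class ObjectStats:
    key: str
    # Recent access count: an assumed stand-in for user-behavior signals.
    access_count: int
    # Set by hypothetical application awareness, e.g. for a mailbox database.
    app_critical: bool = False


def restore_order(objects):
    """Order objects for restore: app-critical first, then most frequently used."""
    ranked = sorted(objects, key=lambda o: (not o.app_critical, -o.access_count))
    return [o.key for o in ranked]


objs = [
    ObjectStats("archive/2019.pst", access_count=1),
    ObjectStats("sites/home.aspx", access_count=250),
    ObjectStats("exchange/mailbox.edb", access_count=40, app_critical=True),
]
print(restore_order(objs))
# ['exchange/mailbox.edb', 'sites/home.aspx', 'archive/2019.pst']
```

The same ranking could also decide which objects stay in local storage longest, with the least important objects being the first to be tiered out to the cloud.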

For primary storage applications in some high-performance environments, obviously cloud storage won't be appropriate. But for most day-to-day applications, the latency induced by large data transfers can be successfully handled with intelligent, local data caching of active data sets. In addition, the use of high-speed storage like SSDs can improve performance even more and hide the fact that most of the data represented to the organization by the cloud gateway is a long way away. Storage-hungry apps like SharePoint and Exchange produce a lot of similar data objects over time, and the application awareness mentioned above can enable some single-instance storage (SIS) data reduction as well.
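
Why local caching hides most of the distance comes down to a weighted average: if a fraction h of reads is served locally, average latency is h times the local latency plus (1 - h) times the cloud latency. The Python sketch below evaluates that formula; the 0.2 ms local SSD and 50 ms cloud fetch figures are illustrative assumptions, not measurements, and large objects would take longer to fetch than a single round trip.

```python
def effective_latency_ms(hit_rate: float, local_ms: float, cloud_ms: float) -> float:
    """Average access latency when a fraction `hit_rate` of reads is served locally."""
    return hit_rate * local_ms + (1 - hit_rate) * cloud_ms


# Illustrative, assumed figures: 0.2 ms for local SSD, 50 ms for a cloud fetch.
for h in (0.80, 0.95, 0.99):
    print(f"hit rate {h:.0%}: {effective_latency_ms(h, 0.2, 50):.2f} ms")
```

Because the cloud term dominates, raising the hit rate from 95% to 99% does more for average latency than speeding up the local tier, which is why gateways invest so heavily in deciding what to keep local.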

Hardware vs. software cloud gateway appliances

The term "appliance" usually refers to a self-contained system, which, in storage, means "a box" with capacity, processing power, networking connectivity and management features. Hardware gateway appliances are most commonly 1U or 2U chassis that support cloud storage providers on the back end and block storage (iSCSI) and/or file services (CIFS, NFS) on the front end. They also have embedded CPUs, memory and storage management features. Some connect to local storage assets, but most have internal disk and SSD storage.

The software gateway appliance, on the other hand, downloads as a virtual machine image that the user runs in their virtual server environment. Local storage capacity, CPU and networking resources are leveraged to complete the infrastructure, which has the look and feel of a traditional primary storage system. Because they run as VMs, software cloud storage appliances are easier to get up and running and can leverage existing server and storage assets, if they're available.

Cloud storage gateways that are implemented as a hardware appliance provide a turnkey solution that doesn't require the existing local storage assets or virtual server infrastructure that a software implementation does. They can be implemented easily and offer predictable, consistent performance, since they don't rely on the configuration of an existing infrastructure. For environments without available virtual server capacity, a hardware appliance solution may be better than a software implementation. But hardware-based appliances do represent another piece of hardware to buy, support and possibly replace in a DR situation. Virtual cloud storage appliances can be brought up in minutes, making the software solution potentially better from a DR perspective. They can provide economical block or NAS storage for primary storage applications, but performance is dependent on the existing virtual infrastructure.

About the author
Eric Slack, a senior analyst for Storage Switzerland, has more than 20 years of experience in high-technology industries holding technical management and marketing/sales positions in the computer storage, instrumentation, digital imaging and test equipment fields. He's spent the past 15 years in the data storage field, with storage hardware manufacturers and as a national storage integrator, designing and implementing open systems storage solutions for companies in the Western United States.

