Object-based storage is an architecture that uses data objects instead of files to organize, store and retrieve data. Compared with file systems, which employ a hierarchical "tree" structure, object-based storage devices (OSDs) essentially maintain a flat index of Object ID (OID) numbers. This lets the OSD decentralize its indexing and metadata, which allows significant scalability while maintaining performance and reasonable costs -- characteristics that make OSDs ideal for cloud storage applications. This article will explore some of the OSD systems on the market and how they're implemented for cloud storage.
OSDs employ a scale-out architecture that stores data objects in a clustered node topology, in which each node maintains metadata about the objects it contains and about the objects in other nodes. Some run on purpose-built hardware, while others are software that is loaded onto commodity hardware provided by the user or run on proprietary storage array systems or in virtual machine environments.
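To make the flat-index idea concrete, here is a minimal Python sketch of a toy cluster. It is an illustration only: the `Node` and `Cluster` classes are hypothetical, and choosing a node by hashing the OID is an assumption made for the example, not a description of how any product named in this article places data.

```python
import hashlib
import uuid


class Node:
    """One storage node holding objects and their metadata in a flat dict."""

    def __init__(self, name):
        self.name = name
        self.objects = {}  # OID -> (data, metadata); no directory tree


class Cluster:
    """Toy scale-out cluster: objects are found by OID, not by path."""

    def __init__(self, node_names):
        self.nodes = [Node(n) for n in node_names]

    def _node_for(self, oid):
        # Assumption: pick a node by hashing the OID. Real products use
        # their own placement and metadata-sharing schemes.
        digest = hashlib.sha256(oid.encode()).digest()
        index = int.from_bytes(digest[:4], "big") % len(self.nodes)
        return self.nodes[index]

    def put(self, data, metadata=None):
        """Store an object and return its newly generated OID."""
        oid = uuid.uuid4().hex
        self._node_for(oid).objects[oid] = (data, dict(metadata or {}))
        return oid

    def get(self, oid):
        """Return (data, metadata) for an OID."""
        return self._node_for(oid).objects[oid]
```

A caller keeps only the OID returned by `put` and hands it back to `get`; there is no path to walk, which is what allows the index to be spread across nodes.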
All OSDs expose the objects they store through RESTful or SOAP APIs, and some also offer file systems or other access protocols. The devices are typically sold to companies building a private cloud for archive or reference data, or to cloud service providers building a public cloud. There are a number of object-based storage vendors; some provide just the back-end storage, while others also supply the front-end storage services and user interface. Although the following companies are not the only players in the object storage space, they do represent a cross section of approaches to object-based cloud storage. (Of the companies interviewed for this article, Storage Switzerland has had client relationships with NEC and NetApp StorageGRID, formerly Bycast.)
Now let's drill down into some of the companies and products in this space.
Caringo. This company was founded by the technology team that created the original content-addressable storage (CAS) architecture that became EMC's Centera. Caringo's CAStor is an object software solution that users load onto existing or commodity hardware to create an object-based storage cluster that scales to multiple petabytes. According to CEO Mark Goros, the company has focused on simplicity of design to provide the robustness and resilience needed to run in the absence of proprietary hardware, the factor that drives its economics. Although a file server module option is available, CAStor keeps the file system separate from the storage architecture. Said Goros: "A well-designed object storage system like CAStor has no file system to pollute performance and scalability. It is high availability by design and automatically heals, balances, manages and migrates content. It stores metadata with the data object and can drive real-time remote replication automatically with rules based on that metadata." According to industry sources, CAStor is the OEM solution for Dell's DX Storage platform.
Cleversafe. Cleversafe provides an OSD system that serves as back-end storage for the cloud, with a focus, as the name implies, on security and data integrity. It cuts data sets into "slices," which it stores on different nodes without replication. Each Slicestor appliance, the purpose-built hardware Cleversafe manufactures to house these nodes, can hold up to 24 TB in a 2U chassis, with local CPU to maintain performance as the cluster scales. Each object slice is encrypted and protected by erasure coding, and an integrity check is run on the entire data set when slices are recombined. According to Director of Marketing Julie Bellanca, "Most products replicate to increase availability but end up increasing vulnerability at the same time. Our data dispersal technology improves security without making more copies, which has the additional benefit of reducing the amount of data handled and stored, keeping costs down."
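The general idea of slicing data, protecting the slices with an erasure code and checking integrity on recombination can be sketched in a few lines of Python. This is a simplified illustration, not Cleversafe's algorithm: it assumes a single XOR parity slice (the simplest erasure code, able to repair one lost slice), uses a SHA-256 checksum as the integrity check, and omits encryption. The function names `disperse` and `recombine` are hypothetical.

```python
import hashlib
from functools import reduce


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def disperse(data, k):
    """Split data into k equal slices plus one XOR parity slice."""
    slice_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(slice_len * k, b"\0")
    slices = [padded[i * slice_len:(i + 1) * slice_len] for i in range(k)]
    parity = reduce(xor_bytes, slices)
    checksum = hashlib.sha256(data).hexdigest()
    return slices + [parity], len(data), checksum


def recombine(slices, length, checksum):
    """Rebuild the data; at most one entry in slices may be None (lost)."""
    missing = [i for i, s in enumerate(slices) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one lost slice")
    if missing:
        # The XOR of all k+1 slices is zero, so the lost slice equals
        # the XOR of the survivors.
        survivors = [s for s in slices if s is not None]
        slices = list(slices)
        slices[missing[0]] = reduce(xor_bytes, survivors)
    data = b"".join(slices[:-1])[:length]  # drop parity and padding
    if hashlib.sha256(data).hexdigest() != checksum:
        raise ValueError("integrity check failed")
    return data
```

In this sketch each slice would live on a different node, so losing any one node still leaves enough information to rebuild the object without a full replica having been stored.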
Data Direct Networks (DDN). Web Object Scaler (WOS) from DDN is a cluster of purpose-built "cloud nodes," which can be configured for capacity or performance, providing up to 120 TB or 12,000 IOPS per 4U node, respectively. Each node has internal self-healing data integrity features that, along with its performance and storage density numbers, are a good fit for the high-transaction environments -- high-capacity archives and cloud service providers -- that DDN is targeting. Currently, WOS supports API access only, requiring the end user or service provider to integrate the file system or protocol desired. According to Jeff Denworth, vice president of marketing, DDN has partnered with the cloud front-end provider Mezeo and will supply WOS for the back-end infrastructure of that company's cloud storage platform.
EMC. EMC's Atmos runs on a cluster of purpose-built x86 servers that can scale to 720 TB in a cabinet and tens of petabytes in the total cluster. It can also be run as a VM, attached to existing iSCSI, Fibre Channel or NFS storage. Both platform options run the identical software suite, which includes Web services access via REST and SOAP and file access via NFS and CIFS interfaces; the platforms are sold as elements of private and public cloud solutions. According to Leo Leung, senior manager of product marketing, "Using REST to provide the basics of storage interaction (like storing and retrieving data) is 'table stakes' in this game. Users can control Atmos, not just do simple object manipulation, but control the entire system -- create users, define ACLs, do full system capacity utilization metrics -- all from APIs."
NEC. Hydrastor from NEC is a cluster of purpose-built Storage Nodes and Accelerator Nodes, which provide up to 12 TB of capacity, or 10,000 MBps of throughput, respectively, in combinations of up to 165 total nodes. Hydrastor's data-optimized design, with WORM and encryption, is built to provide a long-term, resilient archive or Tier 2 repository for private or public cloud implementations. Hydrastor provides a CIFS or NFS interface for users and leverages global, inline deduplication and compression. According to Gideon Senderov, director of technical marketing, "This dedupe process is actually application-aware, because it differentiates the data from the metadata tags that applications add to the data stream and identifies the common blocks. Hydrastor also provides data resiliency through a user-configurable level of erasure coding, which enables the system to recover from the loss of multiple disk drives or even entire storage nodes."
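Inline deduplication in general can be illustrated with a short Python sketch: incoming data is cut into blocks, each block is identified by a hash, and a block already seen is stored only once. The sketch assumes fixed-size blocks and SHA-256 fingerprints purely for simplicity; it does not model the application-aware step described above, and it is not a description of Hydrastor's internals. The `DedupStore` class is hypothetical.

```python
import hashlib


class DedupStore:
    """Toy store that keeps each unique block once, keyed by its hash."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}   # digest -> block bytes, stored once
        self.recipes = {}  # object name -> ordered list of digests

    def write(self, name, data):
        """Split data into blocks and store only the blocks not yet seen."""
        recipe = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # skip if already stored
            recipe.append(digest)
        self.recipes[name] = recipe

    def read(self, name):
        """Reassemble an object from its list of block digests."""
        return b"".join(self.blocks[d] for d in self.recipes[name])
```

Because the block table is shared by every object written, blocks common to different objects are stored once, which is the sense in which such deduplication is "global."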
NetApp. StorageGRID from NetApp runs in a clustered configuration on NetApp filers, providing a complete solution for large reference archives and cloud service providers. On the front end, StorageGRID supports NAS I/O (CIFS and NFS) and raw object access through HTTP and RESTful protocols. On the back end, it supports the policy-based placement, movement and retention of data objects for storage tiering, transparent hardware refresh and compliance. StorageGRID also does internal data integrity checks across geographically dispersed NetApp filers to minimize risk of data loss. As Ingo Fuchs, senior product marketing manager, said, "We're not new to the cloud storage game. We started out providing scalable, distributed storage (originally referred to as 'grid storage') to the medical industry, so we're used to large data sets, a diverse user base, regulatory requirements and critical data."
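Policy-based placement of this kind can be pictured as a list of rules matched against each object's metadata. The Python sketch below is a generic illustration under stated assumptions: the `Policy` fields, tier names and first-match-wins ordering are all hypothetical and are not taken from NetApp's product.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    name: str
    match: dict          # metadata key/value pairs that must all match
    tier: str            # hypothetical storage tier label
    copies: int          # number of copies to keep
    retention_days: int  # how long the object must be retained


def select_policy(metadata, policies, default):
    """Return the first policy whose match criteria all hold."""
    for policy in policies:
        if all(metadata.get(k) == v for k, v in policy.match.items()):
            return policy
    return default
```

For example, a rule matching `{"class": "medical"}` could send imaging data to an archive tier with a multi-year retention period, while everything else falls through to the default policy.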
Object-based storage devices offer a unique set of advantages for cloud implementations -- both in the public cloud environments of storage and applications service providers and in the private clouds being set up by organizations to manage their own long-term reference data needs. Their ability to scale economically, almost without limit, while still maintaining performance enables object-based storage systems to simplify the management of storage growth and keep costs down. Their ability to geographically disperse segments of a data set for security and reduced data redundancy, as well as offering technologies like erasure coding for long-term data integrity, has helped make these solutions the choice for most cloud storage applications.
For VARs, cloud storage typically offers a couple of opportunities. One is reselling subscriptions for service providers; the more interesting one may be integration. With a range of products available, from software that runs on commodity hardware or VMs to dedicated, purpose-built storage clusters, there are vendor solutions to match almost any customer situation.
About the author:
Eric Slack, a senior analyst for Storage Switzerland, has more than 20 years of experience in high-technology industries holding technical management and marketing/sales positions in the computer storage, instrumentation, digital imaging and test equipment fields. He's spent the past 15 years in the data storage field, with storage hardware manufacturers and as a national storage integrator, designing and implementing open systems storage solutions for companies in the Western United States. Find Storage Switzerland's disclosure statement here.