For large unstructured data sets, common in applications like medical imaging, media and entertainment and, more recently, cloud storage, traditional file system (NAS) architectures aren't always the best option. File systems are hierarchical by nature, a structure that can limit their overall scalability and degrade performance as they grow.
OSDs, which are analogous to logical units, group data into objects that the user or application determines are related. They also allocate space for these objects and manage lower-level space and security functions. Objects are assigned an Object ID (OID) number -- rather than an inode for file systems -- with which they're accessed. This results in data being "organized" as a flat collection of unique OIDs, instead of a hierarchical collection of directories, folders and file names.
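The flat, OID-keyed namespace described above can be pictured with a minimal sketch. The store and helper names here are illustrative assumptions, not a real OSD API:

```python
import uuid

# Objects live in one flat collection keyed by OID -- no directories,
# folders or file names, just a unique identifier per object.
object_store = {}

def put_object(data: bytes) -> str:
    """Store data under a newly assigned OID and return that OID."""
    oid = str(uuid.uuid4())   # unique Object ID, filling the role an inode plays in a file system
    object_store[oid] = data
    return oid

def get_object(oid: str) -> bytes:
    """Retrieve data by OID alone -- no path lookup through a hierarchy."""
    return object_store[oid]

oid = put_object(b"scan-0042 payload")
assert get_object(oid) == b"scan-0042 payload"
```

The point of the sketch is the lookup model: access is a single key match against a flat collection, rather than a walk down a directory tree.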
Users and applications must interface with objects in an OSD through an API or via HTTP, instead of NAS protocols or iSCSI. Public cloud service providers typically implement these APIs to give users file or block access to OSD storage on the back end. For private clouds and archives, users either handle the APIs themselves or use an OSD storage solution that includes this front-end functionality. Another alternative is to implement a cloud gateway, which provides the appropriate protocols for users to interface with back-end object storage.
Most file systems support only a finite number of files, and they also restrict the number of directories and levels of hierarchy, which limits the absolute amount of data they can store. Not bound by this hierarchical structure, object storage systems can grow the number of object IDs they contain almost without limit. Each object is a self-contained unit that includes the OID, the object's metadata, its attributes and the data itself. The metadata includes information like creation date, ownership and size. Attributes include information used to manage storage allocation (like extents), but objects also carry attributes supplied by users or applications, which can be used to store information on characteristics such as performance, availability and capacity. With this higher-level information, OSDs can provide additional functionality, such as quality of service (QoS), power management, and improved security and reliability.
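The self-contained object described above can be sketched as a simple data structure. The field names and attribute keys are assumptions for illustration, not a standard layout:

```python
from dataclasses import dataclass, field
import time

# An object carries its OID, system metadata, user/application-supplied
# attributes and the data itself as one self-contained unit.
@dataclass
class StorageObject:
    oid: str
    data: bytes
    metadata: dict = field(default_factory=dict)    # creation date, ownership, size...
    attributes: dict = field(default_factory=dict)  # user-supplied hints an OSD could act on

obj = StorageObject(
    oid="a1b2c3",
    data=b"frame-0001",
    metadata={"created": time.time(), "owner": "render-farm", "size": 10},
    attributes={"qos": "gold", "replicas": 3},   # e.g. performance and availability hints
)
assert obj.metadata["size"] == len(obj.data)
```

Because the attributes travel with the object, any node that holds it can apply policies like QoS without consulting an external catalog.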
Similar to the way products like Google Desktop simplify accessing a file on your PC, object storage systems, through the use of OIDs, simplify access to any piece of data, without requiring you to know which physical storage device, file system or directory it's in. This abstraction enables OSDs to work very well with storage hardware configured in a distributed, or "node," architecture, where processing power can scale in conjunction with storage capacity. OSDs can enable the addition of storage to the back end of the infrastructure as needed but remain transparent to users on the front end. In this node configuration, OSDs can allow users at any node to access objects that physically reside on any other node, without having requests go through a central controller. This enables a true "global" storage system, as very large amounts of data can be managed as objects and physically stored anywhere they can be reached via a WAN or the Internet.
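One common way any node can locate an object without a central controller is to compute the placement deterministically from the OID. This is a hedged sketch under that assumption; real OSDs use richer placement schemes (such as consistent hashing rings), and the node names here are invented:

```python
import hashlib

# Every client hashes the OID onto the node list the same way, so all
# of them agree on an object's location without asking a controller.
nodes = ["node-a", "node-b", "node-c", "node-d"]

def locate(oid: str) -> str:
    """Map an OID to the node assumed to hold it."""
    digest = int(hashlib.sha256(oid.encode()).hexdigest(), 16)
    return nodes[digest % len(nodes)]

# The same OID always resolves to the same node, from any client.
assert locate("object-123") == locate("object-123")
```

The design choice is that placement is a pure function of the OID: no lookup table has to be kept consistent across sites, which is what makes the "global" access pattern practical over a WAN.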
Distributing copies of data across storage nodes can provide a measure of data protection; distributing those nodes geographically can add a DR capability. OSD systems can also provide data resiliency with erasure coding. This is a computational process somewhat similar to RAID parity: it parses a data set into sub-blocks, or "chunks," and adds a percentage of redundant chunks, depending on the desired level of protection. Erasure coding helps maintain data integrity, restoring lost or corrupted data using these redundant chunks, much like RAID does at the disk drive level. But, compared with RAID and traditional replication, erasure coding can be more efficient and more robust, and it's being used by some OSD vendors to replace RAID protection altogether within objects. This can reduce the storage capacity and processing overhead required.
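The "chunks plus redundant chunks" idea can be illustrated with the simplest possible case: a single XOR parity chunk, which lets any one lost chunk be rebuilt. This is a toy sketch; production erasure codes (such as Reed-Solomon) tolerate multiple simultaneous losses:

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, k: int) -> list:
    """Split data into k equal chunks and append one XOR parity chunk."""
    size = len(data) // k          # assumes len(data) divides evenly by k
    chunks = [data[i * size:(i + 1) * size] for i in range(k)]
    parity = chunks[0]
    for c in chunks[1:]:
        parity = xor(parity, c)
    return chunks + [parity]

def recover(chunks):
    """Rebuild the single missing chunk (marked None) from the survivors."""
    missing = chunks.index(None)
    survivors = [c for c in chunks if c is not None]
    rebuilt = survivors[0]
    for c in survivors[1:]:
        rebuilt = xor(rebuilt, c)
    chunks[missing] = rebuilt
    return chunks

stored = encode(b"abcdefgh", 4)    # 4 data chunks + 1 parity chunk
stored[1] = None                   # simulate losing one chunk (a failed node or drive)
assert b"".join(recover(stored)[:4]) == b"abcdefgh"
```

Note the capacity math the article alludes to: protecting four chunks here costs 25% overhead, versus 100% for keeping a full replica.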
Since many OSD systems create multiple copies of most data sets to provide protection and data integrity, optimization can be an important technology to keep capacity requirements down. Object-based storage architectures lend themselves to forms of capacity optimization like deduplication within individual nodes, and some do a form of "global deduplication" across nodes. Data reduction can also help optimize bandwidth between nodes that are geographically dispersed.
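Deduplication of the kind described above is commonly content-addressed: chunks are fingerprinted by a hash, so an identical chunk is stored once no matter how many objects reference it. The store layout and function names in this sketch are illustrative:

```python
import hashlib

chunk_store = {}     # fingerprint -> chunk bytes (each unique chunk stored once)
object_index = {}    # oid -> list of fingerprints making up that object

def store_dedup(oid: str, chunks: list) -> None:
    """Record an object's chunks, writing only chunks not already stored."""
    refs = []
    for chunk in chunks:
        fp = hashlib.sha256(chunk).hexdigest()
        chunk_store.setdefault(fp, chunk)   # skip the write if the chunk exists
        refs.append(fp)
    object_index[oid] = refs

store_dedup("obj-1", [b"AAAA", b"BBBB"])
store_dedup("obj-2", [b"AAAA", b"CCCC"])   # b"AAAA" is not stored a second time
assert len(chunk_store) == 3
```

The same mechanism helps with inter-node bandwidth: a node can send fingerprints first and transfer only the chunks the remote side doesn't already hold.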
OSD for cloud storage
These attributes make object storage a good choice for cloud storage and, indeed, cloud storage use cases constitute a significant number of implementations. Compared with a traditional file system or block storage architecture, OSDs have several advantages for the cloud, such as extreme scalability (capacity and performance) and functionality that supports data protection/security/integrity and DR. The self-contained nature of objects and their use of extensive attributes can also support data sharing features to enable user collaboration, even in geographically dispersed locations.
There are a number of object-based storage vendors, some that provide just the back-end storage and some the front-end storage services and UI as well. In our next article, we'll list some of these vendors and discuss how they implement the object-based storage model in the cloud.
About the author
Eric Slack, a senior analyst for Storage Switzerland, has more than 20 years of experience in high-technology industries holding technical management and marketing/sales positions in the computer storage, instrumentation, digital imaging and test equipment fields. He's spent the past 15 years in the data storage field, with storage hardware manufacturers and as a national storage integrator, designing and implementing open systems storage solutions for companies in the Western United States. Find Storage Switzerland's disclosure statement here.
This was first published in September 2010