Hadoop generates a wealth of consulting, support and implementation business for the channel, solution providers said.
The Apache Software Foundation's Hadoop distributed computing technology, which got off the ground about six years ago, focuses on the "big data" problem -- data sets so large and unwieldy that managing and analyzing them becomes difficult. To handle such vexing data-crunching tasks, Hadoop divvies up the processing workload across multiple compute nodes.
The Hadoop ecosystem and user base have expanded in recent years as different software distributions broadened its accessibility. Cloudera Inc. launched a Hadoop distribution in 2009, and Hortonworks Inc. and MapR Technologies Inc. debuted their distributions last year. EMC has since made a software licensing deal for the MapR distribution.
"There is a lot of business opportunity around the implementation, integration and support of big data infrastructure," said Clint Green, director of big data solutions at Iron Bow Technologies LLC, a Virginia-based solution provider.
These opportunities include consulting and professional services work, as well as developing curriculum to help customers learn Hadoop.
Joining the Hadoop ecosystem to solve big data challenges
Channel executives described the Hadoop ecosystem as one with numerous components. At its core, Hadoop consists of the Hadoop Distributed File System (HDFS) and MapReduce, software that distributes processing chores across a Hadoop cluster.
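For readers unfamiliar with the MapReduce model, the idea can be sketched on a single machine in plain Python. This is only an illustration under stated assumptions: it does not use Hadoop's APIs, the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and the list of text chunks stands in for data blocks that a real cluster would spread across nodes.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_outputs):
    """Group emitted values by key, the step that sits between map and reduce."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(chunks):
    # Each map_phase call is independent, so on a cluster each chunk
    # could be processed on a different node (assumption: simulated here
    # with an ordinary loop on one machine).
    mapped = [map_phase(chunk) for chunk in chunks]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: three chunks standing in for three data blocks.
chunks = ["big data big clusters", "data nodes", "big nodes"]
counts = word_count(chunks)
print(counts)
```

Because each chunk is mapped independently, the map step is the part that can be farmed out to many machines; only the grouping and summing steps need to bring results for the same key together.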
Apache also offers a number of Hadoop-related projects in addition to the core elements: the HBase database, Hive data warehouse, the Pig language for creating data analysis programs and the Accumulo data store. Commercial ISVs also have software for the Hadoop platform.
Part of the channel's job is to help customers sort out their options and craft offerings geared toward their mission objectives, Green said.
"We see Hadoop as an ecosystem and not just the distributed file system and
MapReduce software stack," Green said. "There are a number of solutions that are open source and closed source that work with, extend, and, in some cases, supersede portions of the Hadoop ecosystem."
The objective, he added, is to help customers take advantage of the right components.
"It's not a one-size-fits-all," he said.
Lunexa LLC, a boutique technology consulting firm in San Francisco, has tapped Hadoop to offer the Lunexa Web Analytics tool that brings together a pre-packaged data model, reporting functions and extract, transform and load (ETL) capabilities.
The tool employs Cloudera's Distribution Including Apache Hadoop 3 (CDH3), but Lunexa is likely to begin using CDH4 soon, said David Cole, a partner at the company. Lunexa Web Analytics lets customers keep aggregate data and look-up tables on a relational database while housing raw data on Hadoop. It also includes a MicroStrategy reporting layer.
Lunexa was founded as a traditional ETL, business intelligence and data architecture consultancy, but the company began to get projects with large volumes of data and looked into Hadoop. It invested in Hadoop training for its consultants and rolled out an in-house Hadoop cluster. Customer uptake was slow at first but has gained traction in recent months, Cole said.
"There's a big education gap ... that is slowly starting to improve," he said. About 30% of Lunexa's Hadoop work is in proof-of-concept deployments, and 70% involves production installations, he added.
Hadoop ecosystem makes big data manageable
While other channel partners emphasize Hadoop software and related services, Hyve Solutions in Fremont, Calif., covers hardware as well. The solution provider sells to large-scale data center customers and provides a Hadoop appliance dubbed the bigD series 8. The product line has a three-node starter kit with 10.8 TB capacity, a full rack with 72 TB and individual 2U expansion nodes with 3.6 TB.
Hyve aims to build an ecosystem around Hadoop, said Steve Ichinaga, senior vice president and general manager. With that goal in mind, Hyve announced integration of Zettaset Inc.'s Hadoop management platform with its Hadoop appliance in May.
"The problem with [Hadoop] is there isn't enough meat on the bone in pure Hadoop in terms of functionality that makes it usable in an enterprise environment," said Jim Vogt, CEO of Zettaset.
Zettaset's platform covers Hadoop administration, enforces safety policies and provides automated monitoring, reporting, failover and performance tuning, among other functions. Hyve plans to continue building out its Hadoop ecosystem "around these types of value-added partners," Ichinaga said.
Carahsoft Technology Corp., a government IT solutions provider, partners with Cloudera and has seen its Hadoop business expand rapidly. The company's channel -- Carahsoft offers Cloudera on its US General Services Administration schedule contract -- has generated $2.5 million in Cloudera sales since late last year, said Michael Shrader, vice president of innovative solutions at Carahsoft.
Since CDH is free from a licensing perspective, the revenue stems from subscription sales. A Cloudera Enterprise subscription includes support and Cloudera Manager, a management system for the Hadoop stack. Professional services and Hadoop training are also available.
"I have worked with close to 100 vendors over the years, and I haven't seen as much interest in a technology that is relatively new to the government market as we've seen with Cloudera," Shrader said.
Open source momentum is helping to prime demand for Hadoop-driven offerings, Shrader said. "Using open source technology to solve an emerging problem set seems to have piqued a lot of customer interest," he added.
About the author
John Moore is a Syracuse, N.Y.-based freelance writer, reachable at email@example.com.