If your customers are planning a data warehouse that measures in terabytes (TB), you'd better be thinking about the hardware and software necessary to handle such massive loads. They'll need a system that's scalable and reliable and that delivers high performance, regardless of the amounts of data and types of queries.
Yet a system of this scope can take months to deploy and can require significant resources just to get into production. But that doesn't have to be the case. You can sell customers a one-stop solution that incorporates the necessary hardware and software into a single, pre-configured package.
HP Enterprise Data Warehouse Appliance (EDW) is one such solution. It combines HP servers and other components with Microsoft software, including Windows Server 2008 and SQL Server Parallel Data Warehouse (PDW) Appliance Update 3.5 (AU3.5). If you're looking for a painless implementation in a short amount of time, the appliance could prove an ideal solution.
HP Enterprise Data Warehouse Appliance components
The HP data warehouse appliance comprises multiple HP servers configured with the software necessary to set up a comprehensive business intelligence (BI) platform. The servers are divided into two racks: the control rack and data rack. The control rack manages the various nodes, processes queries, stages data and offers a mechanism for backing up the warehouse. The data rack physically stores and manages the data.
The control rack manages all inbound queries, and it interfaces with the compute nodes on the data rack. The control rack includes the following four node types, in addition to a storage node:
- Control nodes: This is a high-availability cluster configured with active and passive nodes. The active node receives incoming queries, creates execution plans and instructs the compute nodes on how to execute the queries. The passive node kicks in if the active node fails.
- Management nodes: This is a high-availability cluster configured with active and passive nodes. The active node acts as an internal domain controller for all the appliance nodes and serves as a management interface. The passive node kicks in if the active node fails.
- Landing zone: This stores cleansed data before loading it into the compute nodes.
- Backup node: This node provides backup services for the data warehouse. It is connected to its own storage node.
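The active/passive pairing described for the control and management nodes can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not HP's or Microsoft's clustering code: the names `Node`, `ActivePassiveCluster` and `handle` are hypothetical, and the `healthy` flag stands in for whatever health check the real cluster uses.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A hypothetical cluster member; `healthy` stands in for a real health check."""
    name: str
    healthy: bool = True


class ActivePassiveCluster:
    """Toy model of a two-node active/passive cluster (illustrative only)."""

    def __init__(self, active: Node, passive: Node) -> None:
        self.active = active
        self.passive = passive

    def handle(self, request: str) -> str:
        # If the active node has failed, promote the passive node first.
        if not self.active.healthy:
            if not self.passive.healthy:
                raise RuntimeError("no healthy node available")
            self.active, self.passive = self.passive, self.active
        return f"{self.active.name} handled {request!r}"


cluster = ActivePassiveCluster(Node("control-1"), Node("control-2"))
first = cluster.handle("SELECT 1")
cluster.active.healthy = False  # simulate failure of the active node
second = cluster.handle("SELECT 1")
print(first)
print(second)
```

In this sketch the caller never needs to know which node answered, which is the point of the design: incoming connections keep working while the passive node takes over.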
The data rack in a standard HP Enterprise Data Warehouse Appliance deployment includes 10 active compute nodes and one passive node, as well as 10 storage nodes. Each compute node is configured with an instance of PDW. Each active compute node processes the assigned data and stores it in its own storage node.
The servers in both the control rack and data rack are HP ProLiant DL300-series servers -- enterprise-class machines built with HP-certified components. The storage nodes are HP P2000 G3 MSA arrays, which provide high-density storage with advanced RAID management.
The software necessary to provide a comprehensive BI solution -- including housing the data warehouse and supporting extract, transform and load (ETL) operations -- is pre-installed and pre-configured, with all the hardware and software components optimized for PDW. Any software images used to configure the hardware components are first tuned, tested and validated by Microsoft and HP.
The EDW appliance is also fully integrated with the standard SQL Server BI tools, including Integration Services, Analysis Services and Reporting Services. In addition, the EDW appliance works with non-Microsoft BI systems such as Informatica, SAS and SAP Business Objects. The appliance also includes custom connectors for Apache Hadoop.
Scalability, reliability, performance
In any data center, there's one thing you can count on: Data will grow. And the HP data warehouse appliance can grow with the data. As noted earlier, a standard EDW appliance includes one data rack with 10 active compute nodes and one passive node, which would be a starting place if your customer has less than 150 TB of data. However, the customer can expand the system up to four data racks (and still have only one control rack), which equates to 40 SQL Server instances, each with its own data storage. With this much power, you can manage 600 TB of data in a single appliance.
It's also worth noting that HP offers an option for those starting out small (30 to 60 TB of data). Your customer can purchase the appliance with a half data rack containing only four active compute nodes, four storage nodes and one passive compute node. As the customer's data grows, you can expand to a full data rack and then on to multiple data racks.
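The sizing arithmetic above can be sketched in a few lines of Python. The capacity figures come from this article; the function name `size_appliance`, the returned dictionary shape and the decision to treat the quoted figures as hard thresholds are assumptions made for illustration, not an HP sizing tool.

```python
import math

# Figures taken from the article; everything else here is illustrative.
TB_PER_FULL_RACK = 150       # one full data rack handles up to roughly 150 TB
ACTIVE_NODES_PER_RACK = 10   # active compute nodes (SQL Server instances) per full rack
MAX_DATA_RACKS = 4           # largest configuration described
HALF_RACK_MAX_TB = 60        # the half rack is pitched at 30 to 60 TB
HALF_RACK_ACTIVE_NODES = 4   # active compute nodes in a half rack


def size_appliance(data_tb: float) -> dict:
    """Return a rough configuration for `data_tb` terabytes of data."""
    if data_tb <= 0:
        raise ValueError("data_tb must be positive")
    if data_tb <= HALF_RACK_MAX_TB:
        return {"data_racks": 0.5, "active_compute_nodes": HALF_RACK_ACTIVE_NODES}
    racks = math.ceil(data_tb / TB_PER_FULL_RACK)
    if racks > MAX_DATA_RACKS:
        raise ValueError("exceeds the 600 TB ceiling of a single appliance")
    return {"data_racks": racks, "active_compute_nodes": racks * ACTIVE_NODES_PER_RACK}


print(size_appliance(45))
print(size_appliance(400))
print(size_appliance(600))
```

A real sizing exercise would also weigh growth rate, query mix and compression, so treat the result as a starting point for the conversation with the customer rather than a quote.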
But scalability means nothing if the system is not reliable. The appliance achieves reliability not only through its enterprise-class hardware, but also through the redundancy built into its design. For example, the control and management nodes are active/passive clusters that keep the system available to incoming connections, whether in the form of queries or requests from management programs needing access to the system. In addition, a spare compute node can be brought online whenever it's needed, and the RAID arrays offer redundancy and reliability for all data storage. And there's a backup mechanism that ensures the data warehouse can be restored in the event of a system catastrophe.
The rack-based structure follows a hub-and-spoke architecture that uses massively parallel processing (MPP) technology to distribute queries evenly across multiple compute nodes. This approach balances all components to reduce bottlenecks and contention. In addition, a shared-nothing software architecture -- tightly coupled to the hardware -- ensures that each query can be executed simultaneously across the nodes. As a result, even queries against tables with trillions of rows can return results in seconds.
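To make the MPP idea concrete, here is a minimal Python sketch of a shared-nothing aggregate: rows are spread across compute nodes, each node sums only its own rows, and the control node combines the partial results. Several details are assumptions for illustration: hashing the row key with `crc32` is one plausible way to spread data evenly (the article does not say how the appliance distributes it), threads stand in for separate servers, and the function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NUM_COMPUTE_NODES = 10  # active compute nodes in one full data rack, per the article


def distribute(rows, num_nodes=NUM_COMPUTE_NODES):
    """Assign each (key, amount) row to exactly one node by hashing its key."""
    shards = defaultdict(list)
    for key, amount in rows:
        node_id = crc32(key.encode("utf-8")) % num_nodes
        shards[node_id].append((key, amount))
    return shards


def node_partial_sum(shard):
    """Work done independently on one compute node: aggregate only local rows."""
    return sum(amount for _, amount in shard)


def control_node_total(rows):
    """Fan the query out to every node, then combine the partial results."""
    shards = distribute(rows)
    with ThreadPoolExecutor(max_workers=NUM_COMPUTE_NODES) as pool:
        partials = list(pool.map(node_partial_sum, shards.values()))
    return sum(partials)


sales = [(f"customer-{i}", i) for i in range(1, 1001)]
total = control_node_total(sales)
print(total)  # 500500, the sum of 1 through 1000
```

Because no node touches another node's rows, adding nodes shrinks each node's share of the work, which is why this style of architecture scales with additional data racks.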
Implementing the EDW appliance
The speed and ease with which you can implement the EDW appliance for a customer make it an inviting option. Plus, purchasing a tested and optimized package reduces your implementation time and risks and helps to ensure a consistent and error-free deployment.
Of course, one of the issues with such a package is that a customer might outgrow the EDW's basic configuration at some point in the future. If data needs grow beyond 600 TB, performance requirements exceed the appliance's capabilities, or other unforeseen events warrant a different configuration, there's little you or your customer can do. What they buy is what they're stuck with. In addition, they must purchase Microsoft software licensing and support separately from the device, although HP does offer a collaborative support model that works closely with Microsoft. Plus, a solution such as the EDW appliance is worth considering only if you're talking about massive amounts of data. However, if you are, the appliance is well worth consideration.
About the author:
Robert Sheldon is a technical consultant and freelance technology writer.