IT channel takeaway: A real-time ETL vendor makes the case for investing in real-time ETL and offers best practices for handling data from multiple sources, should you decide to use the technology in your business intelligence solutions and services.
With Jennifer St. Louis, senior product marketing manager at DataMirror, a vendor of real-time ETL solutions.
Question: Could you briefly make the business case for real-time ETL?
St. Louis: We see two reasons why companies are looking for real-time ETL. One is to support real-time or "operational" business intelligence. [An example is] folks like AmerisourceBergen, who organize the staff in their warehouse based on how many shipments are going out that day. Day-old data isn't going to solve their problem. It's the ability to react to business intelligence as it's occurring -- to events that happened early in the morning -- and be able to plan for them throughout the day, for customer service or to have the appropriate inventory allocated.
The other reason people are looking for real-time ETL is that the volumes of data are growing, and a lot of their production applications are running at capacity. So the idea that they can slow their applications down for an eight-hour batch window is no longer a reality, because they need to be available for either Web traffic or global operations 24/7. Using real-time ETL, as transactions happen in the source applications, they can immediately be replicated to the target system without slowing down the application, meaning these applications don't require a batch window anymore.
Question: Doesn't real-time ETL impose some kind of overhead burden, either on the operational system or the network? And if so, how heavy is that burden? I think that's a question IT managers would ask right away.
St. Louis: Part of the problem is the way real-time ETL is described in the market. A lot of vendors say they do it, and there are really a lot of different approaches. DataMirror does it using log-based changed-data capture. So we don't hit the database directly. We're not querying databases like traditional ETL or EII tools on the market, so we don't add that query overhead on the database. And we're only replicating the minimal amount of data, which again keeps the impact on the network minimal.
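As a rough illustration of the log-based approach described in this answer, the following Python sketch reads change records from a toy transaction log and applies only the entries it has not yet seen to a target store. It is not DataMirror's implementation. The `LogEntry` type, its `lsn` (log sequence number) field, and the `apply_changes` function are hypothetical names invented for this example, and a real product would read the database's own log format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogEntry:
    """One change record from a toy transaction log (hypothetical format)."""
    lsn: int                    # log sequence number, increasing over time
    op: str                     # "insert", "update", or "delete"
    key: int                    # primary key of the affected row
    row: Optional[dict] = None  # new row image; None for deletes


def apply_changes(log, target, last_lsn):
    """Apply entries newer than last_lsn to target and return the new position.

    Only the log is read; the source tables are never queried.
    """
    for entry in log:
        if entry.lsn <= last_lsn:
            continue  # already replicated in an earlier pass
        if entry.op == "delete":
            target.pop(entry.key, None)
        else:
            target[entry.key] = dict(entry.row)
        last_lsn = entry.lsn
    return last_lsn


log = [
    LogEntry(1, "insert", 101, {"sku": "A", "qty": 5}),
    LogEntry(2, "insert", 102, {"sku": "B", "qty": 2}),
    LogEntry(3, "update", 101, {"sku": "A", "qty": 4}),
    LogEntry(4, "delete", 102),
]

target = {}
position = apply_changes(log[:2], target, 0)    # first pass sees two entries
position = apply_changes(log, target, position)  # second pass applies only the new ones
```

Because the reader remembers its position in the log, each pass moves only the changes made since the previous pass, which is the property the answer credits with keeping database and network overhead low.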
Question: ETL, of course, stands for "extract, transform and load." The question I have is about the "transform." How do you approach the problem of dealing with data input from multiple sources?
St. Louis: Certainly, we have the capacity to consolidate data from multiple sources. For full-on data warehousing projects, many of our customers use us to do the real-time capture of data out of the transaction systems and for the basic transformation capabilities -- changing columns, simple consolidations, summaries, aggregations. If they require more complex summaries or aggregations from multiple systems, a lot of companies are complementing our solutions with a traditional ETL solution.
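To make the "basic transformation" category concrete, here is a minimal Python sketch of a column rename, a consolidation of two sources, and a simple aggregation. The source names (`erp`, `web`), the column names, and both helper functions are assumptions made up for the example; they do not reflect any particular product's configuration.

```python
from collections import defaultdict

# Hypothetical column mappings: each source names the same fields differently.
COLUMN_MAPS = {
    "erp": {"cust_no": "customer_id", "amt": "amount"},
    "web": {"customerId": "customer_id", "orderTotal": "amount"},
}


def normalize(source, row):
    """Rename a source row's columns to the shared target schema."""
    mapping = COLUMN_MAPS[source]
    return {mapping[col]: value for col, value in row.items() if col in mapping}


def total_by_customer(rows):
    """Aggregate: sum the amount column per customer_id."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer_id"]] += row["amount"]
    return dict(totals)


erp_rows = [{"cust_no": 1, "amt": 10.0}, {"cust_no": 2, "amt": 5.0}]
web_rows = [{"customerId": 1, "orderTotal": 2.5}]

# Consolidate both sources into one list of rows in the target schema.
consolidated = [normalize("erp", r) for r in erp_rows]
consolidated += [normalize("web", r) for r in web_rows]

totals = total_by_customer(consolidated)
```

Transformations of this kind work one row or one key at a time, which is why they fit a streaming capture step; aggregations that need complex logic across several systems are the part the answer says customers hand off to a traditional ETL tool.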
This 3 Questions originally appeared in a weekly report from IT Business Edge.