Data integration is a process

Data quality issues are difficult to address, so many companies don't bother. Find out how you can help address data quality and integration concerns.

IT channel takeaway: Learn steps for helping customers tackle data quality issues, which many companies avoid addressing at all.

With Rick Sherman, founder of Athena IT Solutions, a Boston-based consulting firm that provides data warehouse and business intelligence consulting.

Question: You've written that a lot of IT groups seem to feel data quality problems are so overwhelming that they just give up. Is there an alternative?

Sherman: I see it all the time. I have a newer client that's a classic example: a health care provider that has a lot of affiliations with physician groups. They started to put together the data and got together a quick-and-dirty data mart. They immediately had issues with data quality, and they just kept talking about all the different reasons they couldn't do anything about it. So they didn't. Now they're wondering what they can do.

Once people understand what the business wants to measure, they need to [think about] which aspects of data quality they have to address. Right off the bat, from dealing with this data on a daily basis (spreadsheets and so forth), they've probably grasped, at least at a high level, where they have information gaps they have to account for in their data integration processes. In their workflow, I suggest that they have some way to measure what's going through and that they be able to monitor it, even if they don't fix it. When you're pulling data from multiple sources, that's another area where you're not going to solve it overnight, but you need to start measuring it and tracking it right within your ETL [extract, transform and load] data integration process.
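As an illustration of measuring without fixing, here is a minimal Python sketch of a load step that counts quality issues per source while passing every row through unchanged. The field names, the source names and the functions `measure_quality` and `run_load` are hypothetical and do not come from the interview or from any particular ETL tool.

```python
from collections import Counter

# Hypothetical required fields for a health care data mart (an assumption).
REQUIRED_FIELDS = ("patient_id", "physician_id", "visit_date")


def measure_quality(rows, source):
    """Count missing required fields per source; rows are not modified."""
    metrics = Counter()
    for row in rows:
        metrics[(source, "rows_seen")] += 1
        for field in REQUIRED_FIELDS:
            value = row.get(field)
            if value is None or str(value).strip() == "":
                metrics[(source, f"missing_{field}")] += 1
    return metrics


def run_load(batches):
    """Pass every row through unchanged and return the rows plus the metrics."""
    totals = Counter()
    loaded = []
    for source, rows in batches.items():
        totals.update(measure_quality(rows, source))
        loaded.extend(rows)  # measure and monitor only; nothing is fixed here
    return loaded, totals
```

Storing the returned counts after each run would give a trend line per source, which is one way to track the problem over time before anyone commits to fixing it.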

Question: Could you talk about some of the challenges of matching business metrics to source metrics?

Sherman: Most of the people who work on the operational side go with the reporting that the systems offer. Most of those metrics are tactical, and those are ones where people need feedback that's as current as possible: not real time, necessarily, but current data. On the other side we have the world of data warehouses, where people will look at historical data. They'll do a lot more trending and analysis. These [data warehouse systems] usually extend across organizational units, so the issues of having consistent part numbers, employee numbers and so on present themselves. The operational side will feel they have the data, they just need more reports on it. They don't feel the crunch of preserving data integrity across various systems. It isn't until you get out of the stovepipe that you realize that there are broader integration issues. Also, these two worlds usually had their own set of tools, their own vendors. Now, with modern applications, you can use the same tools on the data warehousing side that you can use on the application side, so you tend to get a blur, and that's confusing people. The bad news is that business users tend to view systems through the tools they have, and that conceals the iceberg of the data problems underneath.
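The part-number problem can be made concrete with a small Python sketch that compares identifiers from two systems and reports where they disagree. The normalization rule (trim whitespace, uppercase, drop hyphens) and the function names are assumptions chosen for illustration; real systems would need rules agreed with the business.

```python
def normalize_key(raw):
    """Apply an assumed normalization: trim, uppercase, remove hyphens."""
    return str(raw).strip().upper().replace("-", "")


def compare_keys(system_a, system_b):
    """Report which normalized identifiers appear in one system, the other, or both."""
    a = {normalize_key(key) for key in system_a}
    b = {normalize_key(key) for key in system_b}
    return {
        "only_in_a": sorted(a - b),
        "only_in_b": sorted(b - a),
        "shared": sorted(a & b),
    }
```

For example, `compare_keys(["pn-100", "PN200 "], ["PN100", "PN300"])` treats `pn-100` and `PN100` as the same part and lists `PN200` and `PN300` as unmatched.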

Question: Will problems associated with regulatory compliance help sell the concept of data quality to upper management?

Sherman: Yes, because it relates [the problem] to business terms. Otherwise, things like referential integrity and conformed dimensions — I'm sorry, but they just don't cut it with any CFO that I know. People have just thrown up their hands about this for ages, but now they're being forced to do it. There are a number of situations where it's easy to explain why this is important to business.

This 3 Questions originally appeared in a weekly report from IT Business Edge.
