Our Channel Explained series provides targeted articles that flesh out detail on channel terminology but avoid information overload. This week we examine the question, What is a federated database?
By submitting your personal information, you agree that TechTarget and its partners may contact you regarding relevant content, products and special offers.
By Yuval Shavit, Features Writer
A federated database is a system in which several databases appear to function as a single entity. Each component database in the system is completely self-sustained and functional. When an application queries the federated database, the system figures out which of its component databases contains the data being requested and passes the request to it. Federated databases can be thought of as database virtualization in much the same way that storage virtualization makes several drives appear as one.
A federated database may be composed of a heterogeneous collection of databases, in which case it lets applications look at data in a more unified way without having to duplicate it across databases or make multiple queries and manually combine the results. If your customers are looking for this type of configuration, IBM Information Integration may be a good place to start.
In a homogeneous environment, federated databases can help distribute the load of very large databases (VLDBs). In this configuration, each component database has an identical schema but only a subset of the total rows. The federated database system distributes queries to the appropriate component database; the goal of the system is to ensure that a typical query will need to use only one component, thus drastically reducing the number of rows that need to be searched. Microsoft SQL Server has supported this type of database federation since its 2000 edition.
When a federated database is used for load distribution, rows are distributed to its components based on a primary key. Picking this key isn't trivial -- it can make the difference between a successful configuration and an unsuccessful one. Ideally, most or all queries should end up hitting only one component database.
More on database management
Managing data in the data hub
Oracle 11g: Top five tips
SQL Server security: Authentication
For instance, a bank may use a federated database in which transactions are split by year. Users will often only look at transactions in the past year and the system will only need to touch one or two component databases. On the other hand, splitting the databases by customer ID isn't likely to work well; a given set of transactions will involve a random distribution of customer IDs, meaning that the query will be sent out to many, or potentially all, of the component databases. This eliminates the benefit of the federated database -- nearly all of the rows end up being searched -- and will only increase the query's overall latency because of the query redirects.
Federated databases have several drawbacks, according to Hilary Cotter, a SQL Server consultant and Microsoft MVP. Each component database is a potential point of failure, and latency from any one server will delay the entire call. Your clients will have to program either the federated database or its calling applications to handle potentially incomplete query results, in case one or more of the component databases times out. They'll also have to manage each component database and keep it up to date, increasing maintenance costs.
In SQL Server 2005, table partitioning is often a good alternative to database federation. Partitions are similar to homogeneous federated databases in that one large database is broken into smaller parts based on a primary key, but table partitioning handles this by searching several segments of a single database instead of the entirety of several databases. Since this one server needs to handle the entire database, upgrading it is referred to as "scaling up," in contrast to the "scaling out" model of federated databases. Scaling up is often a better approach than scaling out, but it does have its limitations -- a server can only be beefed up so much -- and may require your customers to invest in comparatively expensive, high-end hardware instead of off-the-shelf servers.