How to automate database integration

This chapter excerpt follows a step-by-step approach for automating database integration and details how it is used to address challenges such as data changes. Go through the steps in this excerpt and learn best practices for continuous database integration.

Continuous Database Integration (CDBI) is the process of rebuilding your database and test data any time a change is applied to a project's version control repository.

Do you ever feel like your source code and database are operating in different "galaxies" throughout the development lifecycle on projects? As a developer, you may wait several days for a change to the database. You may even be restricted from making minor test data changes, or are afraid to make data changes for fear of ruining the one shared database for fellow developers. Situations like these are not unusual, and effectively utilizing CDBI can help alleviate some of these challenges and many others as well.

About the book:

This chapter excerpt on Continuous Database Integration (download PDF) is taken from the book Continuous Integration: Improving Software Quality and Reducing Risk. This purpose of this book is to change the perception of integration from a necessary evil into a normal part of the development process. The importance of continuous integration practices and techniques are explained.


Revisiting the theme of the book, database integration is one of the parts of the Integrate button (see Figure 5-1), because it is based on the principle that database code (DDL, DML, configuration files, etc.) is, in essence, no different from the rest of the source code in a system. In fact, the artifacts related to database integration:


  • Should reside in a version control system

  • Can be tested for rigor and inspected for policy compliance

  • And can be generated using your build scripts

Therefore, the building of the database can be incorporated into a CI system and can enjoy the same benefits as the rest of the project source code. What's more, changes to database source code can trigger an integration build just as other source code changes do.

FIGURE 5-1 Database integration in the Integrate button

Not All Data Sources Are Alike
Some projects, or portions of projects, don't use a database exactly the way that we define it in this chapter. However, most projects need to persist data, be it in a flat file, an XML file, a binary file, or an RDBMS. Regardless of your chosen persistent store, the principles of CDBI apply.

As a first step in describing how to automate database integration with CI, we start by describing how to incorporate database integration into a build process. The scripts used to build, configure, and populate a database need to be shared with the rest of the project team, so we discuss which database files are committed to a version control repository. Automating a database integration build process solves only part of the problem, so we go one step further by rebuilding the database and data at every software change—making the verification process continuous. If a team is adopting CDBI for the first time, most people on a project will probably need to modify their development practices, so we finish the chapter looking at effective CDBI practices.

Refactoring Databases
The topics covered in this chapter could even be the subject of a separate book. Other materials already make the case for treating your database as just another type of source code that is managed through the version control repository. This chapter gives you the essentials to automate and run database integration processes continuously.

Automate Database Integration

On many projects, a database administrator (DBA) can often feel like a short-order cook. DBAs typically have analytical skills that took many years to cultivate, but they often spend most of their time performing low-level command tasks. What's more, this job role can also be stressful, because the DBA often becomes a development bottleneck as the team members wait for the DBA to apply one small change to the database after another. Here's a familiar scenario.

Nona (Developer): Hi Julie, will you set up a development database for me on the shared development machine?

Julie (DBA): I am in the middle of something. I should be able to set it up later this afternoon. Would you like the data from last week or an export of today's data?

Nona: Today's data.

Julie: Okay, I can have that for you by tomorrow morning.

10 minutes later…

Scott (Technical Lead): I am unable to perform testing on the test server because there are no assigned records for the Reviewer role.

Julie: Oh, let me create some test records that are assigned this role. I think Nona may have used up all of those records.

Scott: Thanks. While you're at it, would you remove the Y/N constraint on the APPROVED columns on the PERSON table? We'd like to use different flags on this column.

It's more of the same on a typical day for the DBA. Not only is this a poor use of the DBA's talents, it causes a significant bottleneck, especially in the continuous approach promoted by CI. If you asked any DBA what they'd rather do on a day-to-day basis, they would probably tell you that they'd rather spend time on data normalization, improving performance, or developing and enforcing standards, not giving people database access or recreating databases and refreshing test data. In this section, you'll see how you can automate these repetitive tasks so both the DBA's and the team's time is spent on improving the efficacy and


TABLE 5-1 Repeatable Database Integration Activities


Activity  Description
Drop database
 Drop the database and remove the associated data so that you can create a new database with the same name.
Create database
Create a new database using Data Definition Language (DDL).
Insert system data
Insert any initial data (e.g., lookup tables) that your system is expected to contain when delivered.
Insert test data
Insert test data into multiple testing instances.
Migrate database and data
Migrate the database schema and data on a periodic basis (if you are creating a system based on an existing database).
Set up database instances in multiple environments
Establish separate databases to support different versions and environments.
Modify column attributes and constraints
Modify table column attributes and constraints based on requirements and refactoring.
Modify test data
Alter test data as needed for multiple environments.
Modify stored procedures(along with functions and triggers)
Modify and test your stored procedures many times during development (you typically need to do this if you are using stored procedures to provide behavior for your software).
Obtain access to different environments
Log in to different database environments using an ID, password, and database identifier(s).
Back up/restore large data sets
Create specialized functions for especially large data sets or entire databases.


efficiency of the database—not on simple administration. Table 5-1 identifies database integration activities typically performed by a project member that can be automated.

Once you have automated these database-related tasks, you'll find yourself solving problems just by dropping and creating a database followed by inserting test data. This chapter's examples utilize Ant, but the principles apply to any build platform that supports communicating with a database. If your build platform is NAnt, Rake, or Maven, you can do the same things this chapter demonstrates. Listing 5-1 executes a series of SQL statements to create a database including its related tables, comments, constraints, and stored procedures. The script also applies test data for the given environment, such as development or QA. Using this process, you can simply type ant db:prepare from the command line and the build process will perform the tasks outlined in Table 5-1. If you'd like to see this same process using other tools, like NAnt or Maven, we've provided additional examples at the book's associated Web site.

LISTING 5-1 build-database.xml: Automating Database Integration Using Ant


> ant –f build-database.xml db:prepare
Buildfile: build-database.xml
     [sql] Executing file: data-definition.sql
     [sql]8 of 8 SQL statements executed successfully
     [sql] Executing file: data-manipulation.sql
     [sql] 60 of 60 SQL statements executed successfully

Total time: 20 seconds

About the authors:

Paul Duvall is the CEO of Stelligent, a firm that helps clients create production-ready software every day. Duvall contributed a chapter to No Fluff, Just Stuff Anthology: The 2007 Edition.


Stephen M. Matyas III is the vice president of AutomateIt, a service branch of 5AM Solutions. Matyas is an expert in applied software engineering and Java.

Andrew Glover is the president of Stelligent Incorporated and contributes to online publications such as IBM's developerWorks and Oreilly's ONJava and ONLamp portals.

As you can see, using a single instruction from the command line enables the execution of SQL scripts that define (db:create) and manipulate a database (db:insert). We describe each of these tasks in more detail in subsequent sections.

Figure 5-2 shows the steps to automate your database integration. The following sections present a discussion of each component in Figure 5-2.

Creating Your Database
To automate database integration, you must first create a database. In this script, you typically drop and recreate the database, enforce data integrity through constraints and triggers, and define database behaviour

Figure 5-2 The sequence of automated database integration

through stored procedures or functions. We are using Ant to automate the execution of this process in Listing 5-2; however, as mentioned earlier, you can also use make, shell, batch, Rake, Ruby, or any number of tools. Notice that Ant provides a task to execute a SQL script via the sql task. Using a build platform like Ant allows you to perform the database integration activities using a sequential approach and enforce dependencies on other targets (a set of tasks) in the script. The example in Listing 5-2 demonstrates the use of Ant's sql attributes, such as driver, userid, and password, to connect to the database.

LISTING 5-2 build-database.xml: Defining Your Database Using an Ant Script

<target name="db:create" depends="filterSqlFiles" description="Create the database definition">
    delimiter="/ /"/>

Create Reusable Scripts
When you are writing a script that you plan to reuse, you can define the attributes in a single file so that you only need to define them one time for use in all of your manual and automated scripts, rather than every time you use these attributes. ____________________________________________________________________________

In Listing 5-3, data-definition.sql is the SQL script that's called by the Ant script in Listing 5-2. We're using a MySQL database in this example, so some of the commands are MySQL-dependent. The datadefinition. sql file is responsible for creating the database and its tables, enforcing data integrity, and applying stored procedures. The following is a typical order for this creation process.


  1. Database and permissions

  2. Tables

  3. Sequences


  5. Stored procedures and functions

  6. Triggers

The order of creation within your DDL statements may vary based on database object dependencies. For example, you may have a function that depends on a view, or vice versa, so you may need to list the view first, for example.

LISTING 5-3 data-definition.sql: Sample Database Definition Script for MySQL


GRANT ALL PRIVILEGES ON *.* TO 'brewery'@'%' IDENTIFIED BY 'brewery'
USE brewery//

brewer VARCHAR(50), date_received DATE);
CREATE TABLE state(state CHAR(2), description VARCHAR(50));//
    SELECT count(0) INTO count FROM beer;

Technically Speaking…

You may find it easier to organize your targets and scripts by database definition type (such as a table, view, and function) or by subsystem (e.g., Property and Application).

Manipulating Your Database
Once you've created a database from a build script, you'll need to provide initial data (e.g., lookup tables) and test data for testing code that relies on the database. This is where you supply the test data for your particular environment or testing context. What's more, you may also find yourself needing to use different SQL data files to support different environments, like development, test, QA, and production environments.

The example in Listing 5-4 shows an Ant script pointing to a SQL file, whose contents are inserted as test data into a database.

LISTING 5-4 build-database.xml: Manipulating Your Database Using an Ant Script

<target name="db:create" depends="filterSqlFiles" description="Create the database definition">

The SQL script in Listing 5-5 represents test data. This is the script that is referenced in Listing 5-4. In a typical script, you'll have many more records than the three shown in Listing 5-5. Our intent is to give you an idea of what the SQL scripts often execute. Tools like DbUnit and NDbUnit4 can help seed the data that is inserted into and deleted from a database as well.

LISTING 5-5 data-manipulation.sql: Sample Database Manipulation Script for MySQL

INSERT INTO beer(id, beer_name, brewer, date_received) VALUES (1,
'Liberty Ale','Anchor Brewing Company','2006-12-09');
INSERT INTO beer(id, beer_name, brewer, date_received) VALUES (2,
'Guinness Stout','St. James Gate Brewery','2006-10-23');
INSERT INTO state (state, description) VALUES('VT','Vermont');
INSERT INTO state (state, description) VALUES('VA','Virginia');
INSERT INTO state (state, description) VALUES('VI','Virgin Islands');

To achieve the benefits of automated database integration, you'll need to provide scripts for inserting, updating, and deleting data. These data manipulation scripts execute as part of an overall build process. Next, we discuss how to tie these scripts together with the orchestration script.

Creating a Build Database Orchestration Script

A database integration orchestration script executes the DDL and Data Manipulation Language (DML) statements. Listing 5-6 shows an Ant script that uses the sql task to call the data-definition.sql and datamanipulation. sql files we created in Listing 5-3 and Listing 5-5. You'll incorporate this orchestration into your higher-level build and integration processes.

LISTING 5-6 build-database.xml: Database Integration Orchestration Script Using Ant

<target name="db:prepare" depends="db:create, db:insert"/>
<target name="db:create">

<target name="db:insert" depends="filterSqlFiles">

Are You on Autopilot?

As you are automating your database integration, a few things may trip you up. It's easy for manual activities to unintentionally accumulate in your database integration process. Try to resist this. As Andrew Hunt and David Thomas mention in The Pragmatic Programmer : Don't Repeat Yourself (or DRY, for short), keep your build scripts "DRY." An easy form of "duplication" to miss is when we get acclimated to clicking through the database vendor's GUI application wizard rather than interfacing through the command line where it can run scripted. Another potential problem is the tendency to wait until there are many DDL/DML changes before committing back to the version control repository. Database changes can be pervasive, so try to make and check in small, incremental changes to your database; this will make it easier to test and debug.

Automated database integration
  How to automate database integration
  Using continuous database integration and a database sandbox

This chapter is an excerpt from the book, Continuous Integration: Improving Software Quality and Reducing Risk, authored by Paul Duvall, Steve Matyas, Andrew Glover. Published by Addison-Wesley Professional, June 2007. Copyright 2007. Pearson Education Inc., ISBN 0321336380. For an additional sample chapter, please visit Addison-Wesley. Safari Books Online subscribers can access the book here.

Dig Deeper on IT systems integrators