Using continuous database integration and a database 'sandbox'

There can be huge benefits to implementing automated and continuous database integration. See how a local database 'sandbox' can be useful when making database changes.

Use a Local Database Sandbox

About the book:

This chapter excerpt on Continuous Database Integration (download PDF) is taken from the book Continuous Integration: Improving Software Quality and Reducing Risk. This purpose of this book is to change the perception of integration from a necessary evil into a normal part of the development process. The importance of continuous integration practices and techniques are explained.


A significant challenge on many software development projects is making changes to the database structure. Many projects I've observed typically use one shared database, so when developers make changes to this shared development database they can adversely affect others on the team—causing each developer's private build to break (if their tests are part of the build). If developers have their own local code "sandbox" to isolate their coding changes from other developers, wouldn't it be great if they had a "database sandbox" too?

Multiple Database Instances
You may not have the resources to get a database for each developer. In this situation, you could assign each developer a separate schema on a central database server or use one of the freely available, lightweight, open source equivalent databases. Furthermore, many of the more widely used RDBMSs provide free developer versions.

Another important capability you gain by automating your database integration is that everyone on the team will be able to create a local instance of the database on their workstations. Every team member can then create a database "sandbox" to make and test database changes without affecting others. If your database integration is scripted, creating a new database instance is a push-button affair; conversely, if you don't automate your database integration, it is more difficult to recreate your database and run tests on your workstation. Figure 5-3 provides an illustration of each developer using a local database instance.

Using automated database integration, you are able to get the latest version of your database scripts along with your application source code. Each developer is able to create a local instance of the database, modify the version of the database on his workstation, test the changes, and commit the changes back to the repository. These changes will be integrated and tested with the rest of the software as part of the CI system. When another developer refreshes her private workspace with changes from the repository, the database changes are

Figure 5-3 Each developer uses a local database sandbox copied down to her workstation along with the other source code changes, and her next private build will incorporate the changes in her local database instance.


Supporting Multiple Database Environments
The next logical step after creating a local database sandbox is creating different database instances to support multiple database environments. For example, you may need to create a database that contains all of your migrated production data. Assuming there are many records in this database, you probably don't want to include it in your local development database. Usually, this will only be the DML (data changes), not the DDL (create, alter, and drop statements to the database). By automating your database integration, you can modify build script parameters to include the data to support these environments. This way, you can execute one command to provide data for different database environments. The same goes for versions. You may want to test new code against a prior version of the database. Use automated database integration to provide this capability with a "push of the Integrate button."

The next section identifies the reasons and approach for using a version control repository for database integration.

Use a Version Control Repository to Share Database Assets

Sharing your database integration scripts is a best practice, plain and simple. All software assets need to be in a version control repository, and this includes all database assets. Such assets might include the following:


  • DDL to drop and create tables and views, including constraints and triggers
  • Stored procedures and functions
  • Entity relationship diagrams
  • Test data for different environments
  • Specific database configurations

For numerous project scenarios, you should be able to recreate your entire database from "scratch" using the scripts in your version control repository (for large data sets, you may store data export scripts rather than row-by-row DML scripts). Once you've applied all your database assets to the version control repository, you'll have a history of all of the database changes, so you can run prior versions of the database with the latest code (or with prior versions of the code as well). This also reduces the gridlock on projects when all the developers need to go to the DBA for everything. Once database assets are in one place, you can make a change to a database column, perform a private build on your machine, commit it to the version control system, and know you will receive feedback after the integration build is run.

Sometimes during development the database will need to undergo large-scale changes. In most cases, these changes will require the expertise of several people on the team and a longer duration to complete. When such situations arise, it is best to create a task branch5 to commit the changes back into the version control repository rather than break the mainline and slow the activity of the rest of the team. Without CDBI, often the DBA will be making these large-scale database alterations, and he may be less suited to make all the changes at once to the database, dependent application source code, associated test code, and shared scripts because he may lack the knowledge of the source code that developers are writing.

Just as you have a consistent directory structure for your source code, you'll want to do the same for your database. Define the location of database assets—probably somewhere in the implementation/construction directory where your source code is located. In your database directory, define subdirectories for each of the database entity types and environments. Listing 5-7 shows a directory structure for an implementation directory (using a MySQL database).

Listing 5-7 Sample Implementation Directory


Just as with your source code, choose a directory structure that works well for you, one that clearly defines the entities while making it adaptable to changes.

Directory Structure and Script Maintenance
In the beginning, you may find that the directory structure is less important, but beware of making frequent directory structure changes, as you'll spend additional time updating your scripts to account for these changes.

Now that you've automated your database integration activities and are checking them into the version control repository to share with others on the team, let's make the process continuous so that it is run with every change to the software.


Continuous Database Integration

This is where the "rubber meets the road." The reason to automate, share, and build the database integration processes is so you can make these processes continuous. Using CDBI, your database and your source code are synchronized many times a day. Once you commit your database changes to your version control repository, the CI system proceeds like this: It gets a complete copy of the system source code, including your database data definition and manipulation scripts; recreates your database from the source; integrates your other source code; and then runs through your automated tests and inspections to ensure that the change(s) didn't introduce defects into your system's code base. Figure 5-4 demonstrates how the changes made by each developer are synchronized with the integration build based on the mainline in the version control repository.

Figure 5-4 shows that the changes that were made at 10 a.m. (by Mike) and the changes that were made at 10:15 AM (by Sandy) are included in the integration build that occurred at 10:30 AM. The integration build machine uses a single source point, provided by the version control repository, to synchronize and test changes as a part of the integration build.

FIGURE 5-4 Single source for database changes

Once you have automated your database integration and incorporated it into your build scripts, making it run continuously is simple. Your database integration tasks, along with the rest of your build, should be executed using one command (such as an Ant/NAnt target). To run your database integration tasks continuously, you only need to make sure these database integration build task commands are executed as a part of the automated build.

Give Developers the Capability to Modify the Database

Each developer should have the capability to modify any of the database scripts. This doesn't mean that every developer will modify these database scripts, because not every developer will have the necessary database expertise. Because each developer will have his own database sandbox, each can modify the local database and then commit the changes to the version control repository. This will reduce the DBA bottleneck and empower developers to make necessary changes. The DBA can evaluate the new changes to the repository by reviewing the integration builds or working with the developers if the build breaks.

As the adage goes, with this additional authority comes additional responsibility. Changes to the underlying database structure can have far-reaching impacts on the system. The developer who makes changes to the database structure must assume the responsibility for thorough testing before committing these changes. We feel it is far more likely in today's industry for a developer to have a knowledge of databases and database scripting—and the DBA is still there to "oversee" what changes, if any, move into the system.

The Team Focuses Together on Fixing Broken Builds

Since you treat the database the same as the other source code, you may experience broken builds because of a database error. Of course, errors may occur in any part of your build: source code, deployment, tests, inspections, as well as the database. When using CDBI, database integration is just another part of the build, so the playing field is leveled: Whatever breaks the build, the priority is to fix it. The payoff comes after this; the fix is now integrated, and that particular issue is prevented from recurring.

Make the DBA Part of the Development Team

Break down barriers and make members of your database team a part of the development team. You may already be doing this, but all too often there is a "wall" between the DBA and the software developers. As mentioned earlier, treat your database code and your other source code in the same manner. The same goes for the people on your team. This is probably the most controversial of the CDBI practices. We've worked on teams that have used CDBI with the DBA on the development team, and we've also seen the more traditional approach with the DBA on another team, the database team. CDBI worked in both environments, but it worked significantly better when the DBA was a part of the team.

Some people ask, "If the DBA is no longer dropping and recreating tables, creating test environments, and granting access, then what is she doing?" The simple answer is, "Now she can do her job!"— spending more time on higher-level tasks such as improving database performance, improving SQL performance, data normalization, and other value-added improvements.

Database Integration and the Integrate Button

The rest of this book covers topics concerning the additional parts of the Integrate button: continuous testing, inspection, deployment, and feedback. This section covers some specific issues concerning these practices when it comes to database integration.


Just as with source code, you'll want to test your database. We cover testing in detail in Chapter 6. There are tools you can use for database specific testing such as PL/Unit, OUnit for Oracle, and SQLUnit. Your database may contain behavior in stored procedures or functions that needs to be tested and executed as a part of the build script, just like the behavior of your other source code. You may also want to test the interactions of constraints, triggers, and transactional boundaries by performing application security data tests.


As with your other source code, you should be running inspections on your data source. This includes not just your DDL, but reference and testing data as well. There are tools you can incorporate and run in your automated build process so that you do not need to run these inspections manually. Here are a few ideas for inspections on your database.

  • Ensure efficient data performance by running set explain against your project's rules to target optimizations for your SQL queries.
  • Analyze data to ensure data integrity.
  • Use a SQL recorder tool to determine which queries are being run the most. These queries might be candidates for stored procedures.
  • Ensure adherence to data naming conventions and standards.


About the authors:

Paul Duvall is the CEO of Stelligent, a firm that helps clients create production-ready software every day. Duvall contributed a chapter to No Fluff, Just Stuff Anthology: The 2007 Edition.


Stephen M. Matyas III is the vice president of AutomateIt, a service branch of 5AM Solutions. Matyas is an expert in applied software engineering and Java.

Andrew Glover is the president of Stelligent Incorporated and contributes to online publications such as IBM's developerWorks and Oreilly's ONJava and ONLamp portals.

As we have indicated, the goal of CDBI is to treat your database source code and other source code in the same manner. The Continuous Deployment process will deploy your database to your development and test database instances just as it deploys your other code to its different environments (e.g., application servers). If you need to migrate from one database to another, you will be able to better test the migration process by running through the process on a continuous or scheduled basis.

Feedback and Documentation

When you incorporate continuous feedback and CDBI into your CI system, you will find out if your build failed because of the latest database changes. By default, most CI systems send the build status to the people who last applied changes to the version control repository. Just like with the source code, the CI system notifies those who made database changes quickly so that they can make the necessary fixes to the database.

Documentation is about communication, and there is much about the database you'll want to communicate to other project members or your customer. Your Entity Relationship Diagram (ERD) and data dictionary are excellent candidates for generating as a part of your continuous build process, perhaps as a secondary build (described in Chapter 4).


This chapter demonstrated that database assets are the same as other source code. Therefore, the same principles apply.

  • Automate your database integration using orchestrated build scripts that are run continuously, after any change to your database or its source code.
  • Ensure a single source for database assets by placing them in a version control repository.
  • Test and inspect your database scripts and code.
  • Change database development practices by ensuring that all database integration is managed through the build scripts, that all database assets are checked into version control, and that all developers (who interact with the database) have a database sandbox.

Table 5-2 summarizes the practices covered in this chapter.


Table 5-2 CI Practices Discussed in This Chapter


Practices  Description
Automate database integration Rebuild your database and insert test data as part of your automated build.
Use a local database sandbox All developers should have their own copy of the database that can be generated via SQL scripts. This can be on their workstations or even shared on a development server—as long as all developers have their own copy on this shared server.
Use a version controlrepository to share database assets Commit your DDL and DML scripts to your version control system so that other developers can run the same scripts to rebuild the database and test data.
Give developers the capability to modify the database Avoid the DBA bottleneck that occurs when database changes are restricted to just one or two people. Give developers the capability to modify the DDL and DML scripts and commit them to the version control repository.
Make the DBA part of the development team Be sure the DBA can run the same automated build—which includes a database rebuild that other developers run—to ensure consistency. By making the DBA a part of the development team, the shared experiences can benefit both the database and the development teams.


Let's see how Julie, Scott, and Nona are doing now that they're using CDBI.

Nona (Developer): I need to refresh my test data. What do I need to do?

Scott (Technical Lead): Just run ant db:refresh from the command line. Before you do that, get the latest changes out of Subversion by typing ant scm:update, because I made a few changes to the USER database table and the source code that uses this change.

Julie (DBA): Do you guys need any help?

Scott: Yeah, we are having a performance problem on one of the queries. Do you have time to look at it? Also, I think we need to denormalize the PRODUCT table. Can you model the table changes, prototype the DDL changes, and set up a code branch so Nona can modify her code for your changes? When you two are satisfied with the changes, merge the branch and commit it to Subversion so that they run as part of the integration build. Thanks, Julie.

Nona: . . . Sure, Scott. Should we use the test database rather than the development database?

Scott: Yeah, just run ant –Denvironment=test db:refresh. The developers and DBAs, who often perform roles that seem opposing or distant, are now continually working toward the same goal, and both are accomplishing more of their tasks that require analysis or design.



These questions can help you determine your level of automation and continuous database integration.

  • Are you capable of recreating your database from your automated build process? Can you rebuild your database at the "push of a button?"
  • Are the scripts (build and SQL) to your database integration automation committed to your version control repository?
  • Is everyone on your project capable of recreating the database using the automated build process?
  • During development, are you able to go back to prior versions of the database using your version control repository?
  • Is your database integration process continuous? Are your software code changes integrated and tested with the latest database whenever you apply those changes to the version control repository?
  • Are you running tests to verify the behavior of your database stored procedures and triggers?
  • Is your automated database integration process configurable? Are you able to modify the userid, password, unique database identifier, tablespace size, and so on using a single configuration file?

Automated database integration
  How to automate database integration
  Using continuous database integration and a database sandbox

This chapter is an excerpt from the book, Continuous Integration: Improving Software Quality and Reducing Risk, authored by Paul Duvall, Steve Matyas, Andrew Glover. Published by Addison-Wesley Professional, June 2007. Copyright 2007. Pearson Education Inc., ISBN 0321336380. For an additional sample chapter, please visit Addison-Wesley. Safari Books Online subscribers can access the book here.

Dig Deeper on IT systems integrators