My Adventures in Coding

August 14, 2015

Java – Automate database schema updates with Flyway

Filed under: Java — Brian @ 1:44 pm
Tags: , , , , , ,

I am currently working on a Java 8 project which is a REST API deployed as a WAR file to Jetty. Our deploy process is very simple, our deploy pipeline just copies the WAR file into the Jetty directory in each environment then verifies the app is up and running with the correct version and runs some integration tests.

We wanted to be able to apply database migration scripts automatically in each environment (Dev, Test, QA, Staging, Prod) as we did our deploy, so we would no longer have to worry about manually applying scripts. In the past for Java, Scala, and .NET projects I have used several different tools, but for this project we decided to use Flyway which is very flexible and simple to setup.

The documentation for Flyway is excellent, however I decided to just post what we did in our app in case it might help someone else out. Here is our “Quick Start” setup.

1. Add the Flyway dependency

Flyway can be setup using Maven, Gradle, SBT, Ant, etc. In our project we used Maven, so all we did was add the following to our pom.xml file:

<dependency>
    <groupId>org.flywaydb</groupId>
    <artifactId>flyway-core</artifactId>
    <version>3.2.1</version>
</dependency>

2. Code

Now, to get Flyway to check for any database changes to apply, whenever the application is started, put the following code somewhere in your applications startup sequence.

Flyway flyway = new Flyway();
//Create the dbo.schema_version table if it does not already exist
flyway.setBaselineOnMigrate(true);
//Where the method "getDataSource()" provides your DataSource 
//object that has the jdbc url, username, and password.
flyway.setDataSource(getDataSource()); 
flyway.migrate();

3. SQL Script Location

All migrations scripts by default must go in the following folder in your app:

/resources/db/migration

4. SQL Script Naming convention

Scripts are run in version number order based on their names, the default naming convention is:

Version__description.sql
For example: “V1_0_1__create_tables.sql”

All scripts must start with the letter “V”, followed by major/minor version numbers, with two underscores “__” separating the version from the description.

5. Schema Version Table

Flywaydb will automatically create a table in each database called “dbo.schema_version” which stores a log of all migration scripts that have been applied.

The table looks like:

version_rank installed_rank version description type script checksum installed_by installed_on execution_time success
1 1 1 Flyway Baseline BASELINE Flyway Baseline NULL JohnSmith 2015-07-30 15:55:49.337 0 1
1 2 1.0.1 create tables SQL V1_0_1__create_tables.sql -440358290 JohnSmith 2015-07-30 15:55:49.337 109 1

6. Handling Failure

If a script fails, the app will fail to start and the failure information is written to our log files as well as a monitor is tripped. Flywaydb applies each script as a transaction so all changes in the script will be rolled back if any part of the script fails. This is very handy because if you commit a script with some invalid SQL syntax, all you have to do is update the script with the corrected syntax, commit it again, and let the build pipeline apply the changes from the fixed script. No messy cleanup or reset to worry about.

So that is it, you should have all you need to get your database schema changes easily synced up with the deploy of your app!

February 15, 2010

Getting Started with Mongo DB

Filed under: MongoDB,NOSQL — Brian @ 4:28 pm
Tags: , , , ,

Why are we considering Mongo DB?

In our current system we receive data from our customers, store it in a relational database, and then when we make that data available via a REST service we change very little. The data we are storing is all related, each chunk is really a document. So lately we have been questioning why we go through so much work to break the data down into a schema we have created, to only put the data back into the same format when we use it. This extra work has made us wonder if a Document Store is a better fit for our needs. There is nothing wrong with a relational database, it is just that as the needs of data storage change, a one size fits all solution is not necessarily the best solution in every case.

To just give an example of some of the current problems we have been running into: constant changes in the data we have to store. Each time a customer requires that we store a new field, we must also modify our database schema to include this new field and update our application to map data received to this new field. For us, we really don’t want to define a schema, we just want to store the data given to us by our customers, and surface it as we received it. Trying to force the data into some sort of database schema we define based on what we “think” our customers “might” send us is time consuming and in this case just does not feel right.

Our team has been considering document style database solutions such as Couch DB for the last few months, but we still have not been ready to make the jump. Switching from a relational to a document style database is a big shift, not just in technology, but also in how we approach the design of our application. Moving to Couch DB is a more drastic shift than switching to Mongo DB (You can read about the differences here). Recently we have become more interested in Mongo DB and decided to spend some time exporing it.

A Blend of Document and Relational Databases

We do not want to have a predefined schema, but rather we want the database to just take what data we give it and store it. However, we are not quite ready to give up all of the flexibility in querying that is provided by a relational database. This is what peaked our interest in Mongo DB. What makes Mongo DB unique amongst document style databases is that it allows you to store full documents in collections (Think of a collection as a schema free Table) in JSON format, BUT it still gives you the ability to query over any fields in those documents and create indexes for fields often queried. This is what made us so interested in Mongo DB.

Alright, to the fun stuff! Let’s try out Mongo DB

  • Download the software: http://www.mongodb.org/display/DOCS/Downloads
  • Unzip the Mongo DB software to a folder such as: C:\mongodb-win32-x86_64-1.2.2
  • Create a folder to store database files:
    • The default folder is C:\data\db on windows and “/data/db” on unix systems.
    • Make sure to create this folder before starting Mongo DB.
  • Start Mongo DB
    • cd C:\mongodb-win32-x86_64-1.2.2\bin
    • mongod.exe
    • NOTE: If you get an error saying “Assertion: dpath (/data/db/) does not exist” then you have either not created the directory, or permissions have not been set appropriately.

Interactive Shell

Mongo DB provides a shell interface for querying the database directly. This is very useful when you want to try the database for the first time.

To start the Interactive shell:
cd C:\mongodb-win32-x86_64-1.2.2\bin
mongo.exe

Just to get started let’s try out a few basic statements

Insert a document into a new collection
Note that when we insert the document “car” we are also creating a brand new collection called “cars”. You can think of a collection as being like a table in a relational database, but without any defined schema. Also note that creation of collections is on demand, if you try to insert a document with a collection name that does not exist, a new collection will be create and the document will be inserted.

> car = { make: "Ford" , model: "Galaxy"};
     { "make" : "Ford", "model" : "Galaxy" }
> db.cars.save(car);
> db.cars.find();
     { "_id" : ObjectId("4b7789f2fb5c000000006faa"), "make" : "Ford", "model" : "Galaxy" }

Query for the document
In the above statement we inserted a new document and just did a “find” which returns ALL documents in the collection. But what if we want to query for a specific document? Mongo DB providers a function called “findOne” which returns a single document based on search criteria provided.

> db.cars.findOne({ make: "Ford" });
     {
             "_id" : ObjectId("4b7789f2fb5c000000006faa"),
             "make" : "Ford",
             "model" : "Galaxy"
     }

Update the existing document
In this update statement we are saying for every document in collection “cars” with a “model” type of “Galaxy”, set the “status” field to “InStock”. In an update if the “status” field already exists it will be updated, but also if the field does not exist on that document it will also be added.

> db.cars.update({ model: "Galaxy"}, {make: "Ford", model: "Galaxy", status: "InStock"});
> db.cars.findOne({ make: "Ford" });
     {
             "_id" : ObjectId("4b7789f2fb5c000000006faa"),
             "make" : "Ford",
             "model" : "Galaxy",
             "status" : "InStock"
     }

Query for a list of documents
When you use the “find” command you can specify criteria to search by (which feels very much like a select statement in a relational database). The find command returns a cursor which allows you to call functions like “next()” and “hasNext()” to retrieve documents. The interesting thing is that the cursor is not a snapshot (a list of all documents that meet the search criteria at that moment in time), but instead it does a live query for the next item each time “next()” is called. So if you create a cursor for all “cars” matching the criteria of “make” being “Ford” and then as your loop is running, new documents matching this criteria are added to the “cars” collection, they will also be included. This feature is especially interesting to us because in our current system we provide data feeds. To be able to start the data transfer and know that any new documents added while the job is running, will not be missed, is a big plus.

> var cursor = db.cars.find({ make: "Ford" });
> cursor.length()
     1
> car = { make: "Ford" , model: "Fairlane"};
     { "make" : "Ford", "model" : "Fairlane" }
> db.cars.save(car);
> cursor.length()
     2
> db.cars.find({ make: "Ford" });
     { "_id" : ObjectId("4b7789f2fb5c000000006faa"), "make" : "Ford", "model" : "Galaxy", "status" : "InStock" }
     { "_id" : ObjectId("4b79c916fb5c000000006fab"), "make" : "Ford", "model" : "Fairlane" }

Import JSON data from a file

Also, if you have data in valid JSON format in a file and would like to import this data, you can use the import from file utility:

mongoimport.exe –file [PathToMyFile] –collection [NameOfNewCollection]

For our initial test of Mongo DB we created a test file with 200,000 JSON records of real production data and loaded it into Mongo DB with this file import utitlity. The performance was very good, it did the import in about 1000 records per second.

Conclusion

As with any technology change, it is important to think about the problem you are trying to solve and pick the right tool for the job. Avoid looking at cool technologies then trying to apply them to every problem as some sort of silver bullet solution, which is all too common in our industry. MongoDB is not a one size fits all solution just as a relation database is not a one size fits all solution. For most of our applications a relational database is still the best choice, but we found in this case a document store was more appropriate, which lead us to start exploring a number of document store solutions, and out of that evaluation MongoDB was the best fit for our needs. So evaluate MongoDB and see if it is the right tool to solve your problem, or just use it because MongoDB is Web Scale (haha).

If you are interested in learning more about Mongo DB and especially in how the query syntax works, check out the tutorial provided on the Mongo DB website.

Blog at WordPress.com.