Java – Using Flyway to Manage SQL Migration Scripts

In our Java applications we started using Flyway about five years ago to automate running SQL migration scripts. Overall Flyway has worked very well in simplifying our deployment process.

Now, when a developer has a code change that also involves a database change, they simply add a migration script in the same commit. Our CI then builds a WAR file and runs it through our pipeline's gauntlet of tests.

If you are new to Flyway, the following is a simple Java example using SQL Server to help you get started.

You can download the code from GitHub here.

Prerequisites

If you already have SQL Server, great! However, if you do not, you can download and install SQL Server Express.

If you have issues connecting the application to SQL Server, please check out my post SQL Server – Common Setup Issues.

Create Database and User

For this example we use the database Examples and user client. You can use the following script to create the required database and user:

USE master;
GO
CREATE LOGIN client WITH PASSWORD = 'client';
GO
CREATE DATABASE Examples;
GO
USE Examples;
GO
CREATE USER client FOR LOGIN client;
GO
GRANT ALL TO client;
GO
EXEC sp_addrolemember 'db_owner', 'client';
GO

Migration Scripts

Flyway by default looks for SQL migration scripts in the folder:

  • /src/main/resources/db/migration

Migration scripts must use the following naming convention:

  • V{Version_Number}__{description}.sql
  • Note: The version and description are separated by two underscores

For our example we will create the following script and place it in the migration folder:

  • V1_0_0_1__create_tables.sql
CREATE TABLE [dbo].[Person] (
    [id]               [bigint] IDENTITY(1,1)   NOT NULL,
    [firstName]        [nvarchar](100)          NOT NULL,
    [lastName]         [nvarchar](100)          NOT NULL,
    [createdDateUtc]   [datetime]               NOT NULL,
 CONSTRAINT [PK_Id] PRIMARY KEY CLUSTERED (
    [id] ASC
  ),
 CONSTRAINT [UC_Person_Name] UNIQUE NONCLUSTERED (
    [firstName] ASC,
    [lastName] ASC
  )
)
GO

Flyway Example

The way we configure all of our Java applications is to always run Flyway schema migrations BEFORE we allow the application to start up. If there are any errors with the migration scripts, we stop the application and shut it down.

If you are using Maven, here are the dependencies and versions used for this example:

    <dependencies>
        <dependency>
            <groupId>org.flywaydb</groupId>
            <artifactId>flyway-core</artifactId>
            <version>6.4.4</version>
        </dependency>
        <dependency>
            <groupId>com.microsoft.sqlserver</groupId>
            <artifactId>mssql-jdbc</artifactId>
            <version>8.2.2.jre8</version>
        </dependency>
        <dependency>
            <groupId>com.zaxxer</groupId>
            <artifactId>HikariCP</artifactId>
            <version>3.4.5</version>
        </dependency>
    </dependencies>

The following example code shows you how to run Flyway from a Java application:

import com.zaxxer.hikari.HikariDataSource;
import org.flywaydb.core.Flyway;
import org.flywaydb.core.api.FlywayException;
import org.flywaydb.core.api.configuration.FluentConfiguration;

public class FlywayExample {

    private final static String JDBC_URL = "jdbc:sqlserver://localhost;instanceName=SQLEXPRESS;DatabaseName=Examples";
    private final static String DRIVER = "com.microsoft.sqlserver.jdbc.SQLServerDriver";
    private final static String USERNAME = "client";
    private final static String PASSWORD = "client";

    public static void main(String[] args) {

        // Create our DataSource
        try (HikariDataSource dataSource = new HikariDataSource()) {
            dataSource.setJdbcUrl(JDBC_URL);
            dataSource.setUsername(USERNAME);
            dataSource.setPassword(PASSWORD);
            dataSource.setDriverClassName(DRIVER);

            // Create a Flyway configuration
            FluentConfiguration configure = Flyway.configure();
            configure.dataSource(dataSource);
            configure.baselineOnMigrate(true);  // Used when applying Flyway to an existing database
            configure.table("schema_version");  // Name of schema version tracking table

            // Run migration
            Flyway flyway = new Flyway(configure);
            System.out.println("Running database migrations ...");
            flyway.repair();            // Repair checksums, history, failed migrations, etc.
            flyway.migrate();           // Apply any scripts not in the schema version table
            System.out.println("Done");
        } catch (FlywayException e) {
            e.printStackTrace();
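            // As described above, we stop the application when migrations fail;
            // in a real deployment you would abort startup here (for example,
            // System.exit(1)) so the app never runs against a partially migrated schema.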
        }
    }
}

If you are using Maven from the command line you can run the following commands:

  • mvn compile
  • mvn exec:java -Dexec.mainClass="FlywayExample"

If everything is successful you should see the following output:

[INFO] --- exec-maven-plugin:3.0.0:java (default-cli) @ FlywayExample ---
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Running database migrations ...
Done
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  1.927 s
[INFO] Finished at: 2020-06-13T18:18:35-06:00
[INFO] ------------------------------------------------------------------------

Verify Migration

After Flyway has run you should have two new tables:

  • dbo.Person – Created by our migration script
  • dbo.schema_version – Created by Flyway to track the schema version

You can view the result of the Flyway migration by checking the schema_version table:

SELECT [version], [description], [type], [script] 
FROM dbo.schema_version;

In the schema version table you should see that our migration script was run:

version    description    type    script
-----------------------------------------------------------
1.0.0.1    create tables  SQL     V1_0_0_1__create_tables.sql

That is all! I hope that helps!

References

Tutorial: Java-based Migrations

SQL Server – How to Partition a Table by Date

There are many excellent posts available that explain in detail how partitions work in SQL Server. However, for this post I just wanted to help anyone new to SQL Server partitioned tables get a working end-to-end example up and running quickly.

Note: At the time of the post I was using SQL Server 2016 Enterprise Edition

You can download the scripts from GitHub here.

Overview

Why Partition

There are several reasons why we moved some of our tables to being partitioned:

  • Performance – Querying one or two small partitions is often significantly cheaper than querying one large table.
  • Indexing – If data is only inserted into the newest partition and never old ones, indexes on old partitions will never fragment.
  • Management – If you are only keeping data for a set amount of time, truncating old partitions is much more efficient than running delete statements (see the example after this list).
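
For example, on SQL Server 2016 and newer you can truncate an individual partition directly. A minimal sketch, assuming the partitioned TemperatureHistory table we create later in this post, with the expired data living in partition 2 (the partition number is illustrative):

-- Remove only the rows in partition 2 of the partitioned table
TRUNCATE TABLE [dbo].[TemperatureHistory] WITH (PARTITIONS (2));
GO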

Selecting a Partition Size

You want to make sure you select a partition size that makes sense, both for the volume of data you will be storing but also for how the data will be queried.

For example, in our use case we partitioned our data by day because the vast majority of queries were only for the last day or two. So the majority of queries only needed to retrieve data from at most two partitions.

However, if most of your queries look at longer date ranges, data from more partitions will need to be read and combined, making a larger partition size more appropriate (see the sketch below).
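
For instance, a monthly partition function might look like the following sketch (the function name and boundary dates are illustrative, not part of the example that follows):

CREATE PARTITION FUNCTION [MonthlyPartitionFunction](datetime) AS RANGE RIGHT
FOR VALUES (N'2020-04-01T00:00:00.000', N'2020-05-01T00:00:00.000',
            N'2020-06-01T00:00:00.000')
GO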

Creating a Partitioned Table

Create the Tables

For this example we are going to have two tables:

  • Thermostat – Stores the identifier for each thermostat we are tracking.
  • TemperatureHistory – Stores a record for every time we receive a temperature reading from a thermostat. This table will be partitioned by day.

Create Thermostat table:

CREATE TABLE [dbo].[Thermostat] (
	[thermostatId]     [bigint] IDENTITY(1,1)	NOT NULL,
	[createdDateUtc]   [datetime]				NOT NULL,
	[name]             [nvarchar](100)			NOT NULL,
 CONSTRAINT [PK_Thermostat] PRIMARY KEY CLUSTERED (
	[thermostatId] ASC
	),
 CONSTRAINT [UC_Thermostat_Name] UNIQUE NONCLUSTERED (
	[name] ASC
	) 
) 
GO

Create the TemperatureHistory table:

CREATE TABLE [dbo].[TemperatureHistory] (
    [thermostatId]   [bigint]   NOT NULL,
    [timestamp]      [datetime] NOT NULL,
    [temperature]    [float]    NOT NULL
)
GO

Note: Notice that we have not created a primary key on the TemperatureHistory table. That comes later, since partitioning is added as part of creating the primary key.

Add foreign key for ThermostatId from TemperatureHistory to Thermostat table:

ALTER TABLE [dbo].[TemperatureHistory]  
	WITH CHECK ADD  CONSTRAINT [FK_TemperatureHistory_Thermostat_thermostatId] 
	FOREIGN KEY([thermostatId])
REFERENCES [dbo].[Thermostat] ([thermostatId])
GO

ALTER TABLE [dbo].[TemperatureHistory] 
CHECK CONSTRAINT [FK_TemperatureHistory_Thermostat_thermostatId]
GO

Create a Partition Function

In the create statement below you will notice the option RANGE RIGHT. There are two options for RANGE, and here is what they mean:

  • LEFT – each boundary date belongs to the partition on its left. A partition holds all rows after the previous boundary, up to and including its own boundary date.
  • RIGHT – each boundary date belongs to the partition on its right. A partition holds all rows from its boundary date onward, up to but not including the next boundary date.

For a table partitioned by date I personally prefer RANGE RIGHT, since if we run out of partitions, all data from that point on is simply inserted into the last partition until we add more partitions (I show how to add one right after creating the partition scheme below). But it all depends on what works best for your use case.

CREATE PARTITION FUNCTION [DailyPartitionFunction](datetime) AS RANGE RIGHT 
FOR VALUES (N'2020-06-01T00:00:00.000', N'2020-06-02T00:00:00.000', N'2020-06-03T00:00:00.000', 
			N'2020-06-04T00:00:00.000', N'2020-06-05T00:00:00.000', N'2020-06-06T00:00:00.000', 
			N'2020-06-07T00:00:00.000', N'2020-06-08T00:00:00.000', N'2020-06-09T00:00:00.000', 
			N'2020-06-10T00:00:00.000')
GO

Create a Partition Scheme

CREATE PARTITION SCHEME DailyPartitionScheme
AS PARTITION DailyPartitionFunction ALL TO ([PRIMARY])
GO
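
As mentioned above, with RANGE RIGHT you will eventually want to add more partitions. A minimal sketch of adding a partition for June 11 (the boundary date is illustrative): first tell the scheme which filegroup the next partition should use, then split the function:

ALTER PARTITION SCHEME DailyPartitionScheme NEXT USED [PRIMARY]
GO
ALTER PARTITION FUNCTION DailyPartitionFunction() SPLIT RANGE (N'2020-06-11T00:00:00.000')
GO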

Create a Partitioned Primary Key

The final step is to create a primary key on table TemperatureHistory using our new partition scheme:

ALTER TABLE [dbo].[TemperatureHistory]
    ADD CONSTRAINT [PK_TemperatureHistory_Partition_thermostatId_timestamp]
PRIMARY KEY CLUSTERED (
    [thermostatId] ASC,
    [timestamp] ASC
    )
    WITH (
		ONLINE = OFF -- This can be ONLINE = ON if using SQL Server Enterprise Edition 
    )
    ON DailyPartitionScheme([timestamp]);
GO

We now have a table, TemperatureHistory, which is partitioned by day.
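
If you want to confirm the partitions were actually created, one way (a sketch using the system catalog views) is:

-- index_id = 1 is the clustered index, which carries the partitions
SELECT p.partition_number, p.rows
FROM sys.partitions p
WHERE p.object_id = OBJECT_ID('dbo.TemperatureHistory')
  AND p.index_id = 1
ORDER BY p.partition_number;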

Insert some Test Data

Add a thermostat:

INSERT INTO [dbo].[Thermostat] ([createdDateUtc], [name])
VALUES('2020-06-01', 'ABC-455-Lobby')
GO

Add some temperature history:

INSERT INTO [dbo].[TemperatureHistory]([thermostatId],[timestamp],[temperature])
VALUES(1,'2020-06-01 10:20:00.000',23.7)
GO
INSERT INTO [dbo].[TemperatureHistory]([thermostatId],[timestamp],[temperature])
VALUES(1,'2020-06-03 08:28:00.000',22.4)
GO
INSERT INTO [dbo].[TemperatureHistory]([thermostatId],[timestamp],[temperature])
VALUES(1,'2020-06-07 01:10:00.000',22.8)
GO
INSERT INTO [dbo].[TemperatureHistory]([thermostatId],[timestamp],[temperature])
VALUES(1,'2020-06-09 07:34:00.000',24.7)
GO

So now if you do a select from table Thermostat you should see:

SELECT * FROM [dbo].[Thermostat]
thermostatId    createdDateUtc            name
-------------------------------------------------------
1               2020-06-01 00:00:00.000   ABC-455-Lobby

Then if you do a select from our partitioned table TemperatureHistory you should see:

SELECT * FROM [dbo].[TemperatureHistory]
WHERE thermostatId = 1 
AND [timestamp] BETWEEN '2020-06-01' AND '2020-06-10'
thermostatId    timestamp                  temperature
-------------------------------------------------------
1               2020-06-01 10:20:00.000    23.7
1               2020-06-03 08:28:00.000    22.4
1               2020-06-07 01:10:00.000    22.8
1               2020-06-09 07:34:00.000    24.7

Querying a Partitioned Table

One of the big benefits of a partitioned table is that you only need to query the partitions that contain the data you are looking for. So for a table partitioned by date it is important to ALWAYS specify a range with a start date and an end date in every query. If you fail to do this you may end up with queries that read from more partitions than necessary, which will hurt performance.

For example, the following query should only read from a single partition:

SELECT * FROM [dbo].[TemperatureHistory]
WHERE thermostatId = 1 
AND [timestamp] BETWEEN '2020-06-03 00:00:00.000' AND '2020-06-03 23:59:59.000'

In SSMS (SQL Server Management Studio) you can use the Include Actual Execution Plan option to give you more insight into what your query is doing.

With execution plan enabled, execute the query and you will see it returns one row as expected.

Now, select the Execution Plan tab, then right click on Clustered Index Seek and select Properties.

The properties window will give you some useful information about the performance of your query on the partitioned table, but it requires some explanation.

  • Actual Partition Count – the total number of partitions that were read.
  • Actual Partitions Accessed – the partition number(s) the data was read from. Our 10 boundary values define 11 partitions, and the row returned came from partition 4.

Now we have confirmed our query is only reading from a single partition!
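
If you prefer to verify partition placement with a query rather than the execution plan, the built-in $PARTITION function returns the partition number a given value maps to. A minimal sketch:

-- Show which partition each row of TemperatureHistory lives in
SELECT $PARTITION.DailyPartitionFunction([timestamp]) AS partitionNumber,
       COUNT(*) AS [rowCount]
FROM [dbo].[TemperatureHistory]
GROUP BY $PARTITION.DailyPartitionFunction([timestamp])
ORDER BY partitionNumber;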

That is all! I hope that helps!

SQL Server – Get Row Counts and Space Used of All Tables and Partitions in a Database

In our database we use Schemas to group subsets of tables. We also have several large tables that use Partitions to partition the data by date. We wanted a query that would show us, for each table, its name, schema, total number of rows, size in MB, and partition name if the table is partitioned. For partitioned tables we also wanted the row count and size of each partition. The following is what we came up with.

SELECT sizes.[schema], sizes.[tableName], sizes.[partition], sizes.[rowCount], sizes.[totalSpaceMB] 
FROM (
	SELECT sch.[name] AS [schema], 
		t.[NAME] AS [tableName], 
		p.[rows] AS [rowCount], 
		((SUM(a.total_pages) * 8) / 1024) AS [totalSpaceMB], 
		CONVERT(date, prv.[value]) AS [partition]
	FROM sys.tables t WITH(NOLOCK)
	INNER JOIN sys.indexes i WITH(NOLOCK) ON t.OBJECT_ID = i.object_id
	INNER JOIN sys.partitions p WITH(NOLOCK) ON i.object_id = p.OBJECT_ID 
		AND i.index_id = p.index_id
	INNER JOIN sys.allocation_units a WITH(NOLOCK) ON p.partition_id = a.container_id
	INNER JOIN sys.schemas sch WITH(NOLOCK) ON t.schema_id = sch.schema_id
	LEFT OUTER JOIN sys.partition_schemes ps WITH(NOLOCK) ON ps.data_space_id = i.data_space_id
	LEFT OUTER JOIN sys.partition_range_values prv WITH(NOLOCK) ON ps.function_id = prv.function_id 
		AND p.partition_number = prv.boundary_id
	WHERE t.is_ms_shipped = 0 AND i.OBJECT_ID > 255
	AND (ps.[name] IS NULL OR (prv.[value] IS NOT NULL AND p.[rows] > 0))
	GROUP BY t.[Name], sch.[name], p.[Rows], prv.[value]
) sizes
ORDER BY sizes.[schema] asc, sizes.[tableName] asc, sizes.[partition] desc

Note: The above query filters out tables and partitions that have 0 rows.

Here is some example output of the query:

schema     tableName        partition   rowCount   totalSpaceMB
----------------------------------------------------------------
Account    Account          NULL        32445      26
Account    AccountType      NULL        84         1
Account    ContactInfo      NULL        45433      53
Account    Customer         NULL        268767     87
Shipment   Location         NULL        1003       12
Shipment   LocationType     NULL        11         1
Shipment   Product          NULL        23132      123
Tracking   LocationHistory  2020-01-04  344543     323
Tracking   LocationHistory  2020-01-03  254354     213
Tracking   LocationHistory  2020-01-02  65446      67
Tracking   LocationHistory  2020-01-01  98656      98

To sort by row count you can change the order by clause to:

ORDER BY sizes.[rowCount] desc

Or to sort by table size you can change the order by clause to:

ORDER BY sizes.[totalSpaceMB] desc

I hope that helps!

SQL Server – How to write an Upsert using MERGE or ROWCOUNT

In a previous post, “How to write an Upsert using MERGE”, I showed a simple example of how to write an upsert in SQL Server using the MERGE command. However, in some use cases, for performance reasons, I use ROWCOUNT instead of MERGE. This post provides some simple examples of how and when to use each option.

What is an Upsert

An upsert is a query that inserts a row if it does not exist or updates a row if it does exist.

Why an Upsert

When we write to a table, we first need to check whether the row exists to know if the query should be an insert or an update. An upsert lets us make a single call to the database that checks if the row exists and then performs the insert or update, rather than two calls: one to check if the row exists and a second to perform the insert or update.

Example Table

Here is a table you can use to run the following examples.

CREATE TABLE dbo.ClientData(
    ClientId [bigint] NOT NULL,
    [Data] [varchar](20) NOT NULL,
    UpdatedDateUtc [datetime] NOT NULL DEFAULT (getutcdate()),
 CONSTRAINT [PK_ClientData_ClientId] PRIMARY KEY CLUSTERED (
    ClientId ASC
)) ON [PRIMARY]
GO
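
To make the two round trips concrete, here is a sketch of the check-then-write pattern an upsert replaces (in an application, the SELECT and the subsequent UPDATE or INSERT are separate database calls):

-- Round trip 1: the application checks whether the row exists
SELECT COUNT(*) FROM dbo.ClientData WHERE ClientId = 12345;

-- Round trip 2: based on the result, the application sends either an update ...
UPDATE dbo.ClientData SET [Data] = 'Update', UpdatedDateUtc = GETUTCDATE() WHERE ClientId = 12345;

-- ... or an insert
INSERT INTO dbo.ClientData (ClientId, [Data]) VALUES (12345, 'Insert');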

MERGE

Use this approach when you have more INSERTs than UPDATEs, or roughly an equal number of each. In those cases you need to check whether the row exists either way, so MERGE is a natural fit.

Example

MERGE dbo.ClientData AS [Target]
USING (SELECT 12345 AS clientId) AS [Source] 
ON [Target].clientId = [Source].clientId
WHEN MATCHED THEN  
  UPDATE SET [Target].data='Update', [Target].updatedDateUtc = GetUtcDate()
WHEN NOT MATCHED THEN  
  INSERT (clientId, data) VALUES ([Source].clientId, 'Insert');

Run the query once and you will see that the “Data” field says “Insert”:

SELECT * FROM dbo.ClientData

Run the query a second time and you will see the “Data” field says “Update”.

ROWCOUNT

Use this approach when you have significantly more UPDATEs than INSERTs. If the vast majority of writes to a table are updates, with the occasional insert, you can optimize by skipping the check for whether a row exists: always attempt an update first, and only insert if the update modified zero rows. To do this we use @@ROWCOUNT, which tells us whether the update modified a row. If @@ROWCOUNT is 0, we execute an insert.

Example

UPDATE dbo.ClientData SET [data] = 'Update', updatedDateUtc = GetUtcDate() WHERE ClientId = 12345
IF @@ROWCOUNT = 0
BEGIN
  INSERT INTO dbo.ClientData(clientId, [data]) VALUES (12345, 'Insert')
END

Run the query once and you will see that the “Data” field says “Insert”:

SELECT * FROM dbo.ClientData

Run the query a second time and you will see the “Data” field says “Update”.

This works just like the MERGE, but for an update-heavy table it is more efficient.

I hope that helps!

SQL Server – Get Sizes of All Tables and Indexes in a Database

Even though there are great tools in SQL Server Management Studio that give you a wealth of information about all of your tables and indexes, I still find it handy to have a script I can use to quickly check the current state of all tables and indexes in a database.

Size of each Table (Including Indexes)

This query gives you a total size for each table in KB including all of the indexes on that table. The query shows the table name, row count, total space, and total space used by the table and its indexes.

SELECT
	t.[Name] AS TableName,
	p.[rows] AS [RowCount],
	SUM(a.total_pages) * 8 AS TotalSpaceKB,
	SUM(a.used_pages) * 8 AS UsedSpaceKB
FROM sys.tables t
INNER JOIN sys.indexes i ON t.OBJECT_ID = i.object_id
INNER JOIN sys.partitions p ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
INNER JOIN sys.allocation_units a ON p.partition_id = a.container_id
WHERE t.is_ms_shipped = 0 
	AND i.OBJECT_ID > 255
GROUP BY t.[Name], p.[Rows]
ORDER BY t.[Name]

Size of each Index

This query shows the total size in KB of each index in the database. The query shows the name of each index, which table the index is on, and the total size of the index.

SELECT
	i.[name] AS IndexName,
	t.[name] AS TableName,
	SUM(s.[used_page_count]) * 8 AS IndexSizeKB
FROM sys.dm_db_partition_stats AS s
INNER JOIN sys.indexes AS i ON s.[object_id] = i.[object_id]
	AND s.[index_id] = i.[index_id]
INNER JOIN sys.tables t ON t.OBJECT_ID = i.object_id
GROUP BY i.[name], t.[name]
ORDER BY i.[name], t.[name]

I hope you find these queries useful!

SQL Server – How to write an Upsert using MERGE

Normally, when you want to write data to a table from an application, you must first do a SELECT to check if the row exists; if it does exist you execute an UPDATE, and if it does not you execute an INSERT. This is the standard SELECT-UPDATE-INSERT pattern. The downside to this pattern is that it requires two database round trips instead of one.

Now, over the years I have worked with document stores such as MongoDB and really enjoyed the freedom of being able to make a single call to the database to ADD/REPLACE a document, without having to check if it exists.

Fortunately, in SQL Server 2008 the MERGE statement was introduced.

MERGE allows you to make only a single database round trip when you want to INSERT a row if it does not exist, or UPDATE a row if it does exist. The following is a simple example showing how to use the MERGE statement.

Overview

Here is a quick overview of the four sections of a MERGE statement:

  • MERGE – specifies the table we will be inserting a row into or updating
  • USING – defines the condition we will be using to check if the row exists or not
  • WHEN MATCHED THEN – update statement to run when the row exists
  • WHEN NOT MATCHED – insert statement to run when the row does not exist

Example

Table to Update/Insert

Let's create a table to use for our test of the MERGE statement:

CREATE TABLE dbo.ClientData(
    ClientId [bigint] NOT NULL,
    Data [varchar](20) NOT NULL,
    UpdatedDateUtc [datetime] NOT NULL DEFAULT (getutcdate()),
 CONSTRAINT [PK_ClientData_ClientId] PRIMARY KEY CLUSTERED (
    ClientId ASC
)) ON [PRIMARY]
GO

MERGE Statement

Let's create a MERGE statement to INSERT or UPDATE a row in table ClientData:

MERGE dbo.ClientData AS [Target]
USING (SELECT 12345 AS clientId) AS [Source] 
ON [Target].clientId = [Source].clientId
WHEN MATCHED THEN 
  UPDATE SET [Target].data='Update', [Target].updatedDateUtc = GetUtcDate()
WHEN NOT MATCHED THEN 
  INSERT (clientId, data) VALUES ([Source].clientId, 'Insert');

Note: The INSERT statement in the MERGE sets the value of the column “Data” to “Insert” and the UPDATE statement in the MERGE sets the value of the column “Data” to “Update”.

First Execution

Run the query once and you will see that the INSERT statement has been executed and the “Data” field is set to “Insert”:

SELECT * FROM dbo.ClientData

Second Execution

Run the query a second time and you will see that the UPDATE statement has been executed and the “Data” field is set to “Update”:

SELECT * FROM dbo.ClientData

If you are curious about the performance difference between MERGE and SELECT-UPDATE-INSERT, here is a performance comparison.

I hope that helps!

Java – Automate database schema updates with Flyway

I am currently working on a Java 8 project, a REST API deployed as a WAR file to Jetty. Our deploy process is very simple: the pipeline copies the WAR file into the Jetty directory in each environment, verifies the app is up and running with the correct version, and runs some integration tests.

We wanted to apply database migration scripts automatically in each environment (Dev, Test, QA, Staging, Prod) as we deployed, so we would no longer have to worry about applying scripts manually. In the past, for Java, Scala, and .NET projects, I have used several different tools, but for this project we decided to use Flyway, which is very flexible and simple to set up.

The documentation for Flyway is excellent; however, I decided to post what we did in our app in case it helps someone else out. Here is our “Quick Start” setup.

1. Add the Flyway dependency

Flyway can be setup using Maven, Gradle, SBT, Ant, etc. In our project we used Maven, so all we did was add the following to our pom.xml file:

<dependency>
    <groupId>org.flywaydb</groupId>
    <artifactId>flyway-core</artifactId>
    <version>3.2.1</version>
</dependency>

2. Code

Now, to get Flyway to check for any database changes to apply whenever the application is started, put the following code somewhere in your application's startup sequence.

Flyway flyway = new Flyway();
//Create the dbo.schema_version table if it does not already exist
flyway.setBaselineOnMigrate(true);
//Where the method "getDataSource()" provides your DataSource 
//object that has the jdbc url, username, and password.
flyway.setDataSource(getDataSource()); 
flyway.migrate();

3. SQL Script Location

All migration scripts by default must go in the following folder in your app:

/resources/db/migration

4. SQL Script Naming convention

Scripts are run in version number order based on their names; the default naming convention is:

Version__description.sql
For example: “V1_0_1__create_tables.sql”

All scripts must start with the letter “V”, followed by major/minor version numbers, with two underscores “__” separating the version from the description.

5. Schema Version Table

Flyway will automatically create a table in each database called “dbo.schema_version”, which stores a log of all migration scripts that have been applied.

The table looks like:

version_rank  installed_rank  version  description      type      script                     checksum    installed_by  installed_on             execution_time  success
1             1               1        Flyway Baseline  BASELINE  Flyway Baseline            NULL        JohnSmith     2015-07-30 15:55:49.337  0               1
1             2               1.0.1    create tables    SQL       V1_0_1__create_tables.sql  -440358290  JohnSmith     2015-07-30 15:55:49.337  109             1

6. Handling Failure

If a script fails, the app will fail to start, the failure information is written to our log files, and a monitor is tripped. Flyway applies each script as a transaction, so all changes in the script are rolled back if any part of it fails. This is very handy: if you commit a script with invalid SQL syntax, all you have to do is update the script with the corrected syntax, commit it again, and let the build pipeline apply the fixed script. No messy cleanup or reset to worry about.

So that is it, you should have all you need to get your database schema changes easily synced up with the deploy of your app!

SQL Server – Simple Recursive Query Example

Every once in a while I need to write a recursive query in SQL Server, and I always forget the syntax, so I have to track down a simple example to help me remember. However, if you are writing a recursive query for the first time, I find some of the examples online a little too complicated. So I wanted to post an example that also gives you the script to create the table and populate it with data, so you can see how it works and try it yourself. In this post I use the common example of a table of countries, states, and cities, where we want to get a list of all cities in a single country. Enjoy!

Create a table called “Area”:

CREATE TABLE dbo.Area(
   AreaID int NOT NULL,
   AreaName varchar(100) NOT NULL,
   ParentAreaID int NULL,
   AreaType varchar(20) NOT NULL,
 CONSTRAINT PK_Area PRIMARY KEY CLUSTERED (
   AreaID ASC
 ) ON [PRIMARY])
GO

Add some “Area” data:

INSERT INTO dbo.Area(AreaID,AreaName,ParentAreaID,AreaType)
VALUES(1, 'Canada', null, 'Country')

INSERT INTO dbo.Area(AreaID,AreaName,ParentAreaID,AreaType)
VALUES(2, 'United States', null, 'Country')

INSERT INTO dbo.Area(AreaID,AreaName,ParentAreaID,AreaType)
VALUES(3, 'Saskatchewan', 1, 'State')

INSERT INTO dbo.Area(AreaID,AreaName,ParentAreaID,AreaType)
VALUES(4, 'Saskatoon', 3, 'City')

INSERT INTO dbo.Area(AreaID,AreaName,ParentAreaID,AreaType)
VALUES(5, 'Florida', 2, 'State')

INSERT INTO dbo.Area(AreaID,AreaName,ParentAreaID,AreaType)
VALUES(6, 'Miami', 5, 'City')

If I do a select by AreaType “City”:

select * from dbo.Area
where AreaType = 'City'

I get both Saskatoon and Miami:

AreaID  AreaName   ParentAreaID  AreaType
4       Saskatoon  3             City
6       Miami      5             City

However, what if I wanted to return all cities in Canada?

You can accomplish this by doing a recursive select which uses a common table expression (CTE).

WITH AreasCTE AS 
( 
--anchor select, start with the country of Canada, which will be the root element for our search
SELECT AreaID, AreaName, ParentAreaID, AreaType
FROM dbo.Area 
WHERE AreaName = 'Canada'
UNION ALL 
--recursive select, recursive until you reach a leaf (an Area which is not a parent of any other area)
SELECT a.AreaID, a.AreaName, a.ParentAreaID, a.AreaType 
FROM dbo.Area a 
INNER JOIN AreasCTE s ON a.ParentAreaID = s.AreaID 
) 
--Now, you will have all Areas in Canada, so now let's filter by the AreaType "City"
SELECT * FROM AreasCTE  
where AreaType = 'City' 

Now we get back the following results for cities in Canada:

AreaID  AreaName   ParentAreaID  AreaType
4       Saskatoon  3             City

That’s it! Now we have written a simple recursive query!

SQL Server – Alter database in Single User mode to Multi User mode

We have some test environment databases that get rebuilt nightly. The job that does the rebuild switches the database into single user mode while it is being rebuilt, and switches it back to multi user mode when the rebuild is finished. However, if there is a problem with a script and the job fails, it leaves the database in single user mode.
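
For reference, the command such a job typically uses to put a database into single user mode looks like this:

ALTER DATABASE DatabaseNameGoesHere SET SINGLE_USER WITH ROLLBACK IMMEDIATE
GO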

First, open a SQL Server Management Studio query window connected to database “master”.

The command to change the database back to multi user mode is:

ALTER DATABASE {InsertDatabaseNameHere} SET MULTI_USER WITH ROLLBACK IMMEDIATE
GO

However, if there is an existing process blocking this alter, you may get the following error message:

“Transaction (Process ID 864) was deadlocked on lock resources with another process and has been chosen as the deadlock victim”

Since both the existing running process and the current request have the same deadlock priority, preference is given to the longer-running process, which is why your ALTER DATABASE command is chosen as the deadlock victim. To deal with this, set the deadlock priority to HIGH when you run the command again.

SET DEADLOCK_PRIORITY HIGH
GO
ALTER DATABASE DatabaseNameGoesHere SET MULTI_USER WITH ROLLBACK IMMEDIATE
GO

However, if that does not work, then it means the other command also has a deadlock priority of HIGH and you will need to kill the existing process. To find the “spid” of the existing process, you can use the following query:

SELECT sd.[name], sp.spid, sp.login_time, sp.loginame
FROM sysprocesses sp
INNER JOIN sysdatabases sd on sp.dbid = sd.dbid
WHERE sd.[name] = 'DatabaseNameGoesHere'

As an alternative, you can also use the command “sp_who” to get the “spid” of the open connection:

exec sp_who

So, here is our revised command:

KILL SpidToKillGoesHere
GO
SET DEADLOCK_PRIORITY HIGH
GO
ALTER DATABASE DatabaseNameGoesHere SET MULTI_USER WITH ROLLBACK IMMEDIATE
GO

That should do it!

SQL Server – Check Index Fragmentation on ALL Indexes in a Database

Often, when I am using a development or test environment and a query is behaving slower than normal, the first thing I want to rule out is: “Do I have any fragmented indexes?” Here is a simple query that lists every index on every table in your database, ordered by percentage of index fragmentation. This query works on SQL Server 2005 or newer.

SELECT dbschemas.[name] as 'Schema', 
	dbtables.[name] as 'Table', 
	dbindexes.[name] as 'Index',
	indexstats.alloc_unit_type_desc,
	indexstats.avg_fragmentation_in_percent,
	indexstats.page_count
FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, NULL) AS indexstats
INNER JOIN sys.tables dbtables on dbtables.[object_id] = indexstats.[object_id]
INNER JOIN sys.schemas dbschemas on dbtables.[schema_id] = dbschemas.[schema_id]
INNER JOIN sys.indexes AS dbindexes ON dbindexes.[object_id] = indexstats.[object_id]
	AND indexstats.index_id = dbindexes.index_id
WHERE indexstats.database_id = DB_ID()
ORDER BY indexstats.avg_fragmentation_in_percent desc

The output will look like:

Schema   Table            Index                          alloc_unit_type_desc  avg_fragmentation_in_percent  page_count
-----------------------------------------------------------------------------------------------------------------------
History  TrackingHistory  PK_TrackingHistory_trackingId  IN_ROW_DATA           97.5308641975309              81
History  ProductHistory   PK_ProductHistory_productId    IN_ROW_DATA           96.4508641975309              87
Account  Customer         UC_Customer_email              IN_ROW_DATA           95.5103734439834              241
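
If you do find heavily fragmented indexes, the usual rule of thumb (a guideline, not a hard rule) is to REORGANIZE indexes with roughly 5-30% fragmentation and REBUILD anything above that. A sketch of the syntax, using index names from the output above:

-- Heavy fragmentation: rebuild the index
ALTER INDEX [PK_TrackingHistory_trackingId] ON [History].[TrackingHistory] REBUILD;

-- Lighter fragmentation (roughly 5-30%): a reorganize is cheaper
ALTER INDEX [UC_Customer_email] ON [Account].[Customer] REORGANIZE;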

That should do the trick!