SQL Server – Applying Schema Changes without Downtime

Over the years I have worked on many projects using SQL Server with applications in C#, Java, Python, and Scala. Even though the programming languages and libraries we use for managing migration scripts might change, the basic principles of how we write migration scripts for clustered applications using a SQL Server instance have remained the same.

There are very few occasions (if any) where a schema change to a database should require downtime, so in this article I have written step-by-step instructions for applying some of the more common types of schema changes without any need to take the database offline. I have been following these steps for years but had never written them down anywhere, so I finally decided to document them here. I hope this helps!

Why Backwards Compatible Schema Changes Matter

Any time you are making a change to a SQL database, such as adding a column or renaming a table, you must always ensure your change is backwards compatible with the currently deployed version of the application, for two reasons:

  1. Clustered Application. Our application runs in a clustered environment, so it is deployed one node at a time. This means there will be a short period where the new version of the application, using the new version of the database, runs side-by-side with an older version of the application that is also running on the new version of the database.
  2. Rolling Back. If the application deploy needs to be rolled back, we want the database changes to be backwards compatible with the previous application version, so that rolling back the application does not require undoing database changes. This matters in a clustered environment: if you rolled back the application one node at a time and also rolled back your database changes, the nodes still running the newer version of the application would break until the rollback completed on all of them.

Common Migration Patterns

Note: In the steps for each of these patterns, I am assuming each script is run automatically by a migration-script management library (for example, Flyway) when you release your application.

Rename a Table

Scenario: Rename an existing table while remaining backwards compatible

Pattern: Use a database synonym to maintain backwards compatibility with the old table name

Steps:

  1. Add script that renames the table and also creates a synonym
  2. Update code to use the new table name
  3. Release app
  4. Create an issue for a future sprint to drop the synonym 

Here is an example migration script that renames the table and creates a synonym:

IF EXISTS (
    SELECT * FROM INFORMATION_SCHEMA.TABLES 
    WHERE TABLE_SCHEMA = N'MySchema' AND TABLE_NAME = N'MyOldTable')
BEGIN
    EXEC sp_rename 'MySchema.MyOldTable', 'MyNewTable'; 
    CREATE SYNONYM MySchema.MyOldTable
    FOR MySchema.MyNewTable;
END
GO

In a future sprint create a script to drop the old synonym:

IF EXISTS(SELECT * FROM sys.synonyms WHERE name = 'MyOldTable')
BEGIN
    DROP SYNONYM [MySchema].[MyOldTable]
END
GO

Move a Table to a Different Schema

Scenario: Move an existing table to a different schema while remaining backwards compatible

Pattern: Use a database synonym to maintain backwards compatibility with the old schema name for the table

Steps:

  1. Add script that both moves the table to the new schema and also adds a synonym 
  2. Update code to use new schema name
  3. Release App
  4. Create an issue for a future sprint to drop the synonym

Here is an example migration script that changes the table schema and creates a synonym:

IF EXISTS (
    SELECT * FROM INFORMATION_SCHEMA.TABLES 
    WHERE TABLE_SCHEMA = N'MyOldSchema' AND TABLE_NAME = N'MyTable')
BEGIN
    ALTER SCHEMA MyNewSchema
    TRANSFER MyOldSchema.MyTable; 
    
    CREATE SYNONYM MyOldSchema.MyTable
    FOR MyNewSchema.MyTable;
END
GO

In a future sprint create a script to drop the old synonym:

IF EXISTS(SELECT * FROM sys.synonyms WHERE name = 'MyTable')
BEGIN
    DROP SYNONYM [MyOldSchema].[MyTable]
END
GO

Alter Columns Used in an Index

Scenario: Move an index from being on columns A and B to being on columns A and C while remaining backwards compatible

Pattern: Rather than altering the existing index, create a new index, then drop the old index in a future release

Steps:

  1. Add script to create a new index with the columns you want
  2. Update code to have queries use the new index (If needed)
  3. Release app
  4. Create an issue for a future sprint to drop the old index

Example script to add the new index:

IF NOT EXISTS(
    SELECT * FROM sys.indexes 
    WHERE name='IX_MySchema_MyTable_ColumnA_ColumnC')
BEGIN
    CREATE UNIQUE NONCLUSTERED INDEX IX_MySchema_MyTable_ColumnA_ColumnC
    ON [MySchema].[MyTable]([ColumnA],[ColumnC])
END
GO

In a future sprint, add a script to drop the old index:

IF EXISTS(
    SELECT * FROM sys.indexes 
    WHERE name='IX_MySchema_MyTable_ColumnA_ColumnB')
BEGIN
    DROP INDEX IX_MySchema_MyTable_ColumnA_ColumnB 
    ON [MySchema].[MyTable]
END
GO

Add a New Non-Nullable Column

Scenario: Add a new non-nullable column to an existing table where the table already has data while remaining backwards compatible

Pattern: Add the new column as nullable first, populate the data, then make it non-nullable later

Steps:

  1. Add script that adds the column as a new nullable column
  2. Update code to start writing to, but not reading from, this new column
  3. Release app
  4. Add script to backfill any rows where the column value is still null
  5. Add script to make the column non-nullable
  6. Update code to start reading from the new non-nullable column
  7. Release app

Example script to add the nullable column:

IF NOT EXISTS(
    SELECT * FROM sys.columns 
    WHERE Name = N'MyColumn' AND Object_ID = Object_ID(N'MySchema.MyTable'))
BEGIN
    ALTER TABLE MySchema.MyTable
    ADD [MyColumn] [uniqueIdentifier] NULL
END
GO

In a future sprint, once all rows in the table have this column populated:

ALTER TABLE [MySchema].[MyTable]
ALTER COLUMN MyColumn [uniqueIdentifier] NOT NULL
GO

Drop a Non-Nullable Column

Scenario: Drop a non-nullable column from an existing table while remaining backwards compatible

Pattern: Alter the column to make it nullable first, remove usages, then drop the column

Steps:

  1. Add script to make the existing column nullable
  2. Update code to stop reading from the column
  3. Release app
  4. Update code to stop writing to the column
  5. Release app
  6. Create an issue for a future sprint to drop the column

Example script to make the column nullable:

IF EXISTS(
    SELECT * FROM sys.columns 
    WHERE Name = N'MyColumn' AND Object_ID = Object_ID(N'MySchema.MyTable'))
BEGIN
    ALTER TABLE [MySchema].[MyTable]
    ALTER COLUMN MyColumn [uniqueIdentifier] NULL
END
GO

In a future sprint, once we will no longer roll back to a version that references this column, add a script to drop it:

IF EXISTS(
    SELECT * FROM sys.columns 
    WHERE Name = N'MyColumn' AND Object_ID = Object_ID(N'MySchema.MyTable'))
BEGIN
    ALTER TABLE MySchema.MyTable
    DROP COLUMN [MyColumn]
END
GO

Rename a Non-Nullable Column

Scenario: Rename an existing non-nullable column while remaining backwards compatible

Pattern: Rather than altering the name of the existing column, add a new column with the name you want, populate the data, then drop the old column after all usages have been moved to the new column

Steps:

  1. Add script that adds the column as a new nullable column (it must be nullable temporarily)
  2. Update code to start writing to the new column, while still writing to and reading from the old column
  3. Release app
  4. Add script to copy all values from the old column to the new column
  5. Update code to start reading from the new column
  6. Release app
  7. Add script to make old column nullable
  8. Update code to stop writing to the old column
  9. Add script to make the new column non-nullable (now we can enforce the new column is non-nullable)
  10. Release app
  11. Create an issue for a future sprint to drop the old column
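
The application-side work in steps 2 and 5 is a dual-write, staged-read change in the data-access layer. Here is a minimal sketch of the idea in Python with sqlite3 (the table and column names are the hypothetical ones from this section; the same pattern applies in C#, Java, or Scala):

```python
import sqlite3

# Transition-phase data access while renaming OldColumn -> NewColumn.
# Step 2: write both columns, read only the old one.
# Step 5: keep writing both, but switch reads to the new column.

def save_value(conn, row_id, value):
    # Dual write: keep the old column authoritative while also
    # populating the new column on every write.
    conn.execute(
        "UPDATE MyTable SET OldColumn = ?, NewColumn = ? WHERE id = ?",
        (value, value, row_id))

def load_value(conn, row_id, read_from_new=False):
    # Reads stay on the old column until step 5 flips this flag.
    column = "NewColumn" if read_from_new else "OldColumn"
    row = conn.execute(
        f"SELECT {column} FROM MyTable WHERE id = ?", (row_id,)).fetchone()
    return row[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, OldColumn TEXT, NewColumn TEXT)")
conn.execute("INSERT INTO MyTable (id, OldColumn) VALUES (1, 'legacy')")

save_value(conn, 1, "updated")                      # dual write (step 2)
old_read = load_value(conn, 1)                      # pre-step-5 readers
new_read = load_value(conn, 1, read_from_new=True)  # post-step-5 readers
```

Because both columns are written on every save, readers on either column see the same value throughout the transition, which is what makes the two releases independently safe to roll back.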

Example script to add the nullable column (Note, it must be nullable until we are writing to it and have migrated the old data):

IF NOT EXISTS(
    SELECT * FROM sys.columns 
    WHERE Name = N'NewColumn' AND Object_ID = Object_ID(N'MySchema.MyTable'))
BEGIN
    ALTER TABLE MySchema.MyTable
    ADD [NewColumn] [varchar](100) NULL
END
GO

Example script to copy the values from the old column to the new column:

NOTE: If the table has a lot of traffic, it may be a better approach to migrate the data using a cursor or in small batches rather than a single UPDATE:

UPDATE MySchema.MyTable
SET NewColumn = OldColumn
WHERE OldColumn IS NOT NULL AND NewColumn IS NULL
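
For busy tables, the batched alternative keeps each transaction short so locks are held only briefly; in T-SQL this is typically a loop around UPDATE TOP (@BatchSize) ... WHERE NewColumn IS NULL. The batching idea itself can be sketched in Python (sqlite3 is used purely for illustration; the table and column names are the hypothetical ones from this section):

```python
import sqlite3

def backfill_in_batches(conn, batch_size=1000):
    """Copy OldColumn into NewColumn a batch at a time, committing after
    each batch so no single transaction touches the whole table."""
    total = 0
    while True:
        cur = conn.execute(
            """UPDATE MyTable SET NewColumn = OldColumn
               WHERE id IN (SELECT id FROM MyTable
                            WHERE OldColumn IS NOT NULL AND NewColumn IS NULL
                            LIMIT ?)""",
            (batch_size,))
        conn.commit()
        if cur.rowcount == 0:
            return total  # nothing left to copy
        total += cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, OldColumn TEXT, NewColumn TEXT)")
conn.executemany("INSERT INTO MyTable (OldColumn) VALUES (?)",
                 [(f"value{i}",) for i in range(2500)])
copied = backfill_in_batches(conn, batch_size=1000)  # three batches: 1000, 1000, 500
```

The loop is idempotent: the WHERE clause only selects rows not yet copied, so it is safe to re-run if the migration is interrupted.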

Example script to update the new column to be non-nullable (now that we have populated the data for the new column in all rows):

IF EXISTS(
    SELECT * FROM sys.columns 
    WHERE Name = N'NewColumn' AND Object_ID = Object_ID(N'MySchema.MyTable'))
BEGIN
    ALTER TABLE MySchema.MyTable
    ALTER COLUMN [NewColumn] [varchar](100) NOT NULL
END
GO

In a future sprint, once we will no longer roll back to a version that references the old column, add a script to drop it:

IF EXISTS(
    SELECT * FROM sys.columns 
    WHERE Name = N'OldColumn' AND Object_ID = Object_ID(N'MySchema.MyTable'))
BEGIN
    ALTER TABLE MySchema.MyTable
    DROP COLUMN [OldColumn]
END
GO

Summary

There are many schema change scenarios I did not touch on in this article. I wanted to focus on the most common ones and show that for most types of schema changes there is a way to apply them that does not involve taking the database offline.

I hope that helps!

Cassandra – Fix Schema Disagreement

Recently we had an issue when adding a new table to a Cassandra cluster (version 3.11.2). We added the new table's create statement to our Java application and deployed the application in each of our test environments; the table was created, we could read and write data, and everything was fine.

However, when we deployed to production, the new table was created but we were unable to query it from any node in the cluster. When our Java application tried to do a select from the table, Cassandra returned the following error:

Cassandra timeout during read query at consistency LOCAL_ONE (1 responses were required but only 0 replica responded)

Next we tried connecting to each node in the cluster using CQLSH, but we still had the same issue. On every node Cassandra knew about the table and we could see the schema definition for the table, but when we tried to query it we would get the following error:

ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' response]
message="Operation timed out - received only 0 responses." info={'received_response': 0, 'required_response': 1, 'consistency': 'ONE'}

We decided to try describe cluster to see if we could get any useful info:

nodetool describecluster

There was our problem! We had a schema disagreement: three nodes of our six-node cluster were on a different schema version:

Cluster Information:
        Name: OurCassandraCluster
        Snitch: org.apache.cassandra.locator.SimpleSnitch
        DynamicEndPointSnitch: enabled
        Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
        Schema versions:
            819a3ce1-a42c-3ba9-bd39-7c015749f11a: [10.111.22.103, 10.111.22.105, 10.111.22.104]

            134b246c-8d42-31a7-afd1-71e8a2d3d8a3: [10.111.22.102, 10.111.22.101, 10.111.22.106]

We checked DataStax, which had the article Handling Schema Disagreements. However, their official documentation was sparse and assumed a node was unreachable.

In our case all the nodes were reachable, the cluster was functioning fine, all previously added tables were receiving traffic, it was only the new table we just added that was having a problem.

We found a Stack Overflow post suggesting that a fix for the schema disagreement issue was to cycle the nodes one at a time. We tried that and it worked. The following are the steps that worked for us.

Steps to Fix Schema Disagreement

If there are more nodes in one schema than in the other, you can start by trying to restart a Cassandra node in the smaller list and see if it joins the other schema list.

In our case we had exactly three nodes on each schema. In this case it is more likely the nodes in the first schema are the ones that Cassandra will pick during a schema negotiation, so try the following instructions on one of the nodes in the second schema list.
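
With a larger cluster it can be fiddly to eyeball which nodes are on which schema version, so a small script can help. This is a hypothetical sketch that groups node IPs by schema version from the nodetool describecluster output (the parsing assumes the output format shown above):

```python
import re

def schema_groups(describecluster_output):
    """Map each schema version UUID to the list of node IPs on it, based on
    the 'Schema versions:' section of 'nodetool describecluster' output."""
    groups = {}
    uuid = r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}"
    for line in describecluster_output.splitlines():
        match = re.match(r"\s*(" + uuid + r"):\s*\[(.*)\]", line)
        if match:
            groups[match.group(1)] = [ip.strip() for ip in match.group(2).split(",")]
    return groups

# The output shown above, trimmed to the schema section:
sample = """
Schema versions:
    819a3ce1-a42c-3ba9-bd39-7c015749f11a: [10.111.22.103, 10.111.22.105, 10.111.22.104]
    134b246c-8d42-31a7-afd1-71e8a2d3d8a3: [10.111.22.102, 10.111.22.101, 10.111.22.106]
"""
groups = schema_groups(sample)
smallest_first = sorted(groups.values(), key=len)  # restart nodes from the smaller list first
```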

Connect to a node

Connect to one of the nodes in the second schema list. For this example, let's pick node “10.111.22.102”.

Restart Cassandra

First, drain the node. This flushes all in-memory data (memtables) to SSTables on disk and stops the node from accepting writes.

nodetool drain

Now, check the status to see if the drain operation has finished.

systemctl status cassandra

You should see a “DRAINED” message in the output, indicating the drain operation completed successfully.

Stop Cassandra

systemctl stop cassandra

Start Cassandra

systemctl start cassandra

Verify Cassandra is up

Let's check the journal to ensure Cassandra has restarted successfully:

journalctl -f -u cassandra

When you see the “Startup complete” message in the journal, Cassandra has finished restarting and is ready for clients to connect.

Verify Schema Issue Fixed For Node

Now that Cassandra is back up, run the describe cluster command again to see if the node has switched to the other schema:

nodetool describecluster

If all has gone well, you should see that node “10.111.22.102” has moved to the other schema list (Note: The node list is not sorted by IP):

Cluster Information:
        Name: OurCassandraCluster
        Snitch: org.apache.cassandra.locator.SimpleSnitch
        DynamicEndPointSnitch: enabled
        Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
        Schema versions:
             819a3ce1-a42c-3ba9-bd39-7c015749f11a: [10.111.22.103, 10.111.22.102, 10.111.22.105, 10.111.22.104]

             134b246c-8d42-31a7-afd1-71e8a2d3d8a3: [10.111.22.101, 10.111.22.106]

If Node Schema Did Not Change

If this did not work, it means the other schema is the one Cassandra has decided is the authority, so repeat these steps for the list of nodes in the first schema list.

Fixed Cluster Schema

Once you have completed the above steps on each node, all nodes should now be on a single schema:

Cluster Information:
        Name: OurCassandraCluster
        Snitch: org.apache.cassandra.locator.SimpleSnitch
        DynamicEndPointSnitch: enabled
        Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
        Schema versions:
             819a3ce1-a42c-3ba9-bd39-7c015749f11a: [10.111.22.103, 10.111.22.102, 10.111.22.101, 
                                                    10.111.22.106, 10.111.22.105, 10.111.22.104]

I hope that helps!

Java – Automate database schema updates with Flyway

I am currently working on a Java 8 project which is a REST API deployed as a WAR file to Jetty. Our deploy process is very simple: the pipeline copies the WAR file into the Jetty directory in each environment, verifies the app is up and running with the correct version, and runs some integration tests.

We wanted database migration scripts to be applied automatically in each environment (Dev, Test, QA, Staging, Prod) as part of the deploy, so we would no longer have to worry about applying scripts manually. In the past, on Java, Scala, and .NET projects, I have used several different tools, but for this project we chose Flyway, which is very flexible and simple to set up.

The documentation for Flyway is excellent, however I decided to just post what we did in our app in case it might help someone else out. Here is our “Quick Start” setup.

1. Add the Flyway dependency

Flyway can be setup using Maven, Gradle, SBT, Ant, etc. In our project we used Maven, so all we did was add the following to our pom.xml file:

<dependency>
    <groupId>org.flywaydb</groupId>
    <artifactId>flyway-core</artifactId>
    <version>3.2.1</version>
</dependency>

2. Code

Now, to get Flyway to check for any database changes to apply whenever the application is started, put the following code somewhere in your application's startup sequence.

Flyway flyway = new Flyway();
//Create the dbo.schema_version table if it does not already exist
flyway.setBaselineOnMigrate(true);
//Where the method "getDataSource()" provides your DataSource
//object that has the jdbc url, username, and password.
flyway.setDataSource(getDataSource());
flyway.migrate();

3. SQL Script Location

By default, all migration scripts must go in the following folder in your app:

/resources/db/migration

4. SQL Script Naming convention

Scripts are run in version-number order based on their names; the default naming convention is:

Version__description.sql
For example: “V1_0_1__create_tables.sql”

All scripts must start with the letter “V”, followed by the version number (with underscores separating its parts), then two underscores “__” separating the version from the description.
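
Flyway compares these versions numerically, component by component, not as plain strings (so V1_0_10 runs after V1_0_2). Flyway does this internally in Java; the following Python sketch just illustrates the ordering rule:

```python
import re

def migration_version(filename):
    """Extract the numeric version tuple from a Flyway script name,
    e.g. 'V1_0_10__add_index.sql' -> (1, 0, 10)."""
    match = re.match(r"V(\d+(?:_\d+)*)__.+\.sql$", filename)
    if not match:
        raise ValueError(f"not a Flyway migration name: {filename}")
    return tuple(int(part) for part in match.group(1).split("_"))

scripts = ["V1_0_10__add_index.sql", "V1_0_2__seed_data.sql", "V1_0_1__create_tables.sql"]
ordered = sorted(scripts, key=migration_version)
# Numeric ordering puts 1.0.10 after 1.0.2, unlike a plain string sort.
```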

5. Schema Version Table

Flyway will automatically create a table in each database called “dbo.schema_version” which stores a log of all migration scripts that have been applied.

The table looks like:

version_rank | installed_rank | version | description     | type     | script                    | checksum   | installed_by | installed_on            | execution_time | success
1            | 1              | 1       | Flyway Baseline | BASELINE | Flyway Baseline           | NULL       | JohnSmith    | 2015-07-30 15:55:49.337 | 0              | 1
1            | 2              | 1.0.1   | create tables   | SQL      | V1_0_1__create_tables.sql | -440358290 | JohnSmith    | 2015-07-30 15:55:49.337 | 109            | 1

6. Handling Failure

If a script fails, the app will fail to start, the failure information is written to our log files, and a monitor is tripped. Flyway applies each script in a transaction, so all changes in the script are rolled back if any part of it fails. This is very handy: if you commit a script with invalid SQL syntax, all you have to do is fix the syntax, commit the script again, and let the build pipeline apply the corrected script. No messy cleanup or reset to worry about.

So that is it: you should have all you need to get your database schema changes synced up with the deploy of your app!

Validating Xml with a Schema in an NUnit Test (.NET 3.5/C#)

Recently we decided to make some updates to our web app. The front end website is in ASP.NET/C#, while the back end is all in Java and using REST APIs as the interface. One pair of programmers was assigned the task of updating the web app and the other pair was assigned the task of writing the new REST service used by the web app. To ensure that both sides would be able to talk to each other when completed, we wrote an XML Schema to act as our contract and placed it in an SVN external available to both projects.

In our functional test for the REST client side code (.NET/C#) we used a stub from which we could grab the XML being sent. To test that this XML conformed to the schema (contract) we agreed to, we created the following helper class:

Xml Validator Helper Class

using System;
using System.Xml;
using System.Xml.Schema;
using System.IO;

namespace Test.Helper
{
	public class XmlValidatorTestHelper
	{
		private bool isValidXml = true;
		private string validationError = "";

		public String ValidationError
		{
			get { return "Validation Error: " + validationError; }
			set { validationError = value; }
		}

		public bool IsValidXml
		{
			get { return isValidXml; }
		}

		/// <summary>
		/// Validate an xml document against an xml schema.
		/// </summary>
		public void ValidXmlDoc(XmlDocument xmlDocument, XmlSchema xmlSchema)
		{
			validateParameters(xmlDocument, xmlSchema);
			XmlReader xmlReader = createXmlReader(xmlDocument, xmlSchema);
					
			try
			{
				// validate        
				using (xmlReader)
				{
					while (xmlReader.Read())
					{}
				}
				isValidXml = true;
			}
			catch (Exception ex)
			{
				ValidationError = ex.Message;
				isValidXml = false;
			}
		}

		private static void validateParameters(XmlDocument xmlDocument, XmlSchema xmlSchema)
		{
			if (xmlDocument == null)
			{
				throw new ArgumentNullException("xmlDocument");
			}
			if (xmlSchema == null)
			{
				throw new ArgumentNullException("xmlSchema");
			}
		}

		private static XmlReader createXmlReader(XmlDocument xmlDocument, XmlSchema xmlSchema)
		{
			StringReader xmlStringReader = convertXmlDocumentToStringReader(xmlDocument);
			XmlReaderSettings xmlReaderSettings = new XmlReaderSettings
			{
				ValidationType = ValidationType.Schema
			};
			xmlReaderSettings.Schemas.Add(xmlSchema);
			return XmlReader.Create(xmlStringReader, xmlReaderSettings);
		}

		private static StringReader convertXmlDocumentToStringReader(XmlDocument xmlDocument)
		{
			StringWriter sw = new StringWriter();
			xmlDocument.WriteTo(new XmlTextWriter(sw));
			return new StringReader(sw.ToString());
		}
	}
}

Surfacing Validation Errors in your Test

Knowing that the XML our REST client produces matches the contract is great; however, when validation fails, it saves a lot of time if the test shows WHY it failed. No problem: our validation helper class exposes a property for retrieving the validation error message, which we output as the failure message for the test:

XmlValidatorTestHelper xmlSchemaValidator = new XmlValidatorTestHelper();
XmlSchema myXmlSchema = XmlSchema.Read(...);
XmlDocument myXmlDocument = new XmlDocument();
myXmlDocument.Load(...);
xmlSchemaValidator.ValidXmlDoc(myXmlDocument, myXmlSchema);

Assert.IsTrue(xmlSchemaValidator.IsValidXml, "XML does not match Schema: " + xmlSchemaValidator.ValidationError);