SQL Server – How to Partition a Table by Date

There are many excellent posts available that will explain in detail how partitions work in SQL Server. However, for this post I just wanted to help anyone new to using SQL Server Partitioned tables to get a working end to end example up and running quickly.

Note: At the time of the post I was using SQL Server 2016 Enterprise Edition

You can download the scripts from GitHub here;

Overview

Why Partition

There are several reasons why we moved some of our tables to being partitioned:

  • Performance – Querying one or two small partitions is often significantly cheaper than querying one large table.
  • Indexing – If data is only inserted into the newest partition and never old ones, indexes on old partitions will never fragment.
  • Management – If you are only keeping data for a set amount of time, truncating of old partitions is much more efficient than running delete statements.

Selecting a Partition Size

You want to make sure you select a partition size that makes sense, both for the volume of data you will be storing but also for how the data will be queried.

For example, in our use case we partitioned our data by day because the vast majority of queries were only for the last day or two. So the majority of queries only needed to retrieve data from at most two partitions.

However, if most of your queries are looking at longer date ranges, it means data from more partitions would need to be joined making it more appropriate to choose a larger partition size.

Creating a Partitioned Table

Create the Tables

For this example we are going to have two tables:

  • Thermostat – Stores the identifier for each thermostat we are tracking.
  • TemperatureHistory – Stores a record for every time we receive a temperature reading from a thermostat. This table will be partitioned by day.

Create Thermostat table:

CREATE TABLE [dbo].[Thermostat] (
	[thermostatId]     [bigint] IDENTITY(1,1)	NOT NULL,
	[createdDateUtc]   [datetime]				NOT NULL,
	[name]             [nvarchar](100)			NOT NULL,
 CONSTRAINT [PK_Thermostat] PRIMARY KEY CLUSTERED (
	[thermostatId] ASC
	),
 CONSTRAINT [UC_Thermostat_Name] UNIQUE NONCLUSTERED (
	[name] ASC
	) 
) 
GO

Create the TemperatureHistory table:

CREATE TABLE [dbo].[TemperatureHistory] (
    [thermostatId]   [bigint]   NOT NULL,
    [timestamp]      [datetime] NOT NULL,
    [temperature]    [float]    NOT NULL
)
GO

Note: Notice how we have not created a primary key on the TemperatureHistory table. That will come later since it is in the creation of the primary key where we will add partitioning.

Add foreign key for ThermostatId from TemperatureHistory to Thermostat table:

ALTER TABLE [dbo].[TemperatureHistory]  
	WITH CHECK ADD  CONSTRAINT [FK_TemperatureHistory_Thermostat_thermostatId] 
	FOREIGN KEY([thermostatId])
REFERENCES [dbo].[Thermostat] ([thermostatId])
GO

ALTER TABLE [dbo].[TemperatureHistory] 
CHECK CONSTRAINT [FK_TemperatureHistory_Thermostat_thermostatId]
GO

Create a Partition Function

In the create statement below you will notice the option RANGE RIGHT. There are actually two options for RANGE and here are what they mean:

  • LEFT – the date on the partition is the end date. This means all data inserted into the partition will be before the partition date but after the end date of the previous partition.
  • RIGHT – the date on the partition is the start date. This means all data inserted into the partition will be after the partition date but before the start date of the next partition.

For a table partitioned by date I personally prefer to use RIGHT since if we run out of partitions, all data from that point on will just be inserted into the last partition until we add more partitions. But it all depends on what works best for your use case.

CREATE PARTITION FUNCTION [DailyPartitionFunction](datetime) AS RANGE RIGHT 
FOR VALUES (N'2020-06-01T00:00:00.000', N'2020-06-02T00:00:00.000', N'2020-06-03T00:00:00.000', 
			N'2020-06-04T00:00:00.000', N'2020-06-05T00:00:00.000', N'2020-06-06T00:00:00.000', 
			N'2020-06-07T00:00:00.000', N'2020-06-08T00:00:00.000', N'2020-06-09T00:00:00.000', 
			N'2020-06-10T00:00:00.000')
GO

Create a Partition Scheme

CREATE PARTITION SCHEME DailyPartitionScheme
AS PARTITION DailyPartitionFunction ALL TO ([PRIMARY])
GO

Create a Partitioned Primary Key

The final step is to create a primary key on table TemperatureHistory using our new partition scheme:

ALTER TABLE [dbo].[TemperatureHistory]
    ADD CONSTRAINT [PK_TemperatureHistory_Partition_thermostatId_timestamp]
PRIMARY KEY CLUSTERED (
    [thermostatId] ASC,
    [timestamp] ASC
    )
    WITH (
		ONLINE = OFF -- This can be ONLINE = ON if using SQL Server Enterprise Edition 
    )
    ON DailyPartitionScheme([timestamp]);
GO

We now have a table, TemperatureHistory, which is partitioned by day.

Insert some Test Data

Add a thermostat:

INSERT INTO [dbo].[Thermostat] ([createdDateUtc], [name])
VALUES('2020-06-01', 'ABC-455-Lobby')
GO

Add some temperature history:

INSERT INTO [dbo].[TemperatureHistory]([thermostatId],[timestamp],[temperature])
VALUES(1,'2020-06-01 10:20:00.000',23.7)
GO
INSERT INTO [dbo].[TemperatureHistory]([thermostatId],[timestamp],[temperature])
VALUES(1,'2020-06-03 08:28:00.000',22.4)
GO
INSERT INTO [dbo].[TemperatureHistory]([thermostatId],[timestamp],[temperature])
VALUES(1,'2020-06-07 01:10:00.000',22.8)
GO
INSERT INTO [dbo].[TemperatureHistory]([thermostatId],[timestamp],[temperature])
VALUES(1,'2020-06-09 07:34:00.000',24.7)
GO

So now if you do a select from table Thermostat you should see:

SELECT * FROM [dbo].[Thermostat]
thermostatId    createdDateUtc            name
-------------------------------------------------------
1               2020-06-01 00:00:00.000   ABC-455-Lobby

Then if you do a select from our partitioned table TemperatureHistory you should see:

SELECT * FROM [dbo].[TemperatureHistory]
WHERE thermostatId = 1 
AND [timestamp] BETWEEN '2020-06-01' AND '2020-06-10'
thermostatId    timestamp                  temperature
-------------------------------------------------------
1               2020-06-01 10:20:00.000    23.7
1               2020-06-03 08:28:00.000    22.4
1               2020-06-07 01:10:00.000    22.8
1               2020-06-09 07:34:00.000    24.7

Querying a Partitioned Table

One of the big benefits to a partitioned table is you only need to query the partitions that contain the data you are looking for. So when it comes to a table partitioned by date it is import to ALWAYS specify a range with a start date and end date for every query. If you fail to do this you may end up with queries that read from more partitions than necessary which will impact performance.

For example the following query should only read from a single partition:

SELECT * FROM [dbo].[TemperatureHistory]
WHERE thermostatId = 1 
AND [timestamp] BETWEEN '2020-06-03 00:00:00.000' AND '2020-06-03 23:59:59.000'

In SSMS (SQL Server Management Studio) you can use the Include Actual Execution Plan option to give you more insight into what your query is doing.

With execution plan enabled, execute the query and you will see it returns one row as expected:

Now, select the Execution Plan tab, then right click on Clustered Index Seek and select Properties.

The properties window will give you some useful information about the performance of your query on the partitioned table, but requires some explanation.

  • Actual Partition Count – Is the count of the total number of partitions that were read.
  • Actual Partitions Accessed – Is the number of the partition from which the data was read. We have 10 partitions, the row returned came from partition 4.

Now we have confirmed our query is only reading from a single partition!

That is all! I hope that helps!

References