My Adventures in Coding

April 13, 2016

SQL Server – Get Sizes of All Tables and Indexes in a Database

Filed under: SQL,SQL Server — Brian @ 9:51 pm
Tags: , , , ,

Even though there are great tools in SQL Server Management Studio that give you a wealth of information about all of your tables and indexes, I still find it handy to have a script I can use to quickly check the current state of all tables and indexes in a database.

Size of each Table (Including Indexes)

This query gives you a total size for each table in KB including all of the indexes on that table. The query shows the table name, row count, total space, and total space used by the table and its indexes.

SELECT
    t.[Name] AS TableName,
    p.[rows] AS [RowCount],
    SUM(a.total_pages) * 8 AS TotalSpaceKB,
    SUM(a.used_pages) * 8 AS UsedSpaceKB
FROM sys.tables t
INNER JOIN sys.indexes i ON t.OBJECT_ID = i.object_id
INNER JOIN sys.partitions p ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
INNER JOIN sys.allocation_units a ON p.partition_id = a.container_id
WHERE t.is_ms_shipped = 0 AND i.OBJECT_ID > 255
GROUP BY t.[Name], p.[Rows]
ORDER BY t.[Name]

Size of each Index

This query shows the total size in KB of each index in the database. The query shows the name of each index, which table the index is on, and the total size of the index.

SELECT 
    i.[name] AS IndexName,
    t.[name] AS TableName,
    SUM(s.[used_page_count]) * 8 AS IndexSizeKB
FROM sys.dm_db_partition_stats AS s
INNER JOIN sys.indexes AS i ON s.[object_id] = i.[object_id] 
	AND s.[index_id] = i.[index_id]
INNER JOIN sys.tables t ON t.OBJECT_ID = i.object_id
GROUP BY i.[name], t.[name]
ORDER BY i.[name], t.[name]

I hope you find these queries useful!

January 5, 2016

SQL Server – How to write an Upsert using MERGE

Filed under: SQL,SQL Server — Brian @ 8:35 pm
Tags: , , , ,

Normally, when you want an application to INSERT a row to a table if it does not exist or UPDATE it if it does exist, your application must first do a SELECT to check if the row exists, which is the standard SELECT-UPDATE-INSERT pattern. The down side to this pattern is it requires two database round trips instead of one.

Now, over the years I have worked with document stores such as MongoDB and really enjoyed the freedom to be able to make a single call to the database and be able to ADD/REPLACE a document, without having to check if it exists.

Fortunately in SQL Server 2008, the MERGE function was introduced.

MERGE allows you to make only a single database round trip when you want to INSERT a row if it does not exist, or UPDATE it if it does. The following is a simple example showing how to use the MERGE statement.

Quick Start

So if you just want a quick example to get you started then here you go. In this example the table “ClientData” is the one being updated.

  • MERGE – specifies the table we will be inserting a row into or updating
  • USING – defines the condition we will be using to check if the row exists or not
  • WHEN MATCHED THEN – SQL statement to run when the row exists
  • WHEN NOT MATCHED – SQL statement to run when the row does not exist
MERGE dbo.ClientData AS [Target] 
USING (SELECT 12345 AS clientId, 'Some' AS data) AS [Source] ON [Target].clientId = [Source].clientId 
WHEN MATCHED THEN UPDATE SET [Target].data = [Source].data, [Target].updatedDateUtc = GetUtcDate() 
WHEN NOT MATCHED THEN INSERT (clientId, data) VALUES ([Source].clientId, [Source].data);

How it Works

First lets create a table to use for our test of the Merge statement:

CREATE TABLE dbo.ClientData(
	ClientId [bigint] NOT NULL,
	Data [varchar](20) NOT NULL,
	UpdatedDateUtc [datetime] NOT NULL DEFAULT (getutcdate()),
 CONSTRAINT [PK_ClientData_ClientId] PRIMARY KEY CLUSTERED (
	ClientId ASC
)) ON [PRIMARY]
GO

You can verify the table has been created and see that it is empty:

SELECT * FROM dbo.ClientData

Now, run the following Merge statement for the first time, where no matching row in the table:

MERGE dbo.ClientData AS [Target]
USING (SELECT 12345 AS clientId) AS [Source] 
ON [Target].clientId = [Source].clientId
WHEN MATCHED THEN  UPDATE SET [Target].data = 'Update', [Target].updatedDateUtc = GetUtcDate()
WHEN NOT MATCHED THEN  INSERT (clientId, data) VALUES ([Source].clientId, 'Insert');

As you can see, the INSERT statement was executed:

SELECT * FROM dbo.ClientData

Now let’s run the exact same merge statement a second time and see what happens:

MERGE dbo.ClientData AS [Target]
USING (SELECT 12345 AS clientId) AS [Source] 
ON [Target].clientId = [Source].clientId
WHEN MATCHED THEN  UPDATE SET [Target].data = 'Update', [Target].updatedDateUtc = GetUtcDate()
WHEN NOT MATCHED THEN  INSERT (clientId, data) VALUES ([Source].clientId, 'Insert');

Now, you can see the the UPDATE statement was executed since the “data” field has been updated to the text “Update”:

select * from dbo.ClientData

If you are curious about the performance difference between MERGE and SELECT-INSERT-UPDATE here is a performance comparison.

January 11, 2015

Java to Sql Server – Cannot create PoolableConnectionFactory

Filed under: Java,SQL Server,Uncategorized — Brian @ 11:14 pm
Tags: , ,

If you are connecting a Java application to a SQL Server database, which is up and running, but your connection fails with the following error:

Could not acquire a connection from DataSource – Cannot create PoolableConnectionFactory

The error is most likely that the windows service “SQL Server Browser” is disabled.

To fix the problem:

  • Start -> Control Panel -> Systems & Security -> Administrative tools -> Services
  • SQL Server Browser -> right click -> properties

image2014-10-30 19-17-34

  • Set Startup type to “Automatic”
  • Apply -> Start

image2014-10-30 19-19-16

Now you should be able to connect to SQL Server from your Java application!

May 2, 2014

SQL Server – Simple Recursive Query Example

Filed under: SQL,SQL Server — Brian @ 2:50 pm
Tags: , , , ,

Every once in a while I need to write a recursive query in SQL Server and I always forget the syntax so I have to track down a simple example to help me remember. However, if you are trying to write a recursive query for the first time, I find some of the examples online to be a little bit too complicated. So I wanted to post an example, but also give you the script to create the table and populate it with data so you can see how it works and try it yourself. In this post I use the common example of a table with countries, states, and cities and where we want to get a list of all cities in a single country. Enjoy!

Create a table called “Area”:

CREATE TABLE dbo.Area(
   AreaID int NOT NULL,
   AreaName varchar(100) NOT NULL,
   ParentAreaID int NULL,
   AreaType varchar(20) NOT NULL
CONSTRAINT PK_Area PRIMARY KEY CLUSTERED 
( AreaID ASC
) ON [PRIMARY])
GO

Add some “Area” data:

INSERT INTO dbo.Area(AreaID,AreaName,ParentAreaID,AreaType)
VALUES(1, 'Canada', null, 'Country')

INSERT INTO dbo.Area(AreaID,AreaName,ParentAreaID,AreaType)
VALUES(2, 'United States', null, 'Country')

INSERT INTO dbo.Area(AreaID,AreaName,ParentAreaID,AreaType)
VALUES(3, 'Saskatchewan', 1, 'State')

INSERT INTO dbo.Area(AreaID,AreaName,ParentAreaID,AreaType)
VALUES(4, 'Saskatoon', 3, 'City')

INSERT INTO dbo.Area(AreaID,AreaName,ParentAreaID,AreaType)
VALUES(5, 'Florida', 2, 'State')

INSERT INTO dbo.Area(AreaID,AreaName,ParentAreaID,AreaType)
VALUES(6, 'Miami', 5, 'City')

If I do a select by AreaType “City”:

select * from dbo.Area
where AreaType = 'City'

I get both Saskatoon and Miami:

AreaID	AreaName	ParentAreaID	AreaType
4       Saskatoon       3            City
6       Miami           5            City

However, what if I wanted to return all cities in Canada?

You can accomplish this by doing a recursive select which uses a common table expression (CTE).

WITH AreasCTE AS 
( 
--anchor select, start with the country of Canada, which will be the root element for our search
SELECT AreaID, AreaName, ParentAreaID, AreaType
FROM dbo.Area 
WHERE AreaName = 'Canada'
UNION ALL 
--recursive select, recursive until you reach a leaf (an Area which is not a parent of any other area)
SELECT a.AreaID, a.AreaName, a.ParentAreaID, a.AreaType 
FROM dbo.Area a 
INNER JOIN AreasCTE s ON a.ParentAreaID = s.AreaID 
) 
--Now, you will have all Areas in Canada, so now let's filter by the AreaType "City"
SELECT * FROM AreasCTE  
where AreaType = 'City' 

Now we get back the following results for cities in Canada:

AreaID	AreaName	ParentAreaID	AreaType
4       Saskatoon       3               City

That’s it! Now we have written a simple recursive query!

March 6, 2014

SQL Server – Alter database in Single User mode to Multi User mode

Filed under: SQL,SQL Server 2005 — Brian @ 8:43 am
Tags: , , , ,

We have some test environment databases that get rebuilt nightly. The job that does the rebuild always switches the database into single user mode when it is being rebuilt. When the rebuild is finished it will switch it back to multi user mode. However, if there was a problem with a script and the job failed, it will leave the database in single user mode.

First, open a SQL Server Management Studio query window connected to database “master”.

The command to change the database back to multi user mode is:

ALTER DATABASE {InsertDatabaseNameHere} SET MULTI_USER WITH ROLLBACK IMMEDIATE
GO

However, if there is an existing process blocking this alter, you may get the error message:

“Transaction (Process ID 864) was deadlocked on lock resources with another process and has been chosen as the deadlock victim”

Since both the existing running process and the current request have the same deadlock priority, preference is given to the longer running process which is why your alter database command is chosen as the deadlock victim. So to deal with this when you run the command again, set the deadlock priority to HIGH. However, if that does not work, you will need to kill the existing connection.

To find the “spid” of the existing connection, you can use the following query:

SELECT sd.[name], sp.spid, sp.login_time, sp.loginame 
FROM sysprocesses sp 
INNER JOIN sysdatabases sd on sp.dbid = sd.dbid  
WHERE sd.[name] = 'DatabaseNameGoesHere'

As an alternative, you can also use the command “sp_who” to get the “spid” of the open connection:

exec sp_who

So, here is our revised command:

KILL SpidToKillGoesHere
GO
SET DEADLOCK_PRIORITY HIGH
GO
ALTER DATABASE DatabaseNameGoesHere SET MULTI_USER WITH ROLLBACK IMMEDIATE
GO

That should do it!

May 27, 2013

SQL Server – Check Index Fragmentation on ALL Indexes in a Database

Filed under: SQL,SQL Server — Brian @ 11:46 am
Tags: , , , ,

Often, when I am using a development or test environment and I run into a situation where a query is behaving slower than normal, first thing I want to rule out is “Do I have any fragmented indexes?”. Here is a simple query that will list every index on every table in your database, ordered by percentage of index fragmentation. This query will work on SQL2K5 or newer.

SELECT dbschemas.[name] as 'Schema', 
dbtables.[name] as 'Table', 
dbindexes.[name] as 'Index',
indexstats.alloc_unit_type_desc,
indexstats.avg_fragmentation_in_percent,
indexstats.page_count
FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, NULL) AS indexstats
INNER JOIN sys.tables dbtables on dbtables.[object_id] = indexstats.[object_id]
INNER JOIN sys.schemas dbschemas on dbtables.[schema_id] = dbschemas.[schema_id]
INNER JOIN sys.indexes AS dbindexes ON dbindexes.[object_id] = indexstats.[object_id]
AND indexstats.index_id = dbindexes.index_id
WHERE indexstats.database_id = DB_ID()
ORDER BY indexstats.avg_fragmentation_in_percent desc

That should do the trick!

May 27, 2012

C# – Bulk copying data into MS SQL Server with DataTables

Filed under: c#,SQL Server 2005 — Brian @ 9:45 pm
Tags: , , ,

Recently, we were asked to start pulling data daily from a number of sources (e.g., several REST APIs), aggregating the data, and saving it to a database to be used for generating reports. As usual we want to make sure the application is easy to test (We need to make sure those stats are correct!), but we also need to ensure it performs well because we will be adding possibly hundreds of thousands of rows daily to a number of different tables every time this job runs. We were worried that the bottleneck in this application would be in running all those insert statements against our MS SQL Server database. However, we were in luck, .NET has a handy feature called DataTables.

A DataTable is basically an in memory representation of an MS SQL Server table. DataTables allow you to create the table in memory, add rows to it, edit values in specific columns of a row, etc, until all the data is exactly what you want. Once the DataTable is ready, it is just a simple SqlBulkCopy statement to insert all the data at once. So rather than hundreds of thousands of insert statements, it is just one bulk copy, and rather than taking minutes or longer to run, it just takes seconds to dump all the data into MS SQL Server. Also, because the data is all in memory, it makes it very easy to test all of our stats. We simply pass in the data we would receive and assert on the values in the DataTables. That’s all!

The following is a simple example where we are saving daily sales figures for each sales person.

Create the table

CREATE TABLE [dbo].[DailySalesStats](
	[Date] [smalldatetime] NOT NULL,
	[SalesPersonId] [int] NOT NULL,
	[TotalSales] [int] NOT NULL,
 CONSTRAINT [PK_DailySalesStats] PRIMARY KEY CLUSTERED 
(
	[Date] ASC,
	[SalesPersonId] ASC
)) ON [PRIMARY]

Example of writing sales stats to the dbo.DailySalesStats table using a DataTable and SqlBulkCopy

using System;
using System.Data;
using System.Data.SqlClient;

namespace DataTableExample
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a datatable with three columns:
            DataTable dailySalesStats = new DataTable("DailySalesStats");

            // Create Column 1: Date
            DataColumn dateColumn = new DataColumn();
            dateColumn.DataType = Type.GetType("System.DateTime");
            dateColumn.ColumnName = "Date";

            // Create Column 2: SalesPersonId
            DataColumn salesPersonIdColumn = new DataColumn();
            salesPersonIdColumn.DataType = Type.GetType("System.Int32");
            salesPersonIdColumn.ColumnName = "SalesPersonId";

            // Create Column 3: TotalSales
            DataColumn totalSalesColumn = new DataColumn();
            totalSalesColumn.DataType = Type.GetType("System.Int32");
            totalSalesColumn.ColumnName = "TotalSales";

            // Add the columns to the dailySalesStats DataTable
            dailySalesStats.Columns.Add(dateColumn);
            dailySalesStats.Columns.Add(salesPersonIdColumn);
            dailySalesStats.Columns.Add(totalSalesColumn);

            // Let's populate the datatable with our stats, which could come from REST APIs, etc.
            // You can add as many rows as you want here!

            // Create a new row
            DataRow dailySalesStatsRow = dailySalesStats.NewRow();
            dailySalesStatsRow["Date"] = DateTime.Now.ToString("yyyy-MM-dd");
            dailySalesStatsRow["SalesPersonId"] = 1;
            dailySalesStatsRow["TotalSales"] = 2;

            // Add the row to the dailySalesStats DataTable
            dailySalesStats.Rows.Add(dailySalesStatsRow);

            // Copy the DataTable to SQL Server
            using(SqlConnection dbConnection = new SqlConnection("Data Source=dbhost;Initial Catalog=dbname;Integrated Security=SSPI;Connection Timeout=60;Min Pool Size=2;Max Pool Size=20;"))
            {
                dbConnection.Open();
                using (SqlBulkCopy s = new SqlBulkCopy(dbConnection))
                {
                    s.DestinationTableName = dailySalesStats.TableName;
                    foreach (var column in dailySalesStats.Columns)
                        s.ColumnMappings.Add(column.ToString(), column.ToString());
                    s.WriteToServer(dailySalesStats);
                }
            }

            // That's it, we are done!
        }
    }
}

The Results

select * from dbo.DailySalesStats

Date                SalesPersonId  TotalSales
2012-05-27 00:00:00	   1	        2

Overall I have been really happy with the DataTable and SqlBulkCopy features in .NET. It is a fast and easy way to load a lot of data into a MS SQL Server database and is a good fit for our statistics gathering job!

February 29, 2012

Scala – Connecting a Scala app to MS SQL Server with Squeryl

Filed under: Scala,SQL Server 2005 — Brian @ 9:21 pm
Tags: ,

For all of our Scala applications up until now, we have being using MongoDB. However, recently we needed to connect one of our Scala applications to Microsoft SQL Server database. One of my excellent co-workers pointed us at a really nice project called Squeryl which provides a very simple library for doing just that. The documentation is excellent and great examples are provided to help get you up and running quickly.

Let’s look at a simple example.

Create a table in your SQL Server database:

CREATE TABLE [dbo].[Customer](
	[id] [int] NOT NULL,
	[name] [varchar] (100) NOT NULL,
 CONSTRAINT [PK_Customer] PRIMARY KEY CLUSTERED 
(
	[id] ASC
)) ON [PRIMARY]

Insert a row:

INSERT INTO dbo.Customer(id, name)
VALUES(1,'John Smith')

Get all of the libraries you will need to make the program run:

  • Download Squeryl jar (2.9.0)
  • Download jtds driver jar (1.2.5)
  • Download cglib jar (2.2.2)
  • Download asm jar (3.1)

Now, let’s run our test code to select a row from our Customer table:

package spikes

import org.squeryl.Session
import org.squeryl.adapters.MSSQLServer
import org.squeryl.SessionFactory
import org.squeryl.PrimitiveTypeMode._
import org.squeryl.Schema

// Create a Customer class which has the same fields as the Customer table in SQL Server
case class Customer(id: Long, name: String) {}

object SqlSpike extends App with Schema {
  val databaseConnectionUrl = "jdbc:jtds:sqlserver://myservername;DatabaseName=mydatabasename"
  val databaseUsername = "myusername"
  val databasePassword = "password"

  // Set the jtds driver
  Class.forName("net.sourceforge.jtds.jdbc.Driver")

  // Connect to the database
  SessionFactory.concreteFactory = Some(()=>
    Session.create(
      java.sql.DriverManager.getConnection(databaseConnectionUrl, databaseUsername, databasePassword),
      new MSSQLServer))

  // Setup the Customer class to be mapped to the "Customer" table in SQL Server
  val customers = table[Customer]("Customer")

  // Select Customer with id=1 from the Customer table
  transaction {
    val customer = customers.where(c=> c.id === 1).single
    println("Customer name: " + customer.name)
  }
}

Finally, the output of the program:

[info] Running spikes.SqlSpike 
Customer name: John Smith
[success] Total time: 4 s, completed 29-Feb-2012 8:42:24 PM

Overall we have found the Squeryl library to be very useful. If you need to connect a Scala application to a SQL database I recommend you give it a try, it should save you some aggravation.

October 24, 2009

Migrating Data to JSON In SQL Server

Filed under: SQL — Brian @ 10:44 pm
Tags: , ,

Recently I was working on a task that required migrating some existing data (Phone Numbers) from being stored as separate columns in a separate table (e.g., dbo.FriendPhoneNumbers), to being stored as a single JSON document in one column of the table containing all related data (e.g., dbo.Friend).

My task was only to complete a one-time migration, so I did not want to spend a lot of time writing a custom application to complete this work. Just for fun I decided to see if I could first accomplish this migration in SQL. I was also curious if such a SQL statement would be efficient enough to be run in a live production environment.

Using the SQL UPDATE statement below I was able to create 100,000+ JSON documents in just a few seconds.

UPDATE f
SET PhoneNumber = phoneJSON.JSON 
from dbo.Friend f 
join 
(
	select FriendID, REPLACE( 
		'{'+
		CASE WHEN Home IS NOT NULL THEN '"Home":"'+Home+'",' ELSE '' END +
		CASE WHEN Work IS NOT NULL THEN '"Work":"'+Work+'",' ELSE '' END +
		CASE WHEN Fax IS NOT NULL THEN '"Fax":"'+Fax+'",' ELSE '' END +
		CASE WHEN Cell IS NOT NULL THEN '"Cell":"'+Cell+'",' ELSE '' END
		+'}', ',}', '}') as 'JSON'
	from dbo.FriendPhoneNumbers
) as phoneJSON ON phoneJSON.FriendID = f.FriendID
GO

While writing the script I ran into two issues:

  1. Nullable Columns: Wrapped each column in a SQL CASE statement so when a phone number was null, nothing was added to the JSON
  2. Trailing Comma: Added a SQL REPLACE to replace “,}” with “}” to handle the case where the last phone number in the list was null

That was all! It took only a few minutes to write this script.

December 21, 2008

SQL Server Quirk – Index Being Ignored

Filed under: SQL Server 2005 — Brian @ 12:50 pm
Tags: , , ,

I recently ran into this case with an index. The table had an index on the column “Status”, however when querying on “Status” the SQL optimizer was deciding not to use the Index, even though this actually hurts performance. The issue was in checking multiple values of “Status” in the select:

Where Status=’Active’ – uses index
Where Status=’New’ – uses index
Where Status IN (‘Active’,’New’) – ignores index
Where Status=’Active’ OR Status=’New’ – ignores index

The interesting thing was that when this table had less than 50,000 rows it used the index, but after the table grew in size the SQL Optimizer decided that the query was nolonger “selective” enough to justify using the index. By ignoring the index the Optimizer lowered the CPU usage of the query but as a side effect increased disk IO (The Optimizer figured this solution was better overall). However, this caused the number of reads for a look up to be significantly higher (On average reading 500,000+ plus rows on every lookup) costing us a great deal of disk IO. For our system IO was the bottleneck (Not CPU) and ignoring this index caused significant production issues.

So the fix was to use a UNION ALL, and do two selects, rather than one select that checks two values (Yes you could also hard code the index usage into the select but that has its own problems as well). Note: the reason we use UNION ALL here is because it avoids the cost of having to check for any duplicate values in the results. In our case we do not have to worry about duplicates (no value can have both status A and N at the same time) so we can use UNION ALL to improve performance.

Select FirstName, LastName From dbo.Friends
Where Status='Active'
UNION ALL
Select FirstName, LastName From dbo.Friends
Where Status='New'

The Tipping Point

If you are interested in reading about the cases in which an index in SQL Server will no longer be used you can find some great information at sqlskills.com. Kimberly Tripp has an article explaining The Tipping Point.

Next Page »

Create a free website or blog at WordPress.com.