If you're wondering what a PIVOT query is, allow me to explain in my best own words (which may not be much help). Sometimes, for reporting purposes, an end user will require a dataset to be displayed in a manner that makes more sense to humans more than it does to machines. Unfortunately, the way data is stored often only makes sense to machines. A classic example is when you have a dataset that requires some sort of aggregation on one column, while grouping by another column. This phenomenon creates the need for a PIVOT query, where the rows are rotated into columns.
Suppose you have the following dataset:
How would you accomplish this? And better yet, how would you accomplish it dynamically so that more dates can be added without changing the query? Let me show you:
1.
Let's first get our test data going. You can use this script to create and populate a table called "T_EMPLOYEE_PRODUCTIVITY":
CREATE TABLE T_EMPLOYEE_PRODUCTIVITY
(
EMPLOYEE_NAME VARCHAR(500),
DATE DATETIME,
UNITS INT
)
-- 7/20
INSERT INTO T_EMPLOYEE_PRODUCTIVITY VALUES ('JOE', '07-20-2013', 5)
INSERT INTO T_EMPLOYEE_PRODUCTIVITY VALUES ('SALLY', '07-20-2013', 7)
INSERT INTO T_EMPLOYEE_PRODUCTIVITY VALUES ('MAE', '07-20-2013', 3)
--7/21
INSERT INTO T_EMPLOYEE_PRODUCTIVITY VALUES ('JOE', '07-21-2013', 8)
INSERT INTO T_EMPLOYEE_PRODUCTIVITY VALUES ('SALLY', '07-21-2013', 7)
INSERT INTO T_EMPLOYEE_PRODUCTIVITY VALUES ('MAE', '07-21-2013', 4)
--7/22
INSERT INTO T_EMPLOYEE_PRODUCTIVITY VALUES ('JOE', '07-22-2013', 9)
INSERT INTO T_EMPLOYEE_PRODUCTIVITY VALUES ('SALLY', '07-22-2013', 9)
INSERT INTO T_EMPLOYEE_PRODUCTIVITY VALUES ('MAE', '07-22-2013', 2)
--7/23
INSERT INTO T_EMPLOYEE_PRODUCTIVITY VALUES ('JOE', '07-23-2013', 14)
INSERT INTO T_EMPLOYEE_PRODUCTIVITY VALUES ('SALLY', '07-23-2013', 9)
INSERT INTO T_EMPLOYEE_PRODUCTIVITY VALUES ('MAE', '07-23-2013', 2)
--7/24
INSERT INTO T_EMPLOYEE_PRODUCTIVITY VALUES ('JOE', '07-24-2013', 15)
INSERT INTO T_EMPLOYEE_PRODUCTIVITY VALUES ('SALLY', '07-24-2013', 10)
INSERT INTO T_EMPLOYEE_PRODUCTIVITY VALUES ('MAE', '07-24-2013', 4)
2.
You should now have a table that looks like my first screenshot above. Use this query to pull what I pulled:
You should now have a table that looks like my first screenshot above. Use this query to pull what I pulled:
SELECT *
FROM T_EMPLOYEE_PRODUCTIVITY
ORDER BY DATE, EMPLOYEE_NAME
3.
To do an ordinary PIVOT query, you would use the PIVOT relational operator. Microsoft defines the basic syntax for this operator as the following:
SELECT <non-pivoted column>,
[first pivoted column] AS <column name>,
[second pivoted column] AS <column name>,
...
[last pivoted column] AS <column name>
FROM
(<SELECT query that produces the data>)
AS <alias for the source query>
PIVOT
(
<aggregation function>(<column being aggregated>)
FOR
[<column that contains the values that will become column headers>]
IN ( [first pivoted column], [second pivoted column],
... [last pivoted column])
) AS <alias for the pivot table>
<optional ORDER BY clause>;
Easy enough, right? Let's make some sense of it by translating it to our dataset:
SELECT EMPLOYEE_NAME, [2013-07-20], [2013-07-21], [2013-07-22], [2013-07-23], [2013-07-24]
FROM
(
SELECT EMPLOYEE_NAME, DATE, UNITS
FROM T_EMPLOYEE_PRODUCTIVITY
) AS SourceTable
PIVOT
(
SUM(UNITS)
FOR DATE IN ([2013-07-20], [2013-07-21], [2013-07-22], [2013-07-23], [2013-07-24])
) AS PivotTable
ORDER BY EMPLOYEE_NAME
Notice how we've taken the unique date values as the columns to be pivoted, and we have aggregated (by summing) the UNITS columns for each date. The raw data is our "source" and the aggregation is our "pivot". When you run this, you should see something similar to my second screenshot above.
4.
Let's now improve on the query by making it more dynamic. What if somebody inserts a record that has some value for 07-25-2013? We want our query to be able to handle that automatically.
First off, for proof of concept, let's go ahead and insert some data for a new day:
--7/25
INSERT INTO T_EMPLOYEE_PRODUCTIVITY VALUES ('JOE', '07-25-2013', 20)
INSERT INTO T_EMPLOYEE_PRODUCTIVITY VALUES ('SALLY', '07-25-2013', 15)
INSERT INTO T_EMPLOYEE_PRODUCTIVITY VALUES ('MAE', '07-25-2013', 8)
If you were to rerun your query here, you'd see that we're not pulling the data for that day. That's because we have statically defined our pivoted columns. In order to dynamically define these, we will need to use some Dynamic SQL trickery.
First off, the goal is to define the comma-separated pivoted columns. To do this, let's create an @cols variable to hold the string value, which we will later plug into our PIVOT query. Also, let's use a temp table (#TEMP) to grab the unique values for DATE (the column on which we're pivoting):
IF (OBJECT_ID('tempdb..#TEMP') is not null) DROP TABLE #TEMP
DECLARE @cols NVARCHAR(2000)
SELECT DISTINCT DATE
INTO #TEMP
FROM T_EMPLOYEE_PRODUCTIVITY
SELECT @cols = ISNULL(@cols + ',', '') + '[' + CONVERT(NVARCHAR, DATE) + ']'
FROM #TEMP
ORDER BY DATE
SELECT @cols
Okay so what did we do here? To start, we're making sure that the #TEMP table is dropped before we continue on. This will allow us to run this chunk of code as many times as we wish within the same instance of SQL Server.
Next, we're declaring our @cols variable, and pulling our unique DATE values into the #TEMP table.
For the final section, we're building out our list of columns. The ISNULL just makes sure we're not starting it off with a comma. The rest just builds a [], around the unique DATE values that we pulled into #TEMP. Also, we needed to CONVERT the DATE from a DATETIME to an NVARCHAR or else SQL Server will yell at us. This wouldn't be needed if your pivoted values are already NVARCHARs. When you select this with SELECT @cols, you should see something like this:
Next, we're declaring our @cols variable, and pulling our unique DATE values into the #TEMP table.
For the final section, we're building out our list of columns. The ISNULL just makes sure we're not starting it off with a comma. The rest just builds a [], around the unique DATE values that we pulled into #TEMP. Also, we needed to CONVERT the DATE from a DATETIME to an NVARCHAR or else SQL Server will yell at us. This wouldn't be needed if your pivoted values are already NVARCHARs. When you select this with SELECT @cols, you should see something like this:
Look familiar? Okay, let's continue on.
5.
From here, we need to build a dynamic query string and execute that. This will allow us to plug in the @cols string we built right before this.
DECLARE @query NVARCHAR(4000)
SET @query = 'SELECT EMPLOYEE_NAME, ' + @cols +
'FROM
(
SELECT EMPLOYEE_NAME, DATE, UNITS
FROM T_EMPLOYEE_PRODUCTIVITY
) AS SourceTable
PIVOT
(
SUM(UNITS)
FOR DATE IN ('+ @cols + ')
) AS PivotTable
ORDER BY EMPLOYEE_NAME'
SELECT @query
EXECUTE(@query)
So in this piece, we're just taking our @cols and plugging it in where we had originally statically defined the column names. Finally, just execute the @query variable, and you should see your pretty result set. Here is the whole script together, so you can run it all:
IF (OBJECT_ID('tempdb..#TEMP') is not null) DROP TABLE #TEMP
DECLARE @cols NVARCHAR(2000)
SELECT DISTINCT DATE
INTO #TEMP
FROM T_EMPLOYEE_PRODUCTIVITY
SELECT @cols = ISNULL(@cols + ',', '') + '[' + CONVERT(NVARCHAR, DATE) + ']'
FROM #TEMP
ORDER BY DATE
SELECT @cols
DECLARE @query NVARCHAR(4000)
SET @query = 'SELECT EMPLOYEE_NAME, ' + @cols +
'FROM
(
SELECT EMPLOYEE_NAME, DATE, UNITS
FROM T_EMPLOYEE_PRODUCTIVITY
) AS SourceTable
PIVOT
(
SUM(UNITS)
FOR DATE IN ('+ @cols + ')
) AS PivotTable
ORDER BY EMPLOYEE_NAME'
SELECT @query
EXECUTE(@query)
Voila! When you run it all together, you should see something like this:
Notice how in the end result, we've got July 25, 2013 now. As a test, you could go in and insert another day if you'd like. Additionally, you can add more values for an existing day for a particular employee and you'll see it get aggregated under the appropriate day of our pivot result. Of course in this script you don't have to select our @ variables throughout it, but I just have it in here to show you how everything is pieced together.
I hope all of this helps somebody in the future. It took me quite a bit of research and working with colleagues to figure this one out. Please let me know if you have any questions or problems in the comments below, and I'll be happy to try and help. For more information on the PIVOT operator, please visit http://msdn.microsoft.com/en-us/library/ms177410(v=sql.105).aspx.
Thanks for listening.