Matrix Rebuilt: More non-standard fun with T-SQL

I don't know about you but I'm not in love with the Matrix control in SQL RS.

I'm not in love with pivot tables in Excel, either, maybe it's me, I don't get my head around these constructs very well in general.  Other people manage. But, in RS, I just seem to spend too much time getting around the expectations of how things are supposed to work, which should make things easier but don't. 

(RANT-ASIDE: How many times have you seen a question about how to get subtotals before rather than after the rest of the columns, by the way?  Who thought up that green triangle?  It's a crowded interface, it's really hard to get all that functionality in, but still. OK, OK, I know it's easy to Monday-morning quarterback this stuff to death — I've been on the other side, it's maddening. So I'm not really complaining, and this rant does not constitute a fair criticism.)

So I like to do stuff in a table, where possible, even when I have to do a lot more work in T-SQL to get where the matrix would auto-magically put me.

As a result, even though I don't like pivot tables in Excel, I'm kind of falling in love with the PIVOT clause in SQL 2005. Like the PARTITION stuff I've already discussed, this one really opens up (you should pardon the expression) reporting vistas.

This puppy gives you the ability to rotate tables.  I figure that the matrix is actually using it, or something very similar to it with one exception which I'll discuss below, under the covers.  So why not do it yourself?

The critical difference is dynamic expression of the columns you want.  You've got to express your pivoted columns — twice, actually — within your SELECT statement.  Here is a an example PIVOT query from the docs:

USE AdventureWorks
GO
SELECT VendorID, [164] AS Emp1, [198] AS Emp2, [223] AS Emp3, [231] AS Emp4, [233] AS Emp5
FROM
(SELECT PurchaseOrderID, EmployeeID, VendorID
FROM Purchasing.PurchaseOrderHeader) p
PIVOT
(
COUNT (PurchaseOrderID)
FOR EmployeeID IN
( [164], [198], [223], [231], [233] )
) AS pvt
ORDER BY VendorID;

How can you do that when you don't know what the columns are?

I wrote a quick example of how to do this in T-SQL for a co-worker, unrelated to reporting.  It's actually easier to do if you write the query as SQL using VB, where there are much more capable string-handling expressions than what I'm going to show you here.

First you need to have some way to determine your max number of columns. In this example, I'll be setting up my buckets by month values in a datetime column, and using DATEDIFF() from a specified date to express the meaning of each bucket (literally "how many months ago did something happen"). So I'll use DATEDIFF() in a similar but much quicker query, to figure how many buckets I have, first.

SET NOCOUNT ON
DECLARE
@Today AS DATETIME
SET
@Today = GETDATE()
DECLARE @TotalCols AS INTEGER

DECLARE @ColName AS NVARCHAR(50)
DECLARE @Statement1 AS NVARCHAR(2000),
        @Statement2 AS NVARCHAR(2000)

/* do a query to find out your max number of colums */
/* You could do this differently if you don't always want
   all the data.
*/
SELECT @TotalCols =
   COUNT(DISTINCT DATEDIFF(month,EventDate,@Today))
   FROM EventTable

Next, you need to set up the two clauses that reference those columns.  You'll need a second variable (besides the total number of columns) that indicates some base column name that is meaningful within your data set, to which you will add an index or identifier.

/* decide how you want to name the cols 
   in this specific query */
SET @ColName = 'MonthsAgoGrp'

 

Now you can derive the two clauses you need.  This part is completely generic (and again it is much, much easier to write it in VB than in T-SQL although this should be reasonably clear).  You can move it into appropriate generic functions in either .NET code or T-SQL, passing in @ColName and @TotalCols, for any PIVOT-y query you wanted to assemble:

DECLARE @ThisItem AS INTEGER, @ThisPos AS INTEGER, 
        @sThisItem
AS NVARCHAR(10)
SET @Statement1 = '[0] AS [' + @ColName + '0]'
SET @Statement2 = '[0]'
SET @ThisItem = 1
WHILE @ThisItem < @TotalCols
   BEGIN
      SET
@sThisItem = CAST(@ThisItem AS NVARCHAR)
      SET @Statement1 = @Statement1 + ',[' +
         @sThisItem
+'] AS [' + @ColName + @sThisItem + ']'
      SET @Statement2 = @Statement2 + ',[' +
         @sThisItem
+']'
      SET @ThisItem = @ThisItem + 1
   END
/* You can print out @Statement1 and @Statement2 at this point,
   and verify that they are in fact the two clauses in the PIVOT
   statement that we needed to create differently for different
   numbers of columns
*/

Now you've got what you need to assemble the query and execute it: 

SET @Statement1 =
   N
' SELECT EventType, ' + @Statement1 +
   N
' FROM '
/* your pivot-able query here: */
 SET @Statement1 = @Statement1 +
   N
'(SELECT EventType,
      DATEDIFF(month,EventDate,'''
+
      CAST(@Today AS NVARCHAR) +
   N''') AS MonthsAgo, EventID 
   FROM EventTable ) SourceTable '

/* add the pivot aggregation function of your choice */
SET @Statement1 = @Statement1 +
   N' PIVOT ' +
   N
'(COUNT(EventID) FOR MonthsAgo IN (' 

/* our second generated clause here */
SET @Statement1 =
   @Statement1
+ @Statement2 + N')) AS PivotTable '

/* order the results if desired */
SET @Statement1 = @Statement1 + N' ORDER BY EventType '

/* end the statement  and execute it */
SET @Statement1 = @Statement1 + N';'
EXEC sp_executeSQL @Statement1

Your generated query for this example, the thing that executed, looked something like this (assuming @TotalCols = 5):

SELECT EventType,
   [0] AS MonthsAgoGrp0, [1] AS MonthsAgoGrp1,
   [2] AS MonthsAgoGrp2, [3] AS MonthsAgoGrp3, [4] AS MonthsAgoGrp4
FROM
(SELECT  EventType,
   DATEDIFF(month,EventDate,'8/17/2007')  AS MonthsAgo, EventID
  FROM EventTable )  SourceTable
PIVOT
(
  COUNT(EventID)
  FOR MonthsAgo IN 
  ([0],[1],[2],[3],[4])
) AS PivotTable
ORDER BY EventType ;

 

I figure that, underneath, the matrix has to be processing the data something like this; first figuring out the buckets and then designing either a PIVOT clause or something similar, with its dynamically-created column instructions, to execute on the data set with which you present it in the report. 

Now you know that, for reporting purposes, this isn't the end of the story — although it solves my co-worker's problem pretty neatly. You still have to tell the report about your dynamic structures, and where you have buckets that can't be indexed as easily as "months ago", you have to provide an index value to them somehow (this is another place where you will find ROW_NUMBER() and PARTITION helpful).

But that's enough for now, and I hope it gives you some ideas…

4 thoughts on “Matrix Rebuilt: More non-standard fun with T-SQL

Leave a Reply

Your email address will not be published. Required fields are marked *