Writing Dynamic SQL in and for RDLs

I know, I know, it's a boring title. But it's a big, exciting, and important subject — and I've now promised people in three different RS threads that I would write about it.

So here goes.

Why do it?

There are a lot of reasons to bind some querying logic very tightly to the report it runs. I said "some".  I know there are a lot of reasons, and a lot of cases, where it's a better idea to bind it tightly to the database instead. But consider these possibilities:

  • Report designer personnel may not have direct access to all the source databases.  The report designer is given a model or a set of views to which the report will have access, and that's it.  It's possible to do a great deal of manipulation on top of those views, but s/he has to put that fancy code somewhere.
  • The fancy report logic may be very tightly coupled with some expressions and behavior (say, visibility or aggregation) that only affect one report.  In such a case it is often better to bind this logic to the RDL, knowing that it is not exposed elsewhere. If the report requirements change you can guarantee that you can do it without side effects in anybody else's reports or database behavior.
  • Building some of the dynamic logic for a reporting query can make good use of VB functionality not found natively in SQL.  (You'll find a typical example in this post, by the way.) Sure, you can write a Split function in T-SQL (I have, everybody has) and sure you can allow CLR integrated code in SQL Server 2005.  But what if one of your sources isn't SQL Server 2005?

How can it be safe?

The first thing that everyone says when you start talking about writing dynamic sql is "security risk — SQL injection".  And everyone is right to be concerned. 

In the RDL context, very often the submissions from the user come from a strictly controlled (by you, the developer) set of options, displayed in radio buttons or drop-down controls; a user isn't allowed to submit straight text. 

Sometimes, yes, a text parameter is used as a filter. Or programmatic access via URL Access or SOAP permits the submission of values that are not so strictly controlled. But I think you'll see that the way I'm constructing the SQL (and the way the parameters submitted are deconstructed in the process)  it is extremely easy to test the contents of the user-submitted text in ways that obviate the possibility of SQL injection. 

If you think otherwise when you see what I'm doing with the parameter values herein, please speak up and provide an example of how straight SQL could be injected and actually executed. I'll want to refine the way I'm testing the submissions, of course.

In the meantime, I'll also point out that I use this technique on intranet sites with integrated Windows authentication.  Report access is typically very tightly controlled (in fact, in some cases the requests are being submitted only by other applications that we've written). If somebody is on these intranets with appropriate network access to submit the parameters, s/he can do a lot more damage elsewhere already.

If you did (say) have an internet-accessible site that (for example) permits a user to send you a list of fields to be SELECTed or an ORDER BY clause as text parameters, so that you were just stringing the clauses together, you could still validate what you got. You could still add tests for keywords within the user-submitted text, and you could still restrict the allowable fields and values within the text to a known set of column names and expressions.  It is still possible to do this safely.

How do I do it?

This is the "walkthrough" part.

Let's take a simple case: You have a report query that shows orders, and one column represents a status value.  Your users would like to filter the list by status value.  Of course you could add a filter to the data region, but then you've pulled all the orders and status values across the network when you only need some of them.  It would be better to put the filter on the query, wouldn't it?  This filter value is optional; they can still show all orders if they want to.

The original query looked like this:

SELECT OrderNo, OrderStatus FROM MyOrdersTable

You build a dropdown showing the available order status values, and you also include the value 'Any'. (If the dropdown values are query-driven, use a UNION with the explicit 'Any' value.)

You want a query that looks like the above, if 'Any' is chosen, but you want a WHERE clause if a status value is specified. 

As a first step, realize that the query in the little Dataset-defining window can be an expression instead of a literal.  The following expression is a valid representation of the one you have already written:

="SELECT OrderNo, OrderStatus FROM MyOrdersTable "

Therefore, it follows that you can write a more complex expression with some VB functionality defining the string, such as this:

="SELECT OrderNo, OrderStatus FROM MyOrdersTable "  &
   IIF(Parameters!SelectedStatus.Value = "Any", " ",
       " WHERE OrderStatus =  '" & Parameters!SelectedStatus.Value  & "' " )

So there's your first dynamic SQL statement.  Notice that we need to delimit the value of the parameter, which will be compared as a literal against the column in question, as we build up the string.

Now suppose you need to change your SelectedStatus parameter to be multi-valued.  You probably want an IN list in the SQL result.

As you may know, and if you don't I've spelled it out in another post about report parameters, multi-valued params are an array of strings.  In that particular post we were receiving the multiple values as a comma-delimited string, and we used the Split function to separate out the values into the string array. 

In this scenario, we have to do the opposite: we have the string array and we need to concatenate that set of values together so it can be read in the resulting SQL.  We use the Join function to do that.  The function allows us to add whatever delimiter(s) we want, and in this case we'll add delimiter quotation marks for each literal value along with the commas that will separate them in the IN list.  Like so:

="SELECT OrderNo, OrderStatus FROM MyOrdersTable "  &
   IIF(Parameters!SelectedStatus.Value = "Any", " ",
       " WHERE OrderStatus IN  ('" & Join(Parameters!SelectedStatus.Value,"','")  & "') " )

I've marked the delimiter values in red and yellow so you don't miss them. Be sure you don't add any delimiting spaces that you don't want in the literal values that will appear in the SQL result.

I've included this particular example because it provides more types of confusion than just about any other form of RS-Dynamic SQL on the forums. Reference this thread.

Now you're getting the idea, right?  You can see that you might also want to add a  Switch or Choose instead of the IIF() to handle the possibility that Parameters!SelectedStatus.Count = 1, if you tested and found that a straight = comparison performed better than the IN clause.

How do I do it more flexibly?  

Sooner or later, you are going to get fed up doing this kind of work in the little Dataset definition window.  It will be "sooner" if you have lots of parameters for each report and you run out of room (that little window has a limit).  It will be "eventually" when you realize how hard those statements are to maintain, especially if you need to do lots of tests for each clause before you decide what the dynamic SQL statement should be.

At that point, you're ready to do it properly.  You can start by copying that long statement you wrote to Visual Notepad or someplace else ultra-safe, so you don't lose it.

Now replace the statement with a call to a code function, like this, which you'll write to return the SQL string:

 =Code.SQLStatementForThisDataSet(Parameters)

Notice I've passed the report parameters object to my function (and no, you don't have to call it  SQLStatementForThisDataSet, sheesh, and yes, you can pass the name of the dataset if you want one function to define more than one SQL statement).  Since we were evaluating the parameters to create our SQL statement before, it stands to reason we're going to need them now.

Start writing your function by going into the Report Properties window (from the Report menu popup ) and choosing the Code tab.  You'll need a function skeleton that looks something like this:

Public Function SQLStatementForThisDataSet(ByVal p as Parameters) As String
  Return ""
End Function

Now things start getting interesting. I bet you have some idea of what we do in here, right?  But I'll give you some concrete examples. 

Define "interesting". 

You can write really smart, generic code in here. For example, I happen to have a lot of reports that feature the ability to create "start date" and "end dates" as custom periods, with an override if they just want a "month to date" report. In some cases there is more than one date pair to be considered.  Typically my code for such a report will look something like this:

Public  testText As String =""
' this item will be explained below! 

Public Sub PrepareSQLWithDateStringParam(ByVal v AS String,  ByVal n AS String, ByVal sb As System.Text.StringBuilder, ByVal start As Boolean)
    If Len(v) > 0 AndAlso IsDate(v) Then
       If sb.Length > 0 Then
          sb.Append(" AND ")
       End If
       If start then
          sb.Append(n & ">= '" & v & "' ")
       Else
          sb.Append(n & "<='" & v & "' ")
       End If
   End If
End Sub

Public Function SQLStatementForThisDataSet(ByVal p as Parameters) As String
  Dim sb As  System.Text.StringBuilder = New System.Text.StringBuilder()
  Dim dummy As String
  If p("MonthToDate").value Then
     dummy = Month(Today).ToString()  & "/1/" & Year(Today).ToString()
     PrepareSQLWithDateStringParam(dummy,"SurveyDate",sb,true)
  Else
     If Not IsNothing(p("EarliestDate"))  Then
        dummy = p("EarliestDate").value
         PrepareSQLWithDateStringParam(dummy,"SurveyDate",sb, true)
     End If
     If Not IsNothing(p("LatestDate"))  Then
        dummy = p("LatestDate").value
         PrepareSQLWithDateStringParam(dummy,"SurveyDate",sb, false)
     End If
  End If

  If Not IsNothing(p("EarliestRunDate"))  Then
     dummy = p("EarliestRunDate").value
      PrepareSQLWithDateStringParam(dummy,"RunDate",sb, true)
  End If

  If Not IsNothing(p("LatestRunDate"))  Then
     dummy = p("LatestRunDate").value
      PrepareSQLWithDateStringParam(dummy,"RunDate",sb, false)
  End If

  If sb.length > 0 Then
     sb.Insert(0," WHERE ",1)
  End If

  sb.Insert( 0," SELECT …",1)
 ' leaving out the rest of the SELECT, the important
 ' thing is that I'm inserting it before my now-fully-formed
 ' WHERE clause

  sb.Append(" ORDER BY …  ") 
 ' leaving out the ORDER clause, ditto…

  testText = sb.ToString()
' this item will be explained below! 

' and, finally, here's the payload: 
  RETURN  sb.ToString()

End Function

Get the picture?  Here's another example, that shows you a multi-valued parameter at work in this type of code.  In this example, the values are of Integer rather than String type, so there are no extra delimiter marks, but you can still see the Join function at work in this, much clearer, context:

Public Sub PrepareSQLWithIntegerParam(ByVal v AS Parameter,  ByVal n AS String, ByVal sb As System.Text.StringBuilder,ByVal omitVal As String)
  If Not(IsNothing(v)) Then
    Dim thisVal AS String = Join(v.Value,",")
    If  (IsNothing(omitVal)  OrElse NOT(thisVal.Contains(omitVal)) )  Then
        If sb.Length > 0 Then
          sb.Append(" AND ")
        End If
        sb.Append(n & " ")
        If thisVal.Contains(",") Then
           sb.Append(" IN (" & thisVal & ") ")
        Else
          sb.Append("=" & thisVal &" ")
        End If
    End If
  End If
End Sub

You may be wondering what the "omitVal" information is doing in this function.  I want to allow for an "Any" case in the multi-select values, which will mean that this clause does not get added to the filter at all.  I add "Any" to the labels available in the dropdown, and give the "Any" choice an impossible value, usually -2.  When this value is chosen, it means that the user wishes to skip the filter entirely. The calling function that is constructing the SQL statement invokes this function as follows:

  If Not IsNothing(p("Q1Answer")) Then
      PrepareSQLWithIntegerParam(p("Q1Answer"), "Q1",sb,"-2")
  End If

  If Not IsNothing(p("Q2Answer")) Then
      PrepareSQLWithIntegerParam(p("Q2Answer"), "Q2",sb,"-2")
  End If

  If Not IsNothing(p("Q3Answer")) Then
      PrepareSQLWithIntegerParam(p("Q3Answer"), "Q3",sb,"-2")
  End If

How do I test it?

You've noticed the "testText" String variable I've added to the code.  I often have a "debug" layout control in my RDL, which is either Hidden based on a Debug variable or removed after report design is complete.  Here's a typical report expression for a "debug" textbox:

="DEBUG: " & User!UserID & " " & Code.TestText

In the early stages of report design, my SQL statement may not even be working as I have constructed with it.  I create a temporary SELECT statement in my Dataset window, providing some stubbed-in contents for the report without all the filters and other dynamic stuff. (You may have to do this anyway to get the right fields list for the data set; see the next section below.)  Meanwhile, the debug textbox shows me what I've constructed for the "real" SQL.  Usually I've forgotten to add a space between keywords, or something obvious.  But, if I don't spot what's wrong immediately, I can copy it out of an export from the report, paste it into a query window, and execute it interactively to find the bug.

How do I get the the right information in my Datasets list ?

If you started this process with a non-dynamic query and you haven't changed your output fields list, you're already all set.  If not, you can create a temporary query using literals, for example:

SELECT   10 AS OrderNo, 'Pending' As OrderStatus

… execute that query once in the Dataset window, press the little refresh button, and you should be all synched.  If not, you can right-click in the Datasets window to explicitly Add fields by name.

How do users understand the results?

A typical problem with "Exported" reports that have a lot of parameters is: What does the data represent?  Users forget what they selected in the report parameters, and how those choices related to the filtered and aggregated data values.

Luckily, you already have the tools you need to help them!

Along with the debug textbox we've already discussed, you can build up a "title" expression, to display users choices directly in the report.  All it takes is a second StringBuilder instance… the logic is already there for you, and the same functions can construct both the SQL version and the English language version at the same time.