The fault lies not in our snowflakes – PowerBI RLS implementation

Somebody asked me about RLS in PowerBI a couple of weeks ago, and I was interested enough to poke around a bit.  He ended up not being interested in seeing what I came up with, but I was happy enough with it to want to write it up, for myself and others, here.

As usual, I worked starting with some SSRS assumptions which didn’t turn out to be helpful (mainly because in SSRS the problem is a trivial one, using cascading parameters, whether hidden or visible — if anybody has a problem solving this, let me know and I’ll illustrate to your specifications). 

And, as usual, I work best with some real-life constraints, which in this case were whether I could surmount some of the issues and limitations that my colleague had demonstrated to me, in his business case.

The devilish details

In his business case, one of the requirements was assignment of user privileges along multiple axes, such as organizational structure and geographic location. (Joe can only see Production data, and his territory is Canada. Bill can only see Sales and Marketing data, and his territory is Australia.)

In BI of any sort, this requirement is typically expressed as filters on multiple dimensions. Each dimension that requires restriction gets its own set of cross-references with another dimension, often called Person, which represents the identities that are allowed access to the data, or the report, at all.

His proposed implementation involved creating a Role for each group of tables that were in one subject domain. For example, Production data might include Machine Types as well as lists of Factories, while Sales and Marketing might include Commission scales along with assigned Territories. (I’m out on a limb here, because I really don’t know the data) . But he indicated that there were “unpredictable results” if dimension needed to be in filtered in multiple domains — for example, a Product SKU has both a Bill of Materials (Production) and a Commission scale, so the SKU dimension might be needed in both Roles. He described some complex code that would be needed to make this work.

A second requirement is, of course, the cross reference data between people and sliced dimensions, which either needs to be maintained by somebody or deduced from existing data somewhere. Since there wasn’t any way to infer it from existing data, this looked like a pretty tedious chore until an application could be built to manage it.

What made the chore especially tedious was the third requirement, which is a pretty common one: some people have to be able to see an entire dimension, and a few people have to be able to see all the dimensions, unsliced. So, given the code I saw, somebody was going to have to create a whole lot of rows, each covering permissions to just one dimensional value, to give each one of these people access to everything.

I say “given the code I saw”. I grant you that this is a pretty easy problem to solve. It wasn’t accounted for in what I saw, and I just wanted to ensure that my version included this constraint in a sensible way.

Data for this walkthrough

I’m using my usual SQL Server adaptation of the standard MySQL World database — useful here partly because there is going to be a many-to many relationship, given how many countries use the same languages. I wanted to make sure I didn’t ignore whatever problems that might introduce. So, the original simple schema looks something like this:

Sample schema before adding RLS

Then I added some security tables into my sample PBIX to represent the security requirements, with the implementation and sample data as follows:

  • There are three users in a Person entity. They have DisplayNames and LoginNames, just because, in a real world, they would.
  • Bob is a superuser, and gets to see all Continents and all Languages. Rather than create a separate mechanism for this for my example, I created a couple more columns, AllLanguages, and AllContinents, as booleans, in my Person entity, both true for Bob and false for the other users.
    You could certainly have an “All Privileges” table that would work differently in your underlying data. But, regardless of how this was managed in the base SQL tables or other datasources, on the PowerBI dataset level, I think I do like this approach (a boolean evaluated value), a lot, and I’d probably retain it. Your mileage will vary, but it’s cost-effective to do it this way IMHO, and you’ll see below that it makes the code even more simple than it otherwise would have been.
  • There is one cross reference entity for Person-Continent. Lisa gets to see Europe, and Carlos gets to see North and South America. (Bob, again, gets everything.)
  • There is a second cross reference entity for Person-CountryLanguage. Lisa speaks French and English, and Carlos speaks Spanish and English. (Bob, ditto.)

If I had other dimensions in my example, I’d have other such cross reference entities. Bob might or might not be a superuser for any combination of them.

Could this be a lot of (skinny, lightweight) tables? Sure. On the SQL or other datasource level, you can certainly combine them all in something that looks like this, if you want, and if it makes user management more effective. Personally, I think that the single table will make user management less straightforward, especially if (as appeared to be the case in the business question with which I was presented) this was all going to be in an Excel sheet and there might be different expert users filling out the rules for each dimension:

LoginAllowed ValueValue Type
CarlosSouth AmericaContinent
LisaEuropeContinent
CarlosSpanishLanguage

But, once again, no matter how the data is physically stored, on the PowerBI level, it really makes sense to simply create filtered, separate entities for each cross reference, given the way relationships work.

The result looks something like this:

OMG it’s not a star anymore.

Snowflakes in your dataset: just get over it

PowerBI people, and even some SSAS people, seem to get very antsy when a reporting dataset can’t be a simple star schema.

There is no better example of why you shouldn’t be afraid of a snowflake melting all over your data than security requirements. Typically, describing security properly in a schema will require a little constellation of tables, in a star all of their own. So what?

You’ll notice I haven’t done anything fancy here with regard to bidirectional relationships, in the cross reference added entities. I wanted to impact the “main” star portion of the schema as little as possible, because, yes, PowerBI can get pissy in a complex schema. Again, so what?

OK, what’s next?

We’ve described the underlying data that matches our requirements to the dataset. Now that PowerBI knows what we know, we use RLS to leverage what we know.

This part is really simple. (And a lot simpler than what my colleague showed me.)

You only need one Role.

Everybody who has rights to see the report is members of that one Role. I called my one role Users, because, why not.

There’s a definite possibility that somebody could somehow get access to the report, who doesn’t have rights to anything in it. Sure, there’s the question of to whom you gave permission within your PowerBI service app, or workspace — but maybe you gave permission and their rights aren’t set up in the table yet. You don’t want them seeing everything, simply because user maintenance is a little behind! As you’ll see, this is easy to account for.

I started by creating a measure, called CurrentUser, since I was going to need it all over the place. The actual parsing or code you’d use here depends on how your Login values are stored.

In my case, I wanted it to work whether I was in the designer and receiving a domain-prefixed name as UserPrincipalName() or using the Power BI service, and I created Login names that didn’t have the domain-prefix or the email-suffix you get from the Cloud logins. So, I used this code to remove the domain-prefix. In your case, you might also parse out the email-suffix, or choose to store the full email value, in your Person table:

CurrentUser = 
     Mid(UserPrincipalName(),
         Search("\",UserPrincipalName(),1,0) + 1,100)

Now we know who need to be provisioned with permissions.

In the Users role, I set up two kinds of rules:

  1. One is on my Person table. I want to filter the Person table to the CurrentUser, which is certainly easy enough:

I’ve doctored the image so you can read the expression, simple as it is, because I did want you to see that I am not filtering on the cross reference tables, just the significant dimensions.

2. The other kind of rule, therefore, goes on the dimension tables, obviously. And they are all pretty much the same.

There are four lines of code.

The first two lines evaluate the major condition of “who is logged on and do they even belong here”.

// we will use this value more than once, so let's store it up-front
var thisUser = [CurrentUser] 

// we will use this value more than once, so let's store it up-front
var unknownUser = 
      (CountRows(CalculateTable(Person, Person[Login] = thisUser)) = 0)

Now we know if they don’t belong at all. The next question is: do they have rights to everything in this dimension? We can do this the same way every time, except for swapping out the column that holds the answer we want:

var thisUserGetsAll = 
     if(unknownUser, false(), 
        LookupValue(Person[AllContinents],
                    Person[Login], 
                    thisUser))

Notice that we leverage the unknownUser value, which we already have, up-front in the If() function. There’s no reason to go and do a Lookup, no matter how efficient, if we can short-circuit the need to do so.

Finally, in our fourth statement, we are able to make a determination (true or false) of whether to allow the current row into the filter for this person. As before, it is the same for each dimension except we will change the name of the related cross reference table and point to that table’s Login value, for each filter. As before, we short-circuit the need to perform any calculations as much as possible, by leveraging both of the boolean values we have already created, and then, if we need to, filter the related cross reference table for the current user, and check and see if we get any rows back:

return if(
 thisUserGetsAll,true(), 
 if(unknownUser, false(),
  CountRows(CalculateTable(RelatedTable
 (xRefPersonContinent),xRefPersonContinent[Login] =
  thisUser)) > 0))

I chose to use a comparison of > 0, rather than = 1, here because there’s every chance that user maintenance will occur, unvalidated, in an Excel spreadsheet somewhere, and a user might be in the cross reference table with the same value more than once. Sigh. And I used a nested pair of if() functions instead of a switch(), frankly, because (as is often the case with me & switch() in DAX) I couldn’t get switch() to work the way I wanted. Sigh again.

That’s it! Four lines of very simple, performant code and there shouldn’t be any side effects from multiple filters, because they’re all in one Role and they’re all doing what they are supposed to do.

And there’s nothing special you need to do, to get this to work, or to play nicely with other filters within the report.
  • Bob-the-SuperUser starts out with everything, but if he clicks on a Continent, the other lists will filter accordingly and as you would expect;
  • Mr. Unknown can’t see anything, not even a greeting, because he doesn’t have a row in the Person table. I suppose with a bit more work I could have replaced that greeting with instructions about contacting an Admin, for unknown users;
  • Carlos starts out with only his two continents, his two languages, and the countries that are connected to both of those restrictions.
  • Same with me; when I stop viewing As Other User, I’m recognized by my login name (which is not Lisa), and I see only countries in Europe and only French and English. For example, I see France, but not Canada, although French is spoken in Canada, and I don’t see Germany because French and English are not listed as languages for Germany, even though it’s in Europe. If I click on Ireland, my language list drops to English only.

This is not only not unpredictable, it’s exactly as designed — and if it weren’t, it would mean that my main schema’s relationships were not correct for this report. I could alter those without in any way affecting the efficacy of my security scheme.

And that’s what a special little snowflake can do for you.