More Than Four

Areas of Interest, as counted by my cat

From the “Look at me, I’m an idiot” department

If this were April 1st, I’d be sure this was satire, but as it is, I’m not sure:

Java SE – Change in Version Numbering scheme in March 2013

In case this vanishes off the web, here’s the gist:

Seriously? I don’t know who is to blame for this loss of sanity. Oracle, you are not so damn important that you need to re-invent this. My understanding is that there is a well-established convention for numbering incremental versions of a product, one that allows for such concepts as minor feature releases and bug fix releases.

The scheme is nicely described here on Stack Overflow: How to do version numbers.

What is sad is that, back in the Sun days, Java releases did follow this numbering convention.

Exploring AUTHID with an eye to partitioning client data

Problem

Is it possible to design a multi-tenant architecture in an Oracle DB instance, such that different customers’ data is transparently partitioned whilst keeping common code and common tables in a single location?

Possible Strategy

Consider the following architecture:

Here we have two schemas in an Oracle database instance:

  • U_Common contains common tables and stored procedures, packages, etc.
  • U_Cust_1 contains customer-specific tables, with private data.
  • Client processes acting on behalf of Customer_1 connect to the U_Cust_1 schema.

(You may assume we also have U_Cust_2, etc.)

Assume that all access is performed through stored procedures or similar code objects.
In order for sessions connecting to U_Cust_1 to execute specific procedures in U_Common, they need:

  • EXECUTE permissions on the objects in U_Common;
  • Either:
    o explicit namespace prefixes, e.g. U_Common.DoSomething()
    o a synonym object referencing the code object in U_Common (both are sketched below)
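
A minimal sketch of both pieces, assuming DoSomething is a standalone procedure owned by U_Common:

[code:sql]
-- Run as U_Common (or a DBA): allow U_Cust_1 to call the common procedure
grant execute on U_Common.DoSomething to U_Cust_1;

-- Run as U_Cust_1: a synonym so calls don't need the U_Common prefix
create synonym DoSomething for U_Common.DoSomething;
[/code]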

In order to have common code objects that reference both common tables and customer tables, we need to create “template” versions of the tables in the U_Common schema:
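
For illustration, assume a customer table called CUSTOMER_ORDERS (the name is mine, not from the original design). The populated copy lives in each customer schema, and an identically-structured, normally empty template copy lives in U_Common so that the common code has something to compile against:

[code:sql]
-- Run as U_Cust_1: the real, populated customer table
create table CUSTOMER_ORDERS (
  ID        number primary key,
  ORDER_DT  date,
  AMOUNT    number
);

-- Run as U_Common: the identically-structured "template" table,
-- normally left empty; it exists so U_Common code compiles against it
create table CUSTOMER_ORDERS (
  ID        number primary key,
  ORDER_DT  date,
  AMOUNT    number
);
[/code]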

However, when U_Cust_1 invokes the procedure in U_Common, the session can’t “see” the customer-specific version of the table, because the code is executing in the context of U_Common. It will see the template table data (or lack of it – presumably there are no rows in the template tables).

This can be addressed by adjusting the AUTHID clause when the code objects are defined:

Default:

  create or replace package U_Common [authid definer] is ..

Change to:

  create or replace package U_Common authid current_user is ..

According to the documentation, this AUTHID clause tells Oracle whether to run the code with the invoker’s rights or the owner’s rights (the default is DEFINER, i.e. the owner).

In practice, this means that when U_Cust_1 executes the code object defined as AUTHID CURRENT_USER, the customer tables in U_Cust_1 are in scope, and are accessed instead of the template tables in the U_Common schema.

Three interesting observations about executing code defined as AUTHID CURRENT_USER:

  • the code does NOT see the Common tables. Any table in U_Common is “out of scope”;
  • the code can see other procedures and packages in U_Common, including those defined as AUTHID DEFINER (default);
  • when the code makes a nested call to those other procedures, the tables in U_Common are now visible!

Here’s a sequence diagram of the call stack:

This means that everything will work just fine, so long as the code objects are classified into “programs that access common data” and “programs that access customer data”.
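
As a sketch of that classification (package and table names here are illustrative, not from the original post): COMMON_API is a definer’s-rights package that reads a common lookup table, and CUST_API is an invoker’s-rights package that reads the customer table and makes the nested call that brings the U_Common tables back into scope:

[code:sql]
-- In U_Common. Definer's rights (the default): table references resolve
-- in U_Common, e.g. a hypothetical common RATES lookup table.
create or replace package COMMON_API authid definer is
  function exchange_rate return number;
end COMMON_API;
/
create or replace package body COMMON_API is
  function exchange_rate return number is
    l_rate number;
  begin
    select RATE into l_rate from RATES where CURRENCY = 'USD';
    return l_rate;
  end exchange_rate;
end COMMON_API;
/

-- Also in U_Common. Invoker's rights: the CUSTOMER_ORDERS reference resolves
-- in the calling schema, so a U_Cust_1 session reads its own data.
create or replace package CUST_API authid current_user is
  function order_total return number;
end CUST_API;
/
create or replace package body CUST_API is
  function order_total return number is
    l_total number;
  begin
    select nvl(sum(AMOUNT), 0) into l_total from CUSTOMER_ORDERS;
    -- nested call: COMMON_API runs with definer's rights, so the
    -- U_Common tables become visible again inside it
    return l_total * COMMON_API.exchange_rate;
  end order_total;
end CUST_API;
/
[/code]

With EXECUTE granted on CUST_API, a session connected as U_Cust_1 can call U_Common.CUST_API.order_total and get a total computed from its own rows.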

Concerning Views

Views don’t work so well.

You can define a view in U_Common that joins a Common table with a Customer table, but any U_Common procedure that is authid current_user will not be able to “see” the view from the context of U_Cust_1.
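
For example (view and table names are again illustrative), such a view might be defined in U_Common like this; even when it is referenced from an invoker’s-rights procedure, the references inside the view still resolve within U_Common:

[code:sql]
-- Run as U_Common: joins the common RATES lookup with the (template)
-- CUSTOMER_ORDERS table. The view is compiled against U_Common's objects,
-- so it never sees the customer schema's rows.
create or replace view ORDER_AMOUNTS_VW as
select O.ID, O.ORDER_DT, O.AMOUNT * R.RATE as AMOUNT_USD
from CUSTOMER_ORDERS O
join RATES R on R.CURRENCY = 'USD';
[/code]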

You can try this:

grant SELECT on the view to U_Cust_1
create a synonym in U_Cust_1 for the U_Common view

But the result set will be entirely pulled from the U_Common schema tables.

So this technique probably isn’t going to be used to create a transparently customer-partitioned application architecture. But it is still worth understanding how AUTHID works.

A nice enhancement in PL/SQL Developer .NEXT

I’ve been taking a look at the public beta of the next version of PL/SQL Developer. I’m currently using the 8.0 release, choosing not to upgrade to 9.0 when it was released last year. One of the new features Allround Automations added in 9.0 was the ability for one instance of the application to open sessions to multiple databases. I considered this a pretty risky enhancement.

However, in the upcoming 10.0 release, this multi-session feature seems to be refined and more solid. One nice enhancement is the ability to assign a window background color to specific connections.

Here, I am connecting to the QA, UAT, and PROD environments:

Now there is much less chance of accidentally running a data update on the wrong DB! This feature would have been quite useful in 8.0 and earlier, frankly. It’s nice to have an obvious reminder of which environment you’re looking at.

Data Prospecting

Lisa and I drove into the city last Wednesday to attend the San Francisco SQL Server User Group meeting, to see a presentation of Microsoft’s latest Data Mining tools. It was a great presentation, very enlightening.

I confess, I’ve always been under a misconception about data mining. For reasons I’ll explain below, I’ve had a bias that “data mining” was all about exploring your data and looking for unexpected trends and relationships that perhaps could then be leveraged.

However, at the user group presentation, as William Brown took us through a data mining exercise, several things became clear to me:

In Data Mining projects, the following axioms are important:

  • Know what you’re looking for
  • Know how to recognize it when you’ve found it

William started with the following goal: “Of my database of customers who haven’t purchased a bicycle, which of them are likely to buy one?”

He then used the Microsoft toolset to create a “sieve” through which he could pour the customer database and build a collection of likely bicycle purchasers.

This was possible because he started with a table of customers known to be bicycle purchasers, and used a subset of it to effectively “train” the sieve to identify rows with similar properties. He also used the remainder of the known data set to test the sieve’s effectiveness.

This is almost the exact opposite of what I had presumed “Data Mining” was all about, but I stand corrected: “Data Mining” is an excellent description of the process.

You go into the project knowing what it is you want to find, and you have a test that can be used to prove if you are successful.

Digression: Arthur Ransome’s Pigeon Post

In the book Pigeon Post by Arthur Ransome, a team of child explorers decide to go prospecting for gold on the local hills. They base this decision on an old rumor of a lost gold seam up in the hills. Adventures ensue, of course. It’s a great book, with detailed descriptions of the prospecting, ore crushing, smelting, and charcoal burning processes. After an unsuccessful attempt to melt the gold dust into a nugget – it just oxidizes into a black lump – they know that just finding gold-colored ore isn’t enough: they need a test to prove the result to their uncle, Captain Flint, who has given up on fruitless treasure hunts:

“It probably wasn’t gold at all,” said Susan. “We ought to have made sure.”

Dick looked up suddenly. “There is one way we could,” he said. “We could try with aqua regia. You know, a chemical test. The book says gold dissolves in aqua regia. If it does, we’d know for sure.”

More ore is extracted, and later on Dick performs the test, working in Captain Flint’s workshop laboratory.

Dick’s tests show that the ore does indeed dissolve in aqua regia (a mixture of two strong acids), but his joy is short-lived when Captain Flint returns and reviews the results:

“What’s this? Chemistry?” asked Captain Flint.

“Gold,” said Dick.

“Gold? Don’t you go and get interested in the wretched stuff. I’ve sworn off. Had quite enough wasting time… What have you got in that test tube?”

“Aqua regia,” said Dick. “And gold dust. And it’s gone all right. Dissolved! Gold in the aqua regia. I was just a bit afraid it might not be gold, after all.”

“But, my dear chap,” said Captain Flint. “Aqua regia will dissolve almost anything. The point about gold is that it won’t dissolve in anything else.”

Captain Flint examines the ore and, using his knowledge of the geology of the surrounding hills, comes up with a better test to determine what, exactly, the prospectors have found.

Captain Flint’s test reveals the ore to be copper pyrites, which, in its way, turns out to be quite as good a discovery as gold would have been, and the adventure is a success after all.

There are several insights we can pull from this story:

  • Choosing an appropriate “test for success” is critical.
  • Domain knowledge is very important, and the earlier you apply it, the better off you’ll be.

But here’s another thing: The prospectors weren’t looking for copper. They were looking for gold. And without some help, they weren’t ever going to understand what they had actually found.

This last point makes me think that there is something missing in the current Data Mining toolset.

Reductio ad memoria

Over a decade ago, I worked on a project with some truly kick-ass data visualization tools: Silicon Graphics’ MineSet. The project involved showing how a company could mix their own customer data with additional data sourced from a third party, and throw it into a 3D visualization, allowing variation along five or more axes simultaneously.

For example, in addition to the X, Y, and Z axes, we also had color, density, size of the “splat”, and probably some others that I can’t recall. The thrill of taking this 3D model and spinning it around various viewing angles to see correlations between data axes – in real time, in a web browser! – was, frankly, mind-blowing at the time.

Now, in my opinion the project’s premise was flawed, because the nature of the third-party data didn’t really allow for a 1:1 join with the primary data set – it was more of an exercise in psychohistory.

But the tool was brilliant. It even had an Internet Explorer ActiveX viewer control for browsers.

One feature I would have liked to have seen was a “randomize axes” option, where you could cycle through the various available columns in your data set, mapping them to different aspects of the 3D view, and then spin the result around, looking for unexpected relationships.

Because it’s the unexpected that is truly insightful. I want to see the copper pyrites – or whatever else is in there – without having to explicitly test for it.

I don’t think this is Data Mining as it is understood today. But perhaps it might be called “Data Prospecting”.

Finally

A quick web search indicates that, in October 2003, the SGI MineSet product was licensed for development and sale to Purple Insight, a Gloucester, UK-based company that now appears to be defunct. This is a shame, because these visualization tools were unsurpassed by anything I’ve seen to date.

Lisa tells me that some of the underlying algorithms found their way into IBM’s Many Eyes project.

Note: By the way, all of Arthur Ransome’s books about the young explorers are great, and I highly recommend them, for children of all ages.

Avoiding artificial inflation in SUM on a denormalized column value

Say you have a data model that looks like this:

In other words, time entries can be tagged against multiple tasks. (I know this seems wrong, but trust me, it made sense in the context of the system.)
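
Here is a minimal sketch of the three tables implied by the queries below (the column names match the queries; the types are assumptions):

[code:sql]
-- A time entry: the hours worked
create table timelog (
  ID     number primary key,
  HOURS  number not null
);

-- A task that time can be logged against
create table task (
  ID    number primary key,
  NAME  varchar2(100) not null
);

-- Junction table: one time entry can be tagged against many tasks
create table time_x_task (
  TIME_ID  number not null references timelog (ID),
  TASK_ID  number not null references task (ID),
  primary key (TIME_ID, TASK_ID)
);
[/code]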

The problem we had was that the complicated query behind the report of “total hours by task” was counting the hours twice or more, for any entry tagged to more than one task.

[code:sql]select T.NAME, sum(L.HOURS)
from time_x_task X
join timelog L on L.ID = X.TIME_ID
join task T on T.ID = X.TASK_ID
group by T.NAME;[/code]

After thinking about how I could alter the query so that each timelog.HOURS was included only once in the SUM(), it occurred to me that it might be better to just distribute the hours across the tasks:

[code:sql]select L.ID, T.NAME, L.HOURS, L2.CNT, (L.HOURS / L2.CNT) as ACTUAL_HRS
from time_x_task X
join task T on T.ID = X.TASK_ID
join timelog L on L.ID = X.TIME_ID
join (select TIME_ID, count(1) as CNT
      from time_x_task
      group by TIME_ID) L2 on L2.TIME_ID = L.ID;[/code]

Now we SUM(ACTUAL_HRS) instead of sum(HOURS).
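
Putting it together, the corrected report query looks something like this (same illustrative tables as above):

[code:sql]
select T.NAME, sum(L.HOURS / L2.CNT) as TOTAL_HOURS
from time_x_task X
join timelog L on L.ID = X.TIME_ID
join task T on T.ID = X.TASK_ID
join (select TIME_ID, count(1) as CNT
      from time_x_task
      group by TIME_ID) L2 on L2.TIME_ID = L.ID
group by T.NAME;
[/code]

Each timelog row now contributes exactly its HOURS value in total, split evenly across the tasks it is tagged to.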

OK, this is a contrived example, but the point I’m trying to make is that sometimes you have to throw away the original question and think about what you’re really trying to achieve.

A light-weight SQL editor plugin for Eclipse IDE

Eclipse is not my primary IDE, but I do find it very useful for editing ANT project definitions, especially the outline view.

I use generic projects in Eclipse Juno 4.2 to manage collections of DDL scripts, but I did miss SQL syntax highlighting. Well, no more!

I found this plugin on the Internet: http://blog.debertshaeuser.com/index.php?page=eclipse-sql-editor.

Just download the jar, drop it into the eclipse/plugins folder, and restart Eclipse. It works great:

Long columns wrapping in the SQL Result Grid

Have you ever experienced a moment when you find that an application you’ve used for years has a feature that, if only you’d known about it, would have saved you time over and over again? It’s a mixed feeling… on one hand it’s “Yay, this software is awesome!” and on the other it’s “Doh! I’m an idiot!”.

Well, PL/SQL Developer just handed me another of those moments.

In the result grid of the SQL Query window, you can resize any row by grabbing the separator line in the margin and pulling down. I think I knew that… but what I didn’t realize is that any column with a long string will then wrap into the extra space, like this:

I think that is pretty cool. And it’s been in the product since at least version 8 (which is what I’ve used for the last few years).

I put on my robe and prophet hat

Six months ago, I went on record* as saying:

In eighteen months, no-one will be talking about “the Cloud”.

I may have to re-qualify that statement:

If your business plan has the word “cloud” in it, you’re doing it wrong.

The point here is that “cloud computing” is not Web 3.0, or a revolution in IT architecture, but much more about economics and risk.

As usual, Bob Lewis says it better and is more deserving of my Prophet Hat.

* ok, maybe it isn’t written down anywhere. But I said it out loud at least once.
