
More than Four

...there's an axis for that.

An exercise in Interval Partitioning

Consider a table, LOG_DATA, with lots of rows:

select trunc( LOG_TIME, 'MM' ) as MONTH, count(1) as Count
from Log_Data
group by trunc(LOG_TIME,'MM')
order by trunc(LOG_TIME,'MM');

image

Perhaps this would be better served by a table with RANGE partitioning, using the INTERVAL syntax of 11g to automatically create partitions as data accumulates?

Unfortunately, LOG_TIME is of data type TIMESTAMP WITH TIME ZONE, and we can't create an interval partition on a timestamp column (yet!).

Also, we can't use an expression in the interval definition. So, we need a DATE column:

create table Log_Data_P
(
ID integer,
PARENT_ID integer,
LOG_TIME timestamp with time zone default systimestamp,
STACK_LEVEL integer,
PACKAGE_NAME varchar2(30),
PROCEDURE_NAME varchar2(30),
LOG_LABEL varchar2(100),
LOG_VALUE varchar2(1000),
LOG_DURATION number,
PROC_DURATION number,
STEP_DURATION number,
LOG_CLOSED integer,
LOG_DATE date default sysdate
)
partition by range ( LOG_DATE )
interval ( numtoyminterval( 1, 'MONTH' ) )
(
partition part_01 values less than ( date '2016-05-01' )
);

Here, we have added a LOG_DATE column to the end of the table structure. I've given it a default value and checked the code that writes to the table, to ensure there are no conflicts (there aren't).

Now let's insert into the new table, with the rows from the original (spanning several months):

insert into Log_Data_P
select
ID, PARENT_ID, LOG_TIME, STACK_LEVEL, PACKAGE_NAME, PROCEDURE_NAME, LOG_LABEL, LOG_VALUE,
LOG_DURATION, PROC_DURATION, STEP_DURATION, LOG_CLOSED, cast( LOG_TIME as DATE )
from Log_Data;
commit;

No errors... let's see what partitions we have, at the end of this process:

select PARTITION_NAME, INTERVAL, HIGH_VALUE
from DBA_Tab_Partitions
where TABLE_NAME = 'LOG_DATA_P'
order by PARTITION_POSITION;

PARTITION_NAME     INTERVAL   HIGH_VALUE
------------------ ---------- -----------------------------------------------------------
PART_01            NO         TO_DATE(' 2016-05-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS' )
SYS_P15277         YES        TO_DATE(' 2016-06-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS' )
SYS_P15278         YES        TO_DATE(' 2016-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS' )
SYS_P15280         YES        TO_DATE(' 2016-08-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS' )
SYS_P15281         YES        TO_DATE(' 2016-09-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS' )
SYS_P15279         YES        TO_DATE(' 2016-10-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS' )
SYS_P15282         YES        TO_DATE(' 2016-11-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS' )

7 rows selected.

Interesting... we have 6 new partitions over and above the original "PART_01" partition that we created. Obviously, Oracle has allocated some automatically-generated names to the partitions.

OK, so now the time has come to perform some pruning on our log table. Let's delete the oldest partition:

SQL> alter table Log_Data_P drop partition PART_01;

ORA-14758: Last partition in the range section cannot be dropped

Uh-oh. It's not as simple as we'd hoped. More about dropping partitions in this excellent post here:

Follow up #1 - OK, so try something sneaky, and drop the second partition instead

I had an idea. What if we define the table with an artificially small initial partition, whose upper boundary is earlier than any data in the table?

If we take a look at our Log_Data table today:

SQL> select count(1), min(LOG_TIME) from Log_Data;

  COUNT(1) MIN(LOG_TIME)
---------- -----------------------------------
   1172660 12-OCT-16 10.57.17.516528 PM -05:00

1.1 million rows, since October 12.

Now let's define the Log_Data_P table again, with a start date of 2016-05-01 as described above, and insert the rows from our Log_Data table.

What do the partitions look like now?

select PARTITION_NAME, INTERVAL, HIGH_VALUE
from DBA_Tab_Partitions
where TABLE_NAME = 'LOG_DATA_P'
order by PARTITION_POSITION;

PARTITION_NAME   INTERVAL HIGH_VALUE
---------------- -------- -------------------------------------
PART_01          NO       TO_DATE(' 2016-05-01 00:00:00', ...
SYS_P17402       YES      TO_DATE(' 2016-11-01 00:00:00', ...
SYS_P17403       YES      TO_DATE(' 2016-12-01 00:00:00', ...
SYS_P17404       YES      TO_DATE(' 2017-01-01 00:00:00', ...

"PART_01" won't have any rows in it, but it defines the interval for future partitions.  Now imagine some time goes by, and we don't want the oldest month of data... so, let's drop October, 'SYS_P17402':

SQL> alter table Log_Data_P drop partition SYS_P17402;

Table altered

select PARTITION_NAME, INTERVAL, HIGH_VALUE
from DBA_Tab_Partitions
where TABLE_NAME = 'LOG_DATA_P'
order by PARTITION_POSITION;

PARTITION_NAME   INTERVAL HIGH_VALUE
---------------- -------- ------------------------------------
PART_01          NO       TO_DATE(' 2016-05-01 00:00:00', ...
SYS_P17403       YES      TO_DATE(' 2016-12-01 00:00:00', ...
SYS_P17404       YES      TO_DATE(' 2017-01-01 00:00:00', ...

What's the earliest row in the table?

SQL> select count(1), min(LOG_TIME) from Log_Data_P;

  COUNT(1) MIN(LOG_TIME)
---------- -----------------------------------
    859906 01-NOV-16 12.00.02.397712 AM -05:00

Success!

Now we just need a way to calculate the name of the partition based on, say, SYSDATE-180 (or whatever our aging specification is) and we're golden.
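Something along these lines might do it (a sketch only: it assumes PART_01 is the empty anchor partition, a 180-day retention window, and that evaluating HIGH_VALUE - which is a LONG - via dynamic SQL is acceptable):

declare
   v_boundary date;
begin
   for p in ( select PARTITION_NAME, HIGH_VALUE
              from User_Tab_Partitions
              where TABLE_NAME = 'LOG_DATA_P'
              and PARTITION_NAME <> 'PART_01'
              order by PARTITION_POSITION )
   loop
      -- HIGH_VALUE holds an expression like TO_DATE(' 2016-11-01 ...'),
      -- so evaluate it to get the partition's upper boundary as a DATE:
      execute immediate 'select ' || p.HIGH_VALUE || ' from dual' into v_boundary;

      if v_boundary < sysdate - 180 then
         execute immediate 'alter table Log_Data_P drop partition ' || p.PARTITION_NAME;
      end if;
   end loop;
end;
/

(If the database version supports it, the partition-extended syntax - alter table Log_Data_P drop partition for ( date '2016-10-15' ) - side-steps the name lookup entirely.)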

Converting a TIMESTAMP WITH TZ in UTC to Central

At ABACAB we have - very correctly - a TIMESTAMP WITH TIME ZONE column called ONLINE_ENROLLMENT_TIME that has been populated - via an ETL process - with values in UTC.

This is good, because it is a completely unambiguous value. Oh, if only all our date times were stored this way.

Unfortunately, the convention up until now has been:

  • assume date times are in Central time zone
  • convert to integer "date key" and "time key" values (the primary keys for the date and time dimensions).

This loses any time zone information, which is not great. It means that all client processes using this data also need to assume Central time zone.

The following SQL takes our time zone-aware column and transforms it:

select
ID,
ONLINE_ENROLLMENT_TIME,
(ONLINE_ENROLLMENT_TIME at time zone 'US/Central') as ONLINE_ENROLLMENT_TIME_C,
trunc( cast( (ONLINE_ENROLLMENT_TIME at time zone 'US/Central') as DATE )) as ONLINE_ENROLLMENT_DATE
from ZYXX.D_Person
where ONLINE_ENROLLMENT_TIME is not null;

Results:

image
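If the integer "date key" and "time key" convention is YYYYMMDD and HH24MISS (an assumption on my part - adjust to whatever the dimension keys actually use), the same expression extends naturally. A sketch:

select
   ID,
   ONLINE_ENROLLMENT_TIME,
   -- shift to Central first, then derive the integer surrogate keys:
   to_number( to_char( ( ONLINE_ENROLLMENT_TIME at time zone 'US/Central' ), 'YYYYMMDD' ) ) as DATE_KEY,
   to_number( to_char( ( ONLINE_ENROLLMENT_TIME at time zone 'US/Central' ), 'HH24MISS' ) ) as TIME_KEY
from ZYXX.D_Person
where ONLINE_ENROLLMENT_TIME is not null;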

Testing using SQL and common table expressions

Sometimes I forget how to do simple things, like splitting a string into sections based on a delimiter.

Here's a nice way to test your code on a range of input values, and see the interim results, in one hit:

with 
myobject as (
   -- put in a range of test values:
   select 'dbo.MyFunction' as NAME
   union
   select 'setup.ins_Contact'
   union
   select 'MyOtherFunction'
),
more2 as (
   -- calculate interesting attributes:
   select
      NAME,
      len( NAME )            as LENGTH_,
      charindex( '.', NAME ) as POSITION
   from myobject
)
-- perform the test:
select 
   NAME, POSITION, LENGTH_,
   case when POSITION=0 then 'dbo'
                        else substring( NAME, 1, POSITION-1 ) 
                        end as SCHEMA_,
   substring( NAME, POSITION+1 , (LENGTH_ - POSITION ) ) as OBJECT_
from more2;
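The same scaffold translates to Oracle almost verbatim - only the built-ins change (LEN becomes LENGTH, CHARINDEX becomes INSTR, SUBSTRING becomes SUBSTR, and each test row needs a FROM DUAL). A sketch, keeping the 'dbo' default purely to mirror the original:

with
myobject as (
   -- put in a range of test values:
   select 'dbo.MyFunction' as NAME from dual
   union
   select 'setup.ins_Contact' from dual
   union
   select 'MyOtherFunction' from dual
),
more2 as (
   -- calculate interesting attributes:
   select
      NAME,
      length( NAME )     as LENGTH_,
      instr( NAME, '.' ) as POSITION
   from myobject
)
-- perform the test:
select
   NAME, POSITION, LENGTH_,
   case when POSITION = 0 then 'dbo'
                          else substr( NAME, 1, POSITION - 1 )
                          end as SCHEMA_,
   substr( NAME, POSITION + 1, ( LENGTH_ - POSITION ) ) as OBJECT_
from more2;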

More pitfalls with NVARCHAR2 vs VARCHAR2

Consider the following POSTAL_CODE column:

create table Test1 (
POSTAL_CODE NVARCHAR2(10)
);

insert into Test1 values ('12345-1234');
commit;

select * from Test1;

POSTAL_CODE
--------------------
12345-123

That's right - the NVARCHAR2(10) column appears to lop off the 10th character and doesn't store it.

This behavior must be related to multi-byte characters and Unicode, somehow. The Oracle DB docs on Globalization Support can get quite detailed, but I still haven't found a concise explanation of this silent truncation behavior.
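One way to poke at it is to compare the character and byte lengths of what actually got stored (purely diagnostic; the exact numbers will depend on the database and national character sets):

select
   POSTAL_CODE,
   length( POSTAL_CODE )  as LEN_CHARS,
   lengthb( POSTAL_CODE ) as LEN_BYTES,
   lengthc( POSTAL_CODE ) as LEN_UNICODE
from Test1;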

So, just make the column larger, right?

We tried that, altering the column from 10 to 15 characters. Our column can now correctly store the full 10 characters of a postal code string.
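The change itself is a one-liner (shown here against the Test1 table standing in for the real one):

alter table Test1 modify ( POSTAL_CODE nvarchar2(15) );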

But then, another problem showed up when we needed to match on the first 5 characters in the ZIP code.

Consider:

create table Test2 ( V1 varchar2(15), N1 nvarchar2(15) ) ;
insert into Test2 values ( '1234567', '1234567' );
insert into Test2 values ( '123456', '123456' );
insert into Test2 values ( '12345', '12345' );
insert into Test2 values ( '1234', '1234' );
insert into Test2 values ( '123', '123' );
commit;

SQL> select V1, substr( V1, 1,5 ), N1, substr( N1, 1, 5 ) from Test2;

V1              SUBSTR(V1,1,5) N1                             SUBSTR(N1,1,5)
--------------- -------------- ------------------------------ --------------
1234567         12345          1234567                        1234
123456          12345          123456                         1234
12345           12345          12345                          1234
1234            1234           1234                           1234
123             123            123                            123

Yup, that's right. SUBSTR( NVARCHAR2 ) does not return the full 5 characters!

As a workaround, we can use SUBSTRC():

SQL> select V1, substrc( V1, 1,5 ), N1, substrc( N1, 1, 5 ) from Test2;

V1              SUBSTRC(V1,1,5) N1                             SUBSTRC(N1,1,5)
--------------- --------------- ------------------------------ ------------------------------
1234567         12345           1234567                        12345
123456          12345           123456                         12345
12345           12345           12345                          12345
1234            1234            1234                           1234
123             123             123                            123

Now the values retrieved from the NVARCHAR2 column are working as expected.
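So the original requirement - matching on the first five characters of the ZIP code - can be written safely against the NVARCHAR2 column. A sketch, reusing the Test1 table and an illustrative literal:

select POSTAL_CODE
from Test1
where substrc( POSTAL_CODE, 1, 5 ) = '12345';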

Bottom Line:

  • Don't use NVARCHAR2 for size-limited columns, unless you absolutely have to
  • If you have to use NVARCHAR2, use SUBSTRC() for partial strings.

Adding Tags to posts in LiveWriter?

Reference:

I'm not sure these instructions are correct. Investigating further...

The correct RegEdit node appears to be

HKEY_CURRENT_USER \ Software \ Microsoft \ Windows Live \ Writer \ Weblogs \ {GUID} \ UserOptionOverrides

The next question is, does it actually work? It appears to. At least, it adds the Tags box to the UI:

image

Update: Well, it looks like it would work, but it doesn't seem to persist. The tag box is no longer visible in my instance of LiveWriter.

Enterprise Architect: Importing or Refreshing DB schemas

You will need:

  • a DB instance to read the schema from
  • an Oracle ODBC driver that EA can use
  • an ODBC configuration pointing to the DB source schema
  • user credentials for the DB source schema
  • an EA project to import into

Oracle DB instance

We could point at the Production system, but this is not a great idea, especially when you're testing. I use a local virtual machine (running Oracle Linux 6.6) with an instance of Oracle Database 12c. This is where I do all my test builds, pulled from the latest branch in source control.

image

Oracle ODBC Driver

I'm going to assume you've got this covered... otherwise, this is a good starting reference:

(We could try using the default Microsoft ODBC Driver for Oracle, but I've never got it to work.)

ODBC and Enterprise Architect

My local operating system is 64-bit Windows. However, my installation of Enterprise Architect is 32-bit. This means that it will use the 32-bit ODBC components, and I need to use the 32-bit ODBC Data Source Administrator to configure a data source (DSN).

And if Life weren't complicated enough:

But, the bottom line is:

On a 64-bit machine, when you run the “ODBC Data Source Administrator” and create an ODBC DSN, you are actually creating a DSN that is reachable by 64-bit applications only.

But what if you need to run a 32-bit application on that 64-bit machine? The answer is simple: run the 32-bit version of “odbcad32.exe” from “c:\Windows\SysWOW64\odbcad32.exe” (via the Start/Run menu) and create your ODBC DSN with that tool.

Got that?

  • 64-bit ODBC Administrator:    c:\windows\System32\odbcad32.exe
  • 32-bit ODBC Administrator:    c:\windows\SysWOW64\odbcad32.exe

It's mind-boggling. The UI looks identical too.

Configuring a Data Source (DSN)

  • Run the 32-bit ODBC Administrator
  • Select User DSN
  • Click on "Add..."
  • Select the Oracle driver:

image

  • Click "Finish"

We're not finished. We need to enter some details:

image

The only critical parameter here is the TNS Service Name, which needs to match an entry you've set up in the TNSNAMES.ORA config file for your target DB instance.
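For reference, a typical TNSNAMES.ORA entry looks something like this (the alias, host and service name here are placeholders for my local VM, not values to copy literally):

ORA12C =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = my-oracle-vm.local)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = orcl)
    )
  )

The alias on the left ("ORA12C" here) is what goes into the driver's TNS Service Name field.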

Here, I've used a user name of SYSTEM because this is my test Oracle instance. Also, it will allow me to read from any schema hosted by the DB, which means I can use the DSN for any test schema I build on the instance.

Now that the DSN is created, we can move on to working in Enterprise Architect.

Case Study: Importing into a clean project

Note: I'm using images from a Company-Internal How-To guide that I authored. I feel the need to mask out some of the details. Alas I am not in a position to re-create the images from scratch. I debated omitting the images entirely but that might get confusing.

In Enterprise Architect, we have a clean, empty project, and we didn't use Wizards to create template objects. It is really just a simple folder hierarchy:

image

Right-click on the folder and select from the cascading drop-down menu of options:

  • Code Engineering  >  Import DB schema from ODBC

This will bring up the Import DB schema from ODBC source dialog.

Click on the chooser button on the right side of the "Database" field to bring up the ODBC Data Source chooser:

image

The DSN we created should be available under the Machine Data Source tab. Select it, and click OK. We should be prompted to enter a password for our pre-entered User Name, after which Enterprise Architect will show us all the schemas to which we have access on the DB:

image

Previously, on this instance, I ran the database build scripts and created a set of test schemas using the T_ name prefix.

We're going to import the contents of the T_IHD schema into our project, so we check the "T_IHD" schema name.

Our intention is to import (create) elements under the package folder, for each table in the T_IHD schema.

Review the filter options carefully!

The default settings are probably correct, if we are only interested in creating elements for each table. Note the Synchronization options:

image

The package folder is empty, so you might think that we need to change this to "Import as New objects". Don't worry. New objects will be created if they don't already exist.

Now click on the "Import" button at the top right of the dialog. After a short wait, during which EA retrieves metadata from the database, the contents of the T_IHD schema will be presented to us in the "Select Database Objects to Import" dialog:

image

We are only interested in the tables, so check the [x] Tables checkbox and all the contained tables will be selected.

Are there objects we don't need to import?

Often there are objects we're not interested in including in the Data Model. For example, the TEMP_EVENT table might only be used as an interim location for data, during some business process, and not worth complicating the model with.

We can clear the checkbox next to the TEMP_EVENT table name to skip it.

Now, having validated our selection, we can press the OK button to start the import.

image

The process may take some time... when it completes, we can close the Import dialog, or select another schema and destination folder and import a different schema.

The table elements should now be visible in the Project hierarchy:

image

Case Study: Refreshing a schema

We need to refresh the RIX schema tables in the model. This schema is already in the model, with lots of additional element and attribute-specific notes that we need to retain.

Two important things to note:

  • The schema contents are organized differently (temp tables were imported into the model but then moved into a sub-folder);
  • There are some minor structure changes to columns, in the source database

image

We need to refresh the schema in the model, with the new structure from the DB. The process to follow is almost exactly the same as the clean import described earlier:

Select the package folder

  • Right Click > Code Engineering > Import DB schema from ODBC
  • select the T_RIX schema
  • Make sure the Synchronization options are "(o) Synchronize existing classes" and "[  ] Overwrite object comments". We do not want to remove any existing notes entered against elements and attributes in the model.

Now, in this example we are only interested in refreshing those tables that are already in the model, in the current folder. That represents a subset of the total set of tables that are in the current DB schema:

image

In the image above, LOG_CTL is a table we don't care about (not in the model); and the T_* tables are temporary tables that, in the model, are in a different folder. We'll do those next (see below).

After the import process is completed, we should be able to drill down and see the schema changes reflected in the updated model.

We can repeat the process for the Temp Tables sub-package, this time selecting only the T_* temporary tables in the import selection.

Summary:

  • Table objects in the model might be moved around into different folders, per type, for documentation purposes.
  • The schema import process wants to import all selected tables into the current folder
  • Therefore, if you have separated out tables in a single schema, you should refresh them on a folder-by-folder basis, selecting only the objects you want to be in each folder in the model.

With careful set-up, it is possible to import and refresh table structures from database schemas into a project in Enterprise Architect, without over-writing existing documentation and attribute notes.

Git: how to create a merge conflict all on your own

I had some doubts that I was seeing all the changes from all developers, so I did some testing.

(I ran a git log command after each step to see what changes would show up.)

Step 1: Add a new file, and commit locally:

Mon Jul 11 21:05:41 2016 - Colin Nicholls : (Testing) Added a file locally
A misc_scripts/Testing_git_log.txt

 

Step 2: Pull from remote to refresh locally

Step 3: Push my local changes to the remote (origin).

Step 4: Edit the file, and commit. I misunderstood the use of the [x] Amend Last commit checkbox...

Mon Jul 11 21:10:28 2016 - Colin Nicholls : (Testing) Editing a file locally
A misc_scripts/Testing_git_log.txt

Note that the time-stamp has changed, and the comment text, but it is the same "Add" operation.

Step 5: Edit the file again, and commit. (This time, I did not use the [  ] Amend last commit checkbox.) I commented thusly:

Mon Jul 11 21:10:28 2016 - Colin Nicholls : (Testing) Editing a file locally
A misc_scripts/Testing_git_log.txt

Mon Jul 11 21:12:43 2016 - Colin Nicholls : (testing) [ ] Amend last commit (did not check)
M misc_scripts/Testing_git_log.txt

Step 6: Do a push to remote repository:

git.exe push --progress "origin" master:master

To https://github.abacab.com/zyxx/zyxx-db.git
! [rejected] master -> master (non-fast-forward)
error: failed to push some refs to 'https://github.abacab.com/zyxx/zyxx-db.git'
hint: Updates were rejected because the tip of your current branch is behind
hint: its remote counterpart. Integrate the remote changes (e.g.
hint: 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.
git did not exit cleanly (exit code 1) (1981 ms @ 7/11/2016 13:14:39)

OK, I don't understand why "the tip of my current branch is behind its remote counterpart", but it is telling me I should do another pull before pushing, so:

Step 7: Pull from the remote to update locally:

git.exe pull --progress --no-rebase -v "origin"

From https://github.abacab.com/zyxx/zyxx-db
= [up to date] master -> origin/master
= [up to date] cmp13_convert -> origin/cmp13_convert
= [up to date] dev_2015_02_A -> origin/dev_2015_02_A
= [up to date] dyson -> origin/dyson
= [up to date] edison -> origin/edison
= [up to date] fermi -> origin/fermi
= [up to date] gauss -> origin/gauss
= [up to date] grendel -> origin/grendel
= [up to date] hubble -> origin/hubble
= [up to date] prod_2015_04 -> origin/prod_2015_04
Auto-merging misc_scripts/Testing_git_log.txt
CONFLICT (add/add): Merge conflict in misc_scripts/Testing_git_log.txt
Automatic merge failed; fix conflicts and then commit the result.
git did not exit cleanly (exit code 1) (2231 ms @ 7/11/2016 13:15:42)

Now I'm in conflict with myself!? (With hindsight, this makes sense: checking "Amend last commit" in Step 4 rewrote the already-pushed commit locally, so the remote's "Add" and my amended "Add" became two different commits that both create the same file - hence the add/add conflict.)

 

Step 8: At this point, I resolved the conflict by selecting "mine" over "theirs", and did another local commit. This time I got a special "merge commit" dialog that didn't show any specific changed files, but clearly wanted to do something.

So what does the log say at this point?

Mon Jul 11 21:05:41 2016 - Colin Nicholls : (Testing) Added a file locally
A misc_scripts/Testing_git_log.txt

Mon Jul 11 21:10:28 2016 - Colin Nicholls : (Testing) Editing a file locally
A misc_scripts/Testing_git_log.txt

Mon Jul 11 21:12:43 2016 - Colin Nicholls : (testing) [ ] Amend last commit (did not check)
M misc_scripts/Testing_git_log.txt

Mon Jul 11 21:22:08 2016 - Colin Nicholls : Merge branch 'master' of https://github.abacab.com/zyxx/zyxx-db

Interesting:
  1. The first edit operation has now returned to the log. It looks as though we have two "Add" operations.
  2. We get a generic "merge branch" message as the most recent log entry.
  3. Also, the time-stamps aren't "local" time, at least, not my local time (it's 13:25 PDT currently)

Step 9: Pull; delete test file; commit; push

Mon Jul 11 21:05:41 2016 - Colin Nicholls : (Testing) Added a file locally
A misc_scripts/Testing_git_log.txt

Mon Jul 11 21:10:28 2016 - Colin Nicholls : (Testing) Editing a file locally
A misc_scripts/Testing_git_log.txt

Mon Jul 11 21:12:43 2016 - Colin Nicholls : (testing) [ ] Amend last commit (did not check)
M misc_scripts/Testing_git_log.txt

Mon Jul 11 21:22:08 2016 - Colin Nicholls : Merge branch 'master' of https://github.abacab.com/zyxx/zyxx-db
Mon Jul 11 21:35:21 2016 - Colin Nicholls : (testing) deleted file
D misc_scripts/Testing_git_log.txt

This needs further testing, perhaps, but I'm going to return to billable work at this point.

Git: Obtaining a useful log of recent check-ins

This is mostly for my own reference, so I don't lose it.

C:> set path=C:\Programs\Git\bin;%PATH%
C:> D:
D:> cd /source_control/ABACAB/github/zyxx_db
D:> git log --name-status -10 > last_ten_updates.txt

So long as the current path is under the right repository directory, the git log command seems to pick up the right information, without being told what repository to interrogate.

The output is useful, but the default formatting isn't ideal. I'm not so interested in the git-svn-id or commit id. There's a comprehensive list of options...

Try:

git log --name-status --pretty=format:"%cd - %cn : %s" --date=iso

The results are close to what I'm used to with Subversion:

2016-06-25 21:54:13 -0700 - Colin Nicholls : Synchronizing with latest SVN version
M zyxx/db/trunk/DW/DIRECT_DW.pkb
M zyxx/db/trunk/DW/SAMPLE_DATA.pck
:
M zyxx/db/trunk/environments/PROD/db_build_UAT.config
M zyxx/db/trunk/environments/UAT/db_build.config

2016-06-02 17:41:59 +0000 - cnicholls : Re-run previous report asynch; clear STATUS_TEXT on re-run
M zyxx/db/trunk/RIX/RIX.pck

2016-06-01 23:05:29 +0000 - cnicholls : Added V_Rix_Run_Log
M zyxx/db/trunk/RIX/create_views.sql

2016-06-01 09:12:36 +0000 - fred : Prepare deployment script.
M zyxx/db/trunk/deployment_scripts/49/during_DW.sql

2016-06-01 02:27:39 +0000 - zeng : INH-1139: Offer Issue - DW Should handle the "Link" action for Tag. Fix bug.
M zyxx/db/trunk/DW/ABACAB_RI.pkb

What I don't yet know is why the most recent change has a time zone of "-0700" and the others "-0000". It may have something to do with the way the previous entries were imported. Notice the committer name is different in the most recent check-in, which was the first one I did from my working copy, after the initial import.

Update

My current format of choice:

git log --name-status --pretty=format:"%cd - %cn : %s" --reverse --date-order --date=local

However, "local" doesn't seem to mean "my local time zone". So, not sure what the best date format is.