Areas of Interest, as counted by my cat

Author: colin

Finding the Data Dictionary in DB2

I’m not creating a special “DB2” category for this post 🙂

I’ve been looking for metadata, or some kind of data dictionary similar to those available in MS SQL Server or Oracle DB, to describe the tables and columns in a schema in DB2.

The closest match so far seems to be information in the SYSIBM library:

Specifically:

  • SQLCOLUMNS
  • SQLTABLES
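
For example, a quick way to pull a column listing from SYSIBM.SQLCOLUMNS – a sketch only, since the ODBC-style column names and the 'MYSCHEMA' filter are assumptions to verify against your particular DB2 flavor:

select TABLE_SCHEM, TABLE_NAME, COLUMN_NAME, TYPE_NAME, COLUMN_SIZE
from SYSIBM.SQLCOLUMNS              -- assumed ODBC-style catalog layout
where TABLE_SCHEM = 'MYSCHEMA'      -- placeholder schema name
order by TABLE_NAME, ORDINAL_POSITION;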

Reference:

Selecting a timestamp AT TIME ZONE – a gotcha

So there’s this Oracle database instance in the Midwest, in the US/Central time zone.

If I use PuTTY to open a terminal window on the host and start a SQL*Plus session, executing this statement gives the following results:

SQL> select
max( CHANGE_TS ) as MAX_CHANGE,
max( CHANGE_TS at time zone 'US/Central' ) as MAX_CST_CHANGE
from tRetired_Mst;

MAX_CHANGE MAX_CST_CHANGE
-------------------------------- ---------------------------------------------
05-JAN-17 02.33.17.613000 PM 05-JAN-17 02.33.17.613000 PM US/CENTRAL

As expected, they are the same, because the local time zone of the session (terminal window) is the same as the DB, US/Central.
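
A quick way to confirm the database and session time zones in play, using Oracle’s built-in functions:

-- show the database time zone and the current session time zone
select dbtimezone, sessiontimezone from dual;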

However:

If I run PL/SQL Developer on my Windows laptop here in California and connect to the same Oracle instance, the results are different:

SQL> select
max( CHANGE_TS ) as MAX_CHANGE,
max( CHANGE_TS at time zone 'US/Central' ) as MAX_CST_CHANGE
from tRetired_Mst;

MAX_CHANGE MAX_CST_CHANGE
----------------------------- ----------------------------------------------
2017-01-05 14:33:17 05-JAN-17 04.33.17.613000 PM US/CENTRAL

Observe that the MAX_CST_CHANGE value is off by two hours! This is because the timestamp value was evaluated as if it were in the local session time zone of my laptop, which is US/Pacific.

For code executing in PL/SQL packages in the database, this is probably not an issue… but if you are using this technique to convert values in ad-hoc SQL, you should be aware of this trap.
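
One way to sidestep the trap in ad-hoc SQL – a sketch, assuming CHANGE_TS is a plain TIMESTAMP with no time zone – is to attach the known zone explicitly with FROM_TZ, instead of letting AT TIME ZONE interpret the value in the session’s time zone:

select
  -- FROM_TZ declares which zone the stored value belongs to, so the result
  -- no longer depends on the client session's time zone setting
  max( from_tz( CHANGE_TS, 'US/Central' ) ) as MAX_CST_CHANGE
from tRetired_Mst;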

Tagging with Git

This is newbie-level stuff, but it is new to me…

When I used TortoiseGit to tag my latest changes, I got a strange error “Directory C:\Documents does not exist.”

[screenshot of the TortoiseGit error]

Rather than spend time tracking this down, I decided to bite the bullet and figure out how to use the command line. It’s not hard (of course); you just have to know and remember.

Reference:

Here’s what I did:

Change to the project directory:

> cd \source_control\ABACAB\github\zyxx\zyxx-db\master

Review the current tags (just for fun):

D:\..\zyxx-db\master> git tag
20150614_clean_1
dev_2015_02_A_1
zyxx-1_0_0
zyxx-1_0_1
zyxx-1_0_2
zyxx-41_1-prod
zyxx-prod-42
:
zyxx-prod-50
zyxx-prod-54
zyxx-uat-20150423

D:\..\zyxx-db\master>

Ah, consistency: the hobgoblin of little minds. Or something like that.

Now create a new annotated tag:

D:\..\zyxx-db\master> git tag -a zyxx-prod-55 -m "For Jan 5 2017 deployment to PROD"
D:\..\zyxx-db\master>

We have to explicitly push the tag to the remote server:

D:\..\zyxx-db\master> git push origin zyxx-prod-55
Counting objects: 1, done.
Writing objects: 100% (1/1), 183 bytes | 0 bytes/s, done.
Total 1 (delta 0), reused 0 (delta 0)
To https://github.abacab.com/zyxx/zyxx-db.git
* [new tag] zyxx-prod-55 -> zyxx-prod-55

D:\..\zyxx-db\master>

…and I think that’s it.

Setting up a LAMP test environment

Not my usual kind of thing, but a possible new client meant preparing a system to stand up a replica of their production environment. These are my notes.

I started by downloading an .iso of ubuntu-12.04-server and deploying it to a new VM.

Critical things to remember:

  • Set up two network adapters in the virtual machine config: NAT and Host-only networking, as per my notes documented elsewhere;
  • Select [X] LAMP server during the deployment to give it Apache/MySQL/PHP in one hit, at startup;
  • Install ssh so you can remote into it:
$ sudo apt-get install openssh-server openssh-client

Checking for installed status:

$ which php
/usr/bin/php

$ sudo service apache2 status
[sudo] password for colin: ****
Apache2 is running (pid 1107).

$ mysql -u root -p
Enter password: ****
..
mysql>

By default, the Apache web server document root folder appears to be:

$ ls -l /var/www
total 4
-rw-r--r-- 1 root root 177 Dec 6 12:10 index.html
$

Allowing connections to MySQL from outside the server

Edit /etc/mysql/my.cnf:

Comment out the bind-address line, so that it reads:

#bind-address    = 127.0.0.1

Run the following SQL (might not be required; note the 0 rows affected):

mysql> grant all privileges on *.* to 'root'@'%' identified by '****' with grant option;
Query OK, 0 rows affected (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

Restart the DB:

$ sudo service mysql stop
mysql stop/waiting

$ sudo service mysql start
mysql start/running, process 1578

And, importantly, test from outside (in my case, my host operating system):

C:\Program Files\MySQL\MySQL Server 5.5\bin> mysql -h 192.168.56.56 -u root -p
Enter password: ****
...
mysql>

In MySQL Workbench:

[MySQL Workbench screenshot]

Cool.

Installing FTP server

Just in case we need to move files in and out of the test environment (which we almost certainly will need to do), we’ll set up an FTP server. The most commonly suggested option seems to be something called vsftpd:

Reference: https://www.liquidweb.com/kb/how-to-install-and-configure-vsftpd-on-ubuntu-12-04-lts/

When I tried this, I got an error:

$ sudo apt-get install vsftpd
[sudo] password for colin: ***
Reading package lists... Done
Building dependency tree
Reading state information... Done
Package vsftpd is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Package 'vsftpd' has no installation candidate

Long story short, the following sequence of instructions worked for me:

1. Un-comment the extras.ubuntu.com repositories in /etc/apt/sources.list:

## Uncomment the following two lines to add software from Ubuntu's
## 'extras' repository.
## This software is not part of Ubuntu, but is offered by third-party
## developers who want to ship their latest software.
deb http://extras.ubuntu.com/ubuntu precise main
deb-src http://extras.ubuntu.com/ubuntu precise main

2. Remove old information, and update:

$ sudo rm -rf /var/lib/apt/lists/*
$ sudo apt-get update

3. Try installing again:

$ sudo apt-get install vsftpd
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
vsftpd
0 upgraded, 1 newly installed, 0 to remove and 129 not upgraded.
Need to get 124 kB of archives.
After this operation, 342 kB of additional disk space will be used.
Get:1 http://us.archive.ubuntu.com/ubuntu/ precise/main vsftpd amd64 2.3.5-1ubuntu2 [124 kB]
Fetched 124 kB in 1s (91.2 kB/s)
Preconfiguring packages ...
Selecting previously unselected package vsftpd.
(Reading database ... 53560 files and directories currently installed.)
Unpacking vsftpd (from .../vsftpd_2.3.5-1ubuntu2_amd64.deb) ...
Processing triggers for man-db ...
Processing triggers for ureadahead ...
Setting up vsftpd (2.3.5-1ubuntu2) ...
vsftpd start/running, process 2229
$

Thank goodness for that!

Some configuration is required in /etc/vsftpd.conf; basically:

write_enable=YES
local_umask=022

Restart:

$ sudo service vsftpd restart
vsftpd stop/waiting
vsftpd start/running, process 2462
$

Now, testing from the Windows host:

C:\PROGRA~2\WinSCP>WinSCP
winscp> open 192.168.56.56
Searching for host...
Connecting to host...
Authenticating...
Username: colin
Password: ****
Authenticated.
Starting the session...
Session started.
Active session: [1] 192.168.56.56
winscp>

OK, I think it is all ready for us to unpack that archive we got from the client and see about deploying into the test environment.

Dates and Time Zone conversion

It’s time to take another look at some date/time conversion techniques…

Consider reading a DATE value from some source system, in a known time zone, and needing to convert it into some other time zone before saving into the destination table:

with
src_date as (
select
-- ---------------------------------------------------------------------
-- Here's a date value from the source system. From the value itself
-- we can't tell anything significant about it:
-- ---------------------------------------------------------------------
to_date('2016-10-03 23:20','yyyy-mm-dd hh24:mi') as INPUT_DT
from dual
),
assume_data as (
select
INPUT_DT,
-- --------------------------------------------------------------------------
-- If we know the value is a US/Pacific time zone value, we can declare that:
-- --------------------------------------------------------------------------
from_tz( cast( INPUT_DT as TIMESTAMP) , 'US/Pacific' ) as PACIFIC_TSTZ,
-- --------------------------------------------------------------------------
-- Alternatively, what if we know it is in China Standard Time ?
-- --------------------------------------------------------------------------
from_tz( cast( INPUT_DT as TIMESTAMP) , '+8:00' ) as CHINA_TSTZ
from src_date
)
select
-- ----------------------------------------------------------------
-- Now we have values that declare what time zone they belong to,
-- we can easily transform the value into datetimes for a different time zone:
-- ----------------------------------------------------------------
INPUT_DT,
PACIFIC_TSTZ,
CHINA_TSTZ,
-- ----------------------------------------------------------------
-- convert the China Standard Time value into US/Central:
-- ----------------------------------------------------------------
CHINA_TSTZ at time zone 'US/Central' as CENTRAL_TSTZ ,
-- ----------------------------------------------------------------
-- convert the US/Pacific time into UTC:
-- ----------------------------------------------------------------
PACIFIC_TSTZ at time zone 'UTC' as UTC_TSTZ,
PACIFIC_TSTZ at time zone '+0:00' as UTC_TSTZ_ALT
from assume_data;

Results:

[results screenshot]

For example, when loading DATE values from C2R (Pacific time) and storing them in EDW (UTC) we could use the following expression:

cast( from_tz(cast( INPUT_DT as TIMESTAMP), 'US/Pacific') at time zone 'UTC' as DATE )

That is, if we really needed a DATE data type. I believe Oracle DB would handle the outer cast automatically if we attempted to save the TIMESTAMP+TZ value into a destination table’s DATE column.
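
In a load statement, that might look something like this (a sketch only; the staging table and column names here are illustrative, not the real C2R/EDW objects):

insert into EDW_STG.T_TARGET ( SRC_ID, EVENT_DT_UTC )
select
  SRC_ID,
  -- declare the source DATE as US/Pacific, shift it to UTC, then cast back to DATE
  cast( from_tz( cast( EVENT_DT as TIMESTAMP ), 'US/Pacific' )
        at time zone 'UTC' as DATE )
from C2R_STG.T_SOURCE;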

For more fun:

An exercise in Interval Partitioning

Consider a table, LOG_DATA, with lots of rows:

select trunc( LOG_TIME, 'MM' ) as MONTH, count(1) as Count
from Log_Data
group by trunc(LOG_TIME,'MM')
order by trunc(LOG_TIME,'MM');

[results screenshot: row counts by month]

Perhaps this would be better served by a table with RANGE partitioning, using the INTERVAL syntax of 11g to automatically create partitions as data accumulates?

Unfortunately, LOG_TIME is of data type TIMESTAMP WITH TIME ZONE, and we can’t create an interval partition on timestamp columns (yet!)

Also, we can’t use an expression in the interval definition. So, we need a DATE column:

create table Log_Data_P
(
ID integer,
PARENT_ID integer,
LOG_TIME timestamp with time zone default systimestamp,
STACK_LEVEL integer,
PACKAGE_NAME varchar2(30),
PROCEDURE_NAME varchar2(30),
LOG_LABEL varchar2(100),
LOG_VALUE varchar2(1000),
LOG_DURATION number,
PROC_DURATION number,
STEP_DURATION number,
LOG_CLOSED integer,
LOG_DATE date default sysdate
)
partition by range ( LOG_DATE )
interval ( numtoyminterval( 1, 'MONTH' ) )
(
partition part_01 values less than ( date '2016-05-01' )
);

Here, we have added a LOG_DATE column to the end of the table structure. I’ve given it a default value and checked the code that writes to the table, to ensure there are no conflicts (there aren’t).

Now let’s insert into the new table, with the rows from the original (spanning several months):

insert into Log_Data_P
select
ID, PARENT_ID, LOG_TIME, STACK_LEVEL, PACKAGE_NAME, PROCEDURE_NAME, LOG_LABEL, LOG_VALUE,
LOG_DURATION, PROC_DURATION, STEP_DURATION, LOG_CLOSED, cast( LOG_TIME as DATE )
from Log_Data;
commit;

No errors… let’s see what partitions we have, at the end of this process:

select PARTITION_NAME, INTERVAL, HIGH_VALUE
from DBA_Tab_Partitions
where TABLE_NAME = 'LOG_DATA_P'
order by PARTITION_POSITION;

PARTITION_NAME INTERVAL HIGH_VALUE
------------------ ---------- -----------------------------------------------------------
PART_01 NO TO_DATE(' 2016-05-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS' )
SYS_P15277 YES TO_DATE(' 2016-06-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS' )
SYS_P15278 YES TO_DATE(' 2016-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS' )
SYS_P15280 YES TO_DATE(' 2016-08-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS' )
SYS_P15281 YES TO_DATE(' 2016-09-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS' )
SYS_P15279 YES TO_DATE(' 2016-10-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS' )
SYS_P15282 YES TO_DATE(' 2016-11-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS' )

7 rows selected.

Interesting… we have 6 new partitions over and above the original “PART_01” partition that we created. Obviously, Oracle has allocated some automatically-generated names to the partitions.

OK, so now the time has come to perform some pruning on our log table. Let’s delete the oldest partition:

SQL> alter table Log_Data_P drop partition PART_01;

ORA-14758: Last partition in the range section cannot be dropped

Uh-oh. It’s not as simple as we’d hoped. More about dropping partitions in this excellent post here:

Follow up #1 – OK, so try something sneaky, and drop the second-to-last partition

I had an idea. What if we define the table with an artificially small initial partition, with a start date that is earlier than any data in the table?

If we take a look at our Log_Data table today:

SQL> select count(1), min(LOG_TIME) from Log_Data;

COUNT(1) MIN(LOG_TIME)
---------- -----------------------------------
1172660 12-OCT-16 10.57.17.516528 PM -05:00

1.1 million rows, since October 12.

Now let’s define the Log_Data_P table again, with a start date of 2016-05-01 as described above, and insert the rows from our Log_Data table.

What do the partitions look like now?

select PARTITION_NAME, INTERVAL, HIGH_VALUE
from DBA_Tab_Partitions
where TABLE_NAME = 'LOG_DATA_P'
order by PARTITION_POSITION;

PARTITION_NAME INTERVAL HIGH_VALUE
---------------- -------- -------------------------------------
PART_01 NO TO_DATE(' 2016-05-01 00:00:00', ...
SYS_P17402 YES TO_DATE(' 2016-11-01 00:00:00', ...
SYS_P17403 YES TO_DATE(' 2016-12-01 00:00:00', ...
SYS_P17404 YES TO_DATE(' 2017-01-01 00:00:00', ...

“PART_01” won’t have any rows in it, but it defines the interval for future partitions. Now imagine some time goes by, and we don’t want the oldest month of data… so, let’s drop October, ‘SYS_P17402’:

SQL> alter table Log_Data_P drop partition SYS_P17402;

Table altered

select PARTITION_NAME, INTERVAL, HIGH_VALUE
from DBA_Tab_Partitions
where TABLE_NAME = 'LOG_DATA_P'
order by PARTITION_POSITION;

PARTITION_NAME INTERVAL HIGH_VALUE
---------------- -------- ------------------------------------
PART_01 NO TO_DATE(' 2016-05-01 00:00:00', ...
SYS_P17403 YES TO_DATE(' 2016-12-01 00:00:00', ...
SYS_P17404 YES TO_DATE(' 2017-01-01 00:00:00', ...

What’s the earliest row in the table?

SQL> select count(1), min(LOG_TIME) from Log_Data_P;

COUNT(1) MIN(LOG_TIME)
---------- -----------------------------------
859906 01-NOV-16 12.00.02.397712 AM -05:00

Success!

Now we just need a way to calculate the name of the partition based on, say, SYSDATE-180 (or whatever our aging specification is) and we’re golden.
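
One possible approach – a sketch, assuming the PARTITION FOR () extended syntax for interval-partitioned tables – sidesteps the name calculation entirely by addressing the partition through a date value it contains:

-- Sketch only: drop the partition containing the cutoff date, without knowing
-- its system-generated name. Assumes a 180-day aging window; this raises an
-- error if no partition contains the cutoff date, and PART_01 itself still
-- cannot be dropped (ORA-14758).
declare
  v_cutoff date := trunc( sysdate ) - 180;
begin
  execute immediate
    'alter table Log_Data_P drop partition for ( date ''' ||
    to_char( v_cutoff, 'yyyy-mm-dd' ) || ''' )';
end;
/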

Converting a TIMESTAMP WITH TZ in UTC to Central

At ABACAB we have – very correctly – a TIMESTAMP WITH TIME ZONE column called ONLINE_ENROLLMENT_TIME that has been populated – via an ETL process – with values in UTC.

This is good, because it is a completely unambiguous value. Oh, if only all our date times were stored this way.

Unfortunately, the convention up until now has been:

  • assume date times are in Central time zone
  • convert to integer “date key” and “time key” data types (primary key for date and time dimensions).

This loses any time zone information, which is not great. It means that all client processes using this data also need to assume Central time zone.

The following SQL takes our time zone-aware column and transforms it:

select
ID,
ONLINE_ENROLLMENT_TIME,
(ONLINE_ENROLLMENT_TIME at time zone 'US/Central') as ONLINE_ENROLLMENT_TIME_C,
trunc( cast( (ONLINE_ENROLLMENT_TIME at time zone 'US/Central') as DATE )) as ONLINE_ENROLLMENT_DATE
from ZYXX.D_Person
where ONLINE_ENROLLMENT_TIME is not null;

Results:

[results screenshot]
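
And for the legacy integer key convention, a sketch of the derivation (assuming the date dimension keys are of the form yyyymmdd):

select
  ID,
  -- convert to Central, drop the time portion, render as an integer key
  to_number( to_char(
      cast( ( ONLINE_ENROLLMENT_TIME at time zone 'US/Central' ) as DATE ),
      'yyyymmdd' ) ) as ONLINE_ENROLLMENT_DATE_KEY
from ZYXX.D_Person
where ONLINE_ENROLLMENT_TIME is not null;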

Testing using SQL and common table expressions

Sometimes I forget how to do simple things, like splitting a string into sections based on a delimiter.

Here’s a nice way to test your code on a range of input values, and see the interim results, in one hit:

with 
myobject as (
-- put in a range of test values:
select 'dbo.MyFunction' as NAME
union
select 'setup.ins_Contact'
union
select 'MyOtherFunction'
),
more2 as (
-- calculate interesting attributes:
select
NAME,
len( NAME ) as LENGTH_,
charindex( '.', NAME ) as POSITION
from myobject
)
-- perform the test:
select
NAME, POSITION, LENGTH_,
case when POSITION=0 then 'dbo'
else substring( NAME, 1, POSITION-1 )
end as SCHEMA_,
substring( NAME, POSITION+1 , (LENGTH_ - POSITION ) ) as OBJECT_
from more2;

More pitfalls with NVARCHAR2 vs VARCHAR2

Consider the following POSTAL_CODE column:

create table Test1 (
POSTAL_CODE NVARCHAR2(10)
);

insert into Test1 values ('12345-1234');
commit;

select * from Test1;

POSTAL_CODE
--------------------
12345-123

That’s right – the NVARCHAR2(10) column appears to lop off the 10th character and not store it.

This behavior must be related to multi-byte characters and Unicode, somehow. The Oracle DB docs on Globalization Support can get quite detailed, but I still haven’t found a concise explanation of this silent truncation behavior.
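
A useful data point when chasing this down is the national character set actually in use; NLS_DATABASE_PARAMETERS is a standard view:

-- show the database character set and the national (NCHAR/NVARCHAR2) character set
select PARAMETER, VALUE
from NLS_DATABASE_PARAMETERS
where PARAMETER in ( 'NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET' );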

So, just make the column larger, right?

We tried that, altering the column from 10 to 15 characters. Our column can now correctly store the full 10 characters of a postal code string.

But then, another problem showed up when we needed to match on the first 5 characters in the ZIP code.

Consider:

create table Test2 ( V1 varchar2(15), N1 nvarchar2(15) ) ;
insert into Test2 values ( '1234567', '1234567' );
insert into Test2 values ( '123456', '123456' );
insert into Test2 values ( '12345', '12345' );
insert into Test2 values ( '1234', '1234' );
insert into Test2 values ( '123', '123' );
commit;

SQL> select V1, substr( V1, 1,5 ), N1, substr( N1, 1, 5 ) from Test2;

V1 SUBSTR(V1,1,5) N1 SUBSTR(N1,1,5)
--------------- -------------- ------------------------------ --------------
1234567 12345 1234567 1234
123456 12345 123456 1234
12345 12345 12345 1234
1234 1234 1234 1234
123 123 123 123

Yup, that’s right. SUBSTR( NVARCHAR2 ) does not return the full 5 characters!

As a workaround, we can use SUBSTRC():

SQL> select V1, substrc( V1, 1,5 ), N1, substrc( N1, 1, 5 ) from Test2;

V1 SUBSTRC(V1,1,5) N1 SUBSTRC(N1,1,5)
--------------- --------------- ------------------------------ ------------------------------
1234567 12345 1234567 12345
123456 12345 123456 12345
12345 12345 12345 12345
1234 1234 1234 1234
123 123 123 123

Now the values retrieved from the NVARCHAR2 column are working as expected.

Bottom Line:

  • Don’t use NVARCHAR2 for size-limited columns, unless you absolutely have to;
  • If you have to use NVARCHAR2, use SUBSTRC() for partials.