Sunday 15 August 2010

Partitioning, and its effects on people, err... queries


Create table with Range partition, Unique primary index not containing Partition Key
CREATE TABLE Order_Table (
Order_Number INTEGER NOT NULL,
Customer_Number INTEGER NOT NULL,
order_date DATE,
order_total DECIMAL(10,2)
)
/* Partition Key not part of UPI */
UNIQUE PRIMARY INDEX (Order_number)
PARTITION BY RANGE_N
(
Order_date BETWEEN DATE '2009-04-01' AND DATE '2010-03-31'
EACH INTERVAL '1' MONTH
);
 [Teradata Database] [TeraJDBC 13.00.00.20] [Error 5712] [SQLState HY000] UNIQUE is not allowed.

Create table with Range partition, Non-unique PI and not containing Partition Key
DROP TABLE Order_Table;

CREATE TABLE Order_Table (
Order_Number INTEGER NOT NULL,
Customer_Number INTEGER NOT NULL,
order_date DATE,
order_total DECIMAL(10,2)
)
/* Partition Key not part of UPI */
/*UNIQUE*/ PRIMARY INDEX (Order_number)
PARTITION BY RANGE_N
(
Order_date BETWEEN DATE '2009-04-01' AND DATE '2010-03-31'
EACH INTERVAL '1' MONTH
);

SHOW TABLE order_table;
CREATE MULTISET TABLE TDUSER.order_table ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
Order_Number INTEGER NOT NULL,
Customer_Number INTEGER NOT NULL,
order_date DATE FORMAT 'YY/MM/DD',
order_total DECIMAL(10,2))
PRIMARY INDEX ( Order_Number ) /* Non Unique Primary Index Created */
PARTITION BY RANGE_N(Order_date BETWEEN DATE '2009-04-01' AND DATE '2010-03-31' EACH INTERVAL '1'
MONTH );

This is still a PI, so any predicate with the Order_Number should hit the PI (i.e. the right AMP). Let check this out. Once in the AMP, there is no logic to know which exact partition the row(s) is(are) in so all partiions will have to be scanned.
EXPLAIN
SELECT customer_number
FROM Order_Table
WHERE order_number = 1103;

1) First, we do a single-AMP RETRIEVE step from all partitions of
TDUSER.Order_Table by way of the primary index
"TDUSER.Order_Table.Order_Number = 1103" with a residual condition
of ("TDUSER.Order_Table.Order_Number = 1103") into Spool 1
(one-amp), which is built locally on that AMP. The size of Spool
1 is estimated with low confidence to be 1 row (25 bytes). The
estimated time for this step is 0.02 seconds.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.02 seconds.

EXPLAIN
SELECT customer_number
FROM Order_Table
WHERE order_number = 1103
AND order_date = DATE '2009-07-08';
/*Addition of the partition elimination predicate helps in singling out the partition */

1) First, we do a single-AMP RETRIEVE step from a single partition of
TDUSER.Order_Table by way of the primary index
"TDUSER.Order_Table.Order_Number = 1103,
TDUSER.Order_Table.order_date = DATE '2009-07-08'" with a residual
condition of ("(TDUSER.Order_Table.order_date = DATE '2009-07-08')
AND (TDUSER.Order_Table.Order_Number = 1103)") into Spool 1
(one-amp), which is built locally on that AMP. The size of Spool
1 is estimated with low confidence to be 1 row (25 bytes). The
estimated time for this step is 0.02 seconds.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.02 seconds.

Create table with Range partition, Unique primary index containing Partition Key
DROP TABLE Order_Table;

CREATE TABLE Order_Table (
Order_Number INTEGER NOT NULL,
order_date DATE,
Customer_Number INTEGER NOT NULL,
order_total DECIMAL(10,2)
)
/* Partition Key part of UPI */
UNIQUE PRIMARY INDEX (Order_number,Order_date)
PARTITION BY RANGE_N
(
Order_date BETWEEN DATE '2009-04-01' AND DATE '2010-03-31'
EACH INTERVAL '1' MONTH
);

SHOW TABLE order_table;
CREATE MULTISET TABLE TDUSER.order_table ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
Order_Number INTEGER NOT NULL,
order_date DATE FORMAT 'YY/MM/DD',
Customer_Number INTEGER NOT NULL,
order_total DECIMAL(10,2))
UNIQUE PRIMARY INDEX ( Order_Number ,order_date )
PARTITION BY RANGE_N(Order_date BETWEEN DATE '2009-04-01' AND DATE '2010-03-31' EACH INTERVAL '1'
MONTH );

EXPLAIN
SELECT customer_number
FROM Order_Table
WHERE order_number = 1103; /* Predicate on 1st part of UPI, no workee for Teradata, Oracle users note! */

1) First, we lock a distinct TDUSER."pseudo table" for read on a
RowHash to prevent global deadlock for TDUSER.Order_Table.
2) Next, we lock TDUSER.Order_Table for read.
3) We do an all-AMPs RETRIEVE step from TDUSER.Order_Table by way of
an all-rows scan with a condition of (
"TDUSER.Order_Table.Order_Number = 1103") into Spool 1
(group_amps), which is built locally on the AMPs. The size of
Spool 1 is estimated with no confidence to be 1 row (25 bytes).
The estimated time for this step is 0.03 seconds.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.03 seconds.

EXPLAIN
SELECT customer_number
FROM Order_Table
WHERE order_date = DATE '2009-07-08'; /* Predicate on 2nd part of UPI */

1) First, we lock a distinct TDUSER."pseudo table" for read on a
RowHash to prevent global deadlock for TDUSER.Order_Table.
2) Next, we lock TDUSER.Order_Table for read.
3) We do an all-AMPs RETRIEVE step from a single partition of
TDUSER.Order_Table with a condition of (
"TDUSER.Order_Table.order_date = DATE '2009-07-08'") with a
residual condition of ("TDUSER.Order_Table.order_date = DATE
'2009-07-08'") into Spool 1 (group_amps), which is built locally
on the AMPs. The size of Spool 1 is estimated with no confidence
to be 1 row (25 bytes). The estimated time for this step is 0.03
seconds.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.03 seconds.

EXPLAIN
SELECT customer_number
FROM Order_Table
WHERE order_number = 1103
AND order_date = DATE '2009-07-08';
/* Predicate on full UPI, at last Teradata is happy! */

1) First, we do a single-AMP RETRIEVE step from a single partition of
TDUSER.Order_Table by way of the unique primary index
"TDUSER.Order_Table.Order_Number = 1103,
TDUSER.Order_Table.order_date = DATE '2009-07-08'" with a residual
condition of ("(TDUSER.Order_Table.Order_Number = 1103) AND
(TDUSER.Order_Table.order_date = DATE '2009-07-08')"). The
estimated time for this step is 0.01 seconds.
-> The row is sent directly back to the user as the result of
statement 1. The total estimated time is 0.01 seconds.

Create a table with Range partition, UPI and explicit PK constraint
DROP TABLE Order_Table;

CREATE TABLE Order_Table (
Order_Number INTEGER NOT NULL,
order_date DATE,
Customer_Number INTEGER NOT NULL,
order_total DECIMAL(10,2),
/* Explicit constraint specified */
CONSTRAINT pk_con PRIMARY KEY (order_number) /*Note this will create a USI since UPI is mentioned seperately*/
)
/* Partition Key part of UPI */
UNIQUE PRIMARY INDEX (Order_number,Order_date)
PARTITION BY RANGE_N(Order_date BETWEEN DATE '2009-04-01' AND DATE '2010-03-31'
                                                                EACH INTERVAL '1' MONTH);

SHOW TABLE order_table;
CREATE MULTISET TABLE TDUSER.order_table ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
Order_Number INTEGER NOT NULL,
order_date DATE FORMAT 'YY/MM/DD',
Customer_Number INTEGER NOT NULL,
order_total DECIMAL(10,2))
UNIQUE PRIMARY INDEX ( Order_Number ,order_date )
PARTITION BY RANGE_N(Order_date BETWEEN DATE '2009-04-01' AND DATE '2010-03-31' EACH INTERVAL '1'
MONTH )
UNIQUE INDEX pk_con ( Order_Number );

EXPLAIN
SELECT customer_number
FROM Order_Table
WHERE order_number = 1103;/* Should hit the Unique Secondary Index (USI)*/

1) First, we do a two-AMP RETRIEVE step from TDUSER.Order_Table by
way of unique index # 4 "TDUSER.Order_Table.Order_Number = 1103"
with no residual conditions. The estimated time for this step is
0.01 seconds.
-> The row is sent directly back to the user as the result of
statement 1. The total estimated time is 0.01 seconds.

EXPLAIN
SELECT customer_number
FROM Order_Table
WHERE order_date = DATE '2009-07-08';
/* Part of UPI but Teradata doesn't take partially qualified PI. being partition key instigates single partition of all AMPs */

1) First, we lock a distinct TDUSER."pseudo table" for read on a
RowHash to prevent global deadlock for TDUSER.Order_Table.
2) Next, we lock TDUSER.Order_Table for read.
3) We do an all-AMPs RETRIEVE step from a single partition of
TDUSER.Order_Table with a condition of (
"TDUSER.Order_Table.order_date = DATE '2009-07-08'") with a
residual condition of ("TDUSER.Order_Table.order_date = DATE
'2009-07-08'") into Spool 1 (group_amps), which is built locally
on the AMPs. The size of Spool 1 is estimated with no confidence
to be 1 row (25 bytes). The estimated time for this step is 0.03
seconds.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.03 seconds.

Create constraint with same definition as PI
DROP TABLE Order_Table;

CREATE TABLE Order_Table (
Order_Number INTEGER NOT NULL,
Customer_Number INTEGER NOT NULL,
order_date DATE,
order_total DECIMAL(10,2),
CONSTRAINT some_pk PRIMARY KEY (order_number)
)
PRIMARY INDEX (Order_number);
/* not allowed since this will result in UPI and USI on the same column*/

[Teradata Database] [TeraJDBC 13.00.00.20] [Error 3519] [SQLState 42S11] Two indexes with the same columns.