Analytics Lab: October 2010

The mechanism depends on whether the index to be created is Unique or Non Unique. Lets discuss the Unique Secondary Index (USI) first. The difference between primary and secondary index is that primary index determines the way in which the records are arranged and distributed (by way of the corresponding hash map values), and is therefore part of the table structure. A secondary index is created, stored and arranged outside the main table structure so that the actual data distrubution is not impacted. But the actual index is still hashed and arranged by the hash map. Lets illustrate by an example.

Column Value x (assuming a single column, on which the Primary index is built)	Hex value of hashrow(x)	Hex value of the first 20 bits of hashrow(x)	Amp assigned for this bucket of hash value	Column Value y (assuming a single column on which the secondary index is to be built)	Hex value of the first 20 bits of hashrow(y)	Hex value of the first 20 bits of hashrow(y)	Amp assigned for this bucket of hash value
1	0x79b69e37	0x79b69	0	22377656	0x43920a75	0x43920	1
2	0xf3713c6e	0xf3713	3	56748734	0x79b76954	0x79b76	3
134	0xb6ded109	0xb6ded	1	12346532	0xadgt8734	0xadgt8	0
999999	0x06bf5ee1	0x06bf5	2	23783113	0x05738291	0x05738	2

As is evident from this table, the destination AMPs assigned to the secondary index values can be different from the AMPs in which the actual records are stored. Thus, accessing a table using Unique Secondary index will involve 2 AMPs - the 1st fetch will be to find the index by hashing the predicate, which in turn will give the hash value of the actual record.

This hash value will then be used for the second fetch to retrieve the original record. Of course Teradata might still decide to do a Full Table Scan (FTS) if it figures out this is less costly an operation that the cumulative 2-AMP fetch for all the values in the predicate. This is beyond the control of the operator unlike in case or Oracle where one can override decisions of the cost based optimizer using hints. This is good in a way as long as all the underlying tables are routinely analyzed.

The Non-Unique Secondary Index (NUSI) can be organized either by value or hash value. The value ordered NUSI is more widely used where there is a need to scan a range of the data. (store_location_cd between 56 and 67).

For each AMP build an index table (or subtable as refered in many other places) to locally index the column(s) values in that AMP. Note the multiple row Ids stored because of non-uniqueness.

Amp 1			Amp 2
Column Value x (assuming single NUSI column)	Row id of index	Row Id of table	Column Value x (assuming single NUSI column)	Row id of index	Row Id of table
Apple	9001	1001, 1009, 1011	Apple	7001	2001
Beet	9002	1004	Banana	7002	2003, 2007
Cherry	9003	1002, 1030	Chardonnay	7003	2006
Donut	9004	1008	Éclairs	7004	2004, 2010

Analytics Lab

Monday, 18 October 2010

Secondary Indexes

About Me

Blog Archive