Monday, 18 October 2010

Secondary Indexes


The mechanism depends on whether the index to be created is Unique or Non Unique. Lets discuss the Unique Secondary Index (USI) first. The difference between primary and secondary index is that primary index determines the way in which the records are arranged and distributed (by way of the corresponding hash map values), and is therefore part of the table structure. A secondary index is created, stored and arranged outside the main table structure so that the actual data distrubution is not impacted. But the actual index is still hashed and arranged by the hash map. Lets illustrate by an example.

Column Value x (assuming a single column, on which the Primary index is built)
Hex value of hashrow(x)
Hex value of the first 20 bits of hashrow(x)
Amp assigned for this bucket of hash value
Column Value y (assuming a single column on which the secondary index is to be built)
Hex value of the first 20 bits of hashrow(y)
Hex value of the first 20 bits of hashrow(y)
Amp assigned for this bucket of hash value
1
0x79b69e37
0x79b69
0
22377656
0x43920a75
0x43920
1
2
0xf3713c6e
0xf3713
3
56748734
0x79b76954
0x79b76
3
134
0xb6ded109
0xb6ded
1
12346532
0xadgt8734
0xadgt8
0
999999
0x06bf5ee1
0x06bf5
2
23783113
0x05738291
0x05738
2


As is evident from this table, the destination AMPs assigned to the secondary index values can be different from the AMPs in which the actual records are stored. Thus, accessing a table using Unique Secondary index will involve 2 AMPs - the 1st fetch will be to find the index by hashing the predicate, which in turn will give the hash value of the actual record.

This hash value will then be used for the second fetch to retrieve the original record. Of course Teradata might still decide to do a Full Table Scan (FTS) if it figures out this is less costly an operation that the cumulative 2-AMP fetch for all the values in the predicate. This is beyond the control of the operator unlike in case or Oracle where one can override decisions of the cost based optimizer using hints. This is good in a way as long as all the underlying tables are routinely analyzed. 


The Non-Unique Secondary Index (NUSI) can be organized either by value or hash value. The value ordered NUSI is more widely used where there is a need to scan a range of the data. (store_location_cd between 56 and 67).
For each AMP build an index table (or subtable as refered in many other places) to locally index the column(s) values in that AMP. Note the multiple row Ids stored because of non-uniqueness.


Amp 1
Amp 2
Column Value x (assuming single NUSI column)
Row id of index
Row Id of table
Column Value x (assuming single NUSI column)
Row id of index
Row Id of table
Apple
9001
1001, 1009, 1011
Apple
7001
2001
Beet
9002
1004
Banana
7002
2003, 2007
Cherry
9003
1002, 1030
Chardonnay
7003
2006
Donut
9004
1008
Éclairs
7004
2004, 2010

No comments: