The mechanism depends on whether the index to be created is Unique or Non Unique. Lets discuss the Unique Secondary Index (USI) first. The difference between primary and secondary index is that primary index determines the way in which the records are arranged and distributed (by way of the corresponding hash map values), and is therefore part of the table structure. A secondary index is created, stored and arranged outside the main table structure so that the actual data distrubution is not impacted. But the actual index is still hashed and arranged by the hash map. Lets illustrate by an example.
Column Value x (assuming a single column, on which the Primary index is built) | Hex value of hashrow(x) | Hex value of the first 20 bits of hashrow(x) | Amp assigned for this bucket of hash value | Column Value y (assuming a single column on which the secondary index is to be built) | Hex value of the first 20 bits of hashrow(y) | Hex value of the first 20 bits of hashrow(y) | Amp assigned for this bucket of hash value |
1 | 0x79b69e37 | 0x79b69 | 0 | 22377656 | 0x43920a75 | 0x43920 | 1 |
2 | 0xf3713c6e | 0xf3713 | 3 | 56748734 | 0x79b76954 | 0x79b76 | 3 |
134 | 0xb6ded109 | 0xb6ded | 1 | 12346532 | 0xadgt8734 | 0xadgt8 | 0 |
999999 | 0x06bf5ee1 | 0x06bf5 | 2 | 23783113 | 0x05738291 | 0x05738 | 2 |
As is evident from this table, the destination AMPs assigned to the secondary index values can be different from the AMPs in which the actual records are stored. Thus, accessing a table using Unique Secondary index will involve 2 AMPs - the 1st fetch will be to find the index by hashing the predicate, which in turn will give the hash value of the actual record.
This hash value will then be used for the second fetch to retrieve the original record. Of course Teradata might still decide to do a Full Table Scan (FTS) if it figures out this is less costly an operation that the cumulative 2-AMP fetch for all the values in the predicate. This is beyond the control of the operator unlike in case or Oracle where one can override decisions of the cost based optimizer using hints. This is good in a way as long as all the underlying tables are routinely analyzed.
The Non-Unique Secondary Index (NUSI) can be organized either by value or hash value. The value ordered NUSI is more widely used where there is a need to scan a range of the data. (store_location_cd between 56 and 67).
For each AMP build an index table (or subtable as refered in many other places) to locally index the column(s) values in that AMP. Note the multiple row Ids stored because of non-uniqueness.
Amp 1 | Amp 2 | ||||
Column Value x (assuming single NUSI column) | Row id of index | Row Id of table | Column Value x (assuming single NUSI column) | Row id of index | Row Id of table |
Apple | 9001 | 1001, 1009, 1011 | Apple | 7001 | 2001 |
Beet | 9002 | 1004 | Banana | 7002 | 2003, 2007 |
Cherry | 9003 | 1002, 1030 | Chardonnay | 7003 | 2006 |
Donut | 9004 | 1008 | Éclairs | 7004 | 2004, 2010 |
No comments:
Post a Comment