Sunday, 26 December 2010

BTEQ (be-tech)


BTEQ is Batch (or Basic) TEradata Query utility. Non SQL commands in BTEQ begin with a dot (.)
I was not able to login to BTEQ for an entire evening. I got the syntax from Teradata User's Guide: The Ultimate Companion, Third Edition (ISBN 9780970498069):
.logon nodeName/userName
The hosts file in my computer looks like this:
127.0.0.1 localhost localtdcop1
::1 localhost

so the string I entered was
.logon localtdcop1/tduser

but it threw an invalid logon error.
Teradata BTEQ 13.00.00.03 for WIN32.
Copyright 1984-2009, Teradata Corporation. ALL RIGHTS RESERVED.
Enter your logon or BTEQ command:
.logon localtdcop1/tduser
.logon localtdcop1/tduser
Password:
*** CLI error: CLI2: BADLOGON(303): Invalid logon string.
*** Return code from CLI is: 303
*** Error: Logon failed!
*** Total elapsed time was 15 seconds.
Teradata BTEQ 13.00.00.03 for WIN32. Enter your logon or BTEQ command:

What makes it worse is if you put any arbitary string in the nodeName it throws a different error, which leads you to think you are doing this part right.
Teradata BTEQ 13.00.00.03 for WIN32. Enter your logon or BTEQ command:
.logon arbStr/tduser
.logon arbStr/tduser
Password:
*** CLI error: MTDP: EM_NOHOST(224): name not in HOSTS file or names database.
*** Return code from CLI is: 224
*** Error: Logon failed!
*** Total elapsed time was 15 seconds.

The localtdcop1 has been created by Teradata and this is the string accepted by other Teradata ODBC Clients like SQL Assistant etc. While logging in to BTEQ, the cop1 part needs to be truncated so the login string looks like this:

.logon localtd/tduser
.logon localtd/tduser
Password:
*** Logon successfully completed.
*** Teradata Database Release is 13.00.00.12
*** Teradata Database Version is 13.00.00.12
*** Transaction Semantics are BTET.
*** Character Set Name is 'ASCII'.
*** Total elapsed time was 1 second.

Also, the node Name needs to be 8 characters or less or the system will throw the 'Invalid Logon String' error anyway. There are two modes of transactions Teradata supports - BTET (see the bold text above) and ANSI. BTEQ by defaults creates a BTET session but you can create an ANSI session by entering the command before the logon command.
Teradata BTEQ 13.00.00.03 for WIN32. Enter your logon or BTEQ command:
.set session transaction ansi
.set session transaction ansi
Teradata BTEQ 13.00.00.03 for WIN32. Enter your logon or BTEQ command:
.logon localtd/tduser
.logon localtd/tduser
Password:
*** Logon successfully completed.
*** Teradata Database Release is 13.00.00.12
*** Teradata Database Version is 13.00.00.12
*** Transaction Semantics are ANSI.
*** Character Set Name is 'ASCII'.
*** Total elapsed time was 1 second.

If you issue this command after logging in, you get this error:
.set session transaction ansi
*** Error: You must not be logged on to change the SQLFLAG or
TRANSACTION settings.

BTET stands for Begin Transaction End Transaction. Any command or set of command entered is automatically commited or rolled back in whole. Let’s see some examples.
/* Create a table first*/
CREATE TABLE Order_Table (
Order_Number INTEGER NOT NULL,
Customer_Number INTEGER NOT NULL,
order_date DATE,
order_total DECIMAL(10,2)
);
select * from order_table;
*** Query completed. 4 rows found. 4 columns returned.
*** Total elapsed time was 1 second.
Order_Number
Customer_Number
order_date
order_total
------------
---------------
----------
------------
1005
23005
08/08/22
2300.00
1002
23002
10/07/08
1000.00
1003
23003
09/09/21
2000.00
1001
23001
10/07/07
1000.00
The command below has a different outcome on the database depending on whether the session is ANSI or BTET.
insert into order_table(1004,23004,'2008-08-22', 2300);

In ANSI this needs to explicitly committed with the commit statement, while in BTET it is automatically commited. I noticed a peculiar effect of using an ANSI session. Any uncommited DML statement tends to hold up all other sessions on the table object. Here are the details.
Open an ANSI session and any other session (both on BTEQ). Issue the DML command in the ANSI session like the one I have shown above .
Teradata BTEQ 13.00.00.03 for WIN32. Enter your logon or BTEQ command
.logon localtd/tduser
.logon localtd/tduser
Password:
*** Logon successfully completed.
*** Teradata Database Release is 13.00.00.12
*** Teradata Database Version is 13.00.00.12
*** Transaction Semantics are ANSI.
*** Character Set Name is 'ASCII'.
*** Total elapsed time was 1 second.
BTEQ -- Enter your DBC/SQL request or BTEQ command:
insert into order_table(1004,23004,'2008-08-22', 2300);
insert into order_table(1004,23004,'2008-08-22', 2300);
*** Insert completed. One row added.
*** Total elapsed time was 1 second.
BTEQ -- Enter your DBC/SQL request or BTEQ command:
select * from order_table;
select * from order_table;
*** Query completed. 5 rows found. 4 columns returned.
*** Total elapsed time was 1 second.
Order_Number
Customer_Number
order_date
order_total
------------
---------------
----------
------------
1005
23005
08/08/22
2300.00
1004
23004
08/08/22
2300.00
1002
23002
10/07/08
1000.00
1003
23003
09/09/21
2000.00
1001
23001
10/07/07
1000.00

Now issue a simple select statement on this table from any another session. Notice that this commands goes into wait and there is no output. Itwill remain in this state till a rollback or commit issued in the first session.
BTEQ -- Enter your DBC/SQL request or BTEQ command:
commit;
commit;
*** COMMIT done.
*** Total elapsed time was 1 second.
The output from the other session(s) can now be seen.
BTEQ -- Enter your DBC/SQL request or BTEQ command:
select * from order_table;
select * from order_table;
*** Query completed. 5 rows found. 4 columns returned.
*** Total elapsed time was 4 minutes and 24 seconds.
Order_Number
Customer_Number
order_date
order_total
------------
---------------
----------
------------
1005
23005
08/08/22
2300.00
1004
23004
08/08/22
2300.00
1002
23002
10/07/08
1000.00
1003
23003
09/09/21
2000.00
1001
23001
10/07/07
1000.00

For Oracle users this might come as a big surprise, Oracle maintains read consistency as well but does not lock the transaction. This means if an update is not commited, other select (read) statements will continue to see the integrated data (as if no transaction has yet taken place). Oracle Rocks!
1st Oracle session:
SQL> select systimestamp from dual;

SYSTIMESTAMP
---------------------------------------------------------------------------
26-JUN-11 02.52.57.916000 PM -05:00

SQL> select *
  2  from emp
  3  order by emp_id;

    EMP_ID EMP_NAME                          DEPT_NO
---------- ------------------------------ ----------
       100 Adam                                   10
       200 Bud                                    20
       300 Carling                                10
       400 Don
       500 Emma                                   40

SQL> update emp
  2  set emp_name = 'Dana'
  3  where emp_name = 'Don';

1 row updated.

SQL> select systimestamp from dual;

SYSTIMESTAMP
---------------------------------------------------------------------------
26-JUN-11 02.53.56.372000 PM -05:00

SQL> select *
  2  from emp
  3  order by emp_id;

    EMP_ID EMP_NAME                          DEPT_NO
---------- ------------------------------ ----------
       100 Adam                                   10
       200 Bud                                    20
       300 Carling                                10
       400 Dana
       500 Emma                                   40

The first session sees the changed but uncommitted data at 2:53.
Another Oracle Session opened at 2:55:
SQL> select systimestamp from dual;

SYSTIMESTAMP
---------------------------------------------------------------------------
26-JUN-11 02.55.17.048000 PM -05:00

SQL> select *
  2  from emp
  3  order by emp_id;

    EMP_ID EMP_NAME    DEPT_NO
---------- -------- ----------
       100 Adam             10
       200 Bud              20
       300 Carling          10
       400 Don
       500 Emma             40

Another session does not see the changed uncommited data, because of read consistency. But unlike Teradata Oralce does not hold back this session until the first one commits or rolls back.
BTEQ also allows for taking batch input from a file on disk and displaying the output on a file, if need be. It is akin to the UNIX format of accepting input from a pipe and storing the output on another pipe.
C:\BTEQ < c:\temp\BTEQ_run_script.txt >c:\BTEQ_results.txt
The contents of the input file looks the one below. Note that the password is not specified, it is entered interactively at runtime.
.set session transaction ansi
.logon localtd/tduser
insert into order_table(1004,23004,'2008-08-22', 2300);
rollback;
insert into order_table(1005,23005,'2008-08-22', 2300);
select * from order_table;
commit;
.quit;
Export Output of a SQL onto a flat file
BTEQ -- Enter your DBC/SQL request or BTEQ command:
.EXPORT REPORT FILE c:\temp\order.txt
.EXPORT REPORT FILE c:\temp\order.txt
*** To reset export, type .EXPORT RESET
BTEQ -- Enter your DBC/SQL request or BTEQ command:
select * from order_table;
select * from order_table;
*** Query completed. 5 rows found. 4 columns returned.
*** Total elapsed time was 1 second.
BTEQ -- Enter your DBC/SQL request or BTEQ command:
.EXPORT RESET
.EXPORT RESET
*** Output returned to console.
Contents of the c:\temp\order.txt
Order_Number Customer_Number order_date order_total
------------ --------------- ---------- ------------
1005 23005 08/08/22 2300.00
1004 23004 08/08/22 2300.00
1002 23002 10/07/08 1000.00
1003 23003 09/09/21 2000.00
1001 23001 10/07/07 1000.00
Note the string REPORT after the EXPORT command, the other options being DATA and INDICDATA. These options are for storing binray data into a flat file which helps in faster loading and exporting.
Each call to the EXPORT command appends information to the file (if it already exists).

Note that I have formatted the some text output into tables for better readibility.

Wednesday, 22 December 2010

Data Distribution Logic on Teradata

Oracle lets the DBA decide on the data distribution. Teradata does this work by itself after the "distribution key" has been decided. Okay, there is no such term as a "distribution key" but that is what it practically is. Here's why.

The data distribution is done on the basis of primary index. What one can decide, as the DBA, is which column or set of columns will contribute to this primary index. More the possible number of values, more are the possible values of the hash function and better is the distribution of the data. In essence if employee Id (which might itself be arbitrary or a linearly increasing long number) is used as the primary index, then chances of a uniform data distribution are high. This is, of course, with the assumption that the hash function is efficient enough. Here is an example: 

Column Value x (assuming a single column)
Hex value of hashrow(x)
Hex value of the first 20 bits of hashrow(x)
1
0x79b69e37
0x79b69
2
0xf3713c6e
0xf3713
134
0xb6ded109
0xb6ded
999999
0x06bf5ee1
0x06bf5

Note that the hashrow() function is the same in all instances of Teradata, so "select hashrow(134)" would always give "0xb6ded109" in all Teradata implementations. So what is the third column here? The first 20 bits (or in some systems the first 16 bits) is the bucket for all the remaining 12 bits (or 16 bits), i.e. 0x79b69 is the bucket for columns whose hash function output is in the range 0x79b69000 to 0x79b69fff. Note that the output of the two functions are dependent only on the algorithm of the hashrow() function which is fixed across all Teradata systems.

So this table is true for all Teradata systems. The AMP assigned for this bucket will house all rows of the database for which the first 20 (or 16) bits match the bucket value, or all rows whose hashbucket(hashrow(row_distribution_key)) is 0x79b69. This AMP will be assigned to several other buckets. How are the AMPs assigned to buckets?

For each system there exists a hashmapbucket which maps AMPs against buckets (or sets of rows). The bucket to amp mapping might be different for different systems having different AMPs but will be same for all systems having the same number of AMPs. Some sources depict this mapping as a matrix. It does not matter so far this discussion is concerned. The only thing that interests us is for a given row (distribution key), there exists a unique hashrow(), a unique hashbucket(hashrow()) and a unique hashamp(hashbucket(hashrow())! Teradata guarantees this.

Bucketvalue
AMP
0x06bf5
0
0x79b69
3
0xb6ded
1
0xf3713
2

Simple? Read again (aloud).