
Using Teradata SQL Analyzer

Learn about Teradata SQL Analyzer, a utility framework to discover processing flows in SQL scripts, find inter-table dependencies, and convert them to ANSI-SQL standards.

By Ming Yuan · Mar. 06, 18 · Tutorial

Teradata is a robust RDBMS product that lets enterprises host their data at terabyte scale. It is built on a massively parallel processing (MPP), shared-nothing architecture, in which large datasets are processed in a coordinated fashion across many processors. This design enables fast execution of complex queries against large data warehouses.

Due to recent technological trends in big data and the cloud, an increasing number of companies have been rethinking how they manage their data warehouses. Many have moved toward cloud solutions to avoid vendor lock-in and to reduce operational costs. During migration, however, a common challenge is rewriting existing SQL scripts written in Teradata's proprietary extension of ANSI-SQL. These scripts implement complex business logic and typically take a long time to convert to ANSI-SQL.

Teradata SQL Analyzer is a utility framework to discover the processing flows embedded in SQL scripts, find inter-table dependencies, and convert the scripts to ANSI-SQL standards. To use the framework, first add its JAR file to your classpath and then import com.myuan.transsql.* and com.myuan.transsql.query.BaseCommand in your calling classes. BaseCommand is the superclass of all classes representing SQL statements, including CreateCommand, SelectCommand, UpdateCommand, and DeleteCommand.
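In Scala (the language of the snippets below), those imports are, as a minimal sketch (note that a wildcard import does not pull in subpackages, so both lines are needed):

// Wildcard import of the framework's top-level package, plus BaseCommand
// from the query subpackage.
import com.myuan.transsql._
import com.myuan.transsql.query.BaseCommand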

The entry point of the framework is a factory class, QueryParser, whose getCommand(s: String) method takes a SQL statement as a String and creates the corresponding command object. After getting the command object, invoke its parse(s: String) method; the framework then builds a tree structure and populates the values of its nodes from the SQL statement.

// Hand any CRUD SQL statement to the factory, then parse it into a tree.
val comm: BaseCommand = QueryParser.getCommand("CRUD SQL statement")
comm.parse("CRUD SQL statement")

Now, you can use the BaseCommand object to show the structure and table dependencies within the SQL statement. The BaseCommand class has three useful methods (a combined usage sketch follows the list):

  • showDependency(): This method returns a String object which depicts the processing flows and key table names in the SQL.
  • showStats(): This method returns a String in this format:
    • CRUD action | all tables | the number of JOINs | the number of MINUSs | the number of UNIONs
  • toSQL(): This method returns a String that is a new SQL statement conforming to ANSI-SQL standards.
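As a minimal sketch (the statement string is a placeholder for any parseable CRUD statement), the three methods can be exercised together:

// Parse one statement, then inspect it three ways.
val sql = "CRUD SQL statement"                 // placeholder statement
val cmd: BaseCommand = QueryParser.getCommand(sql)
cmd.parse(sql)

println(cmd.showDependency())  // processing flow and key table names
println(cmd.showStats())       // action|tables|#JOINs|#MINUSs|#UNIONs
println(cmd.toSQL())           // equivalent ANSI-SQL statement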

To illustrate these methods, a sample input is listed below:

CREATE multiset volatile TABLE table3, no log, AS (sel id_num AS new_id, family_id, birth_dt, wake_tm, col1 FROM table1 WHERE rel_type (NOT cs) LIKE 'Matt%'
AND 
birth_dt >= date - 60 
AND 
( 
  friend1,friend2 
) 
IN (sel friend1, 
    friend2 FROM ${schema2}.table4 WHERE home_type='house' GROUP BY 1, 
    2) 
AND 
upper (trim (id_num))(casespecific) = lower (trim (id_num))(casespecific ) qualify rank() OVER (partition BY id_num ORDER BY birth_dt ASC , wake_tm ASC) = 1
UNION 
         sel id_num AS new_id, 
         family_id, 
         birth_dt, 
         wake_tm, 
         col1 
FROM     table10 
WHERE    NAME (NOT cs) LIKE 'Jason'
AND      birth_dt >= date - 60 AND upper (trim (id_num))(casespecific) = lower (trim (id_num))(casespecific ) qualify rank() OVER (partition BY id_num ORDER BY birth_dt ASC , wake_tm ASC) = 1) WITH data PRIMARY INDEX(new_id) ON COMMIT preserve rows

The output of the showDependency() method is:

CREATE TABLE [TABLE3]
SELECT 5 columns 
FROM   table1 
WHERE
SELECT 2 columns 
FROM   ${SCHEMA2}.table4 
UNION 
SELECT 5 columns 
FROM   table10

The showStats() method returns the following (a CREATE action touching four tables, with zero JOINs, zero MINUSs, and one UNION):

CREATE|TABLE3,TABLE1,${SCHEMA2}.TABLE4,TABLE10|0|0|1

The toSQL() method generates a new SQL query:

CREATE TABLE TABLE3 AS (
        SELECT * FROM ( 
            SELECT ID_NUM AS NEW_ID, FAMILY_ID, BIRTH_DT, WAKE_TM, COL1, RANK () OVER ( PARTITION BY ID_NUM ORDER BY BIRTH_DT ASC , WAKE_TM ASC )  AS QUALIFIER_ELEMENT_131
            FROM TABLE1 
            WHERE REL_TYPE ILIKE 'Matt%'  AND  BIRTH_DT >= DATEADD ( DAY, -60, CURRENT_DATE() )  AND  ( FRIEND1,FRIEND2 ) IN (
                    SELECT FRIEND1, FRIEND2
                    FROM ${SCHEMA2}.TABLE4 
                    WHERE HOME_TYPE='house'
                    GROUP BY 1,2
                ) AND  UPPER ( TRIM ( ID_NUM ) ) = LOWER ( TRIM ( ID_NUM ) ) )
        WHERE QUALIFIER_ELEMENT_131 = 1
        UNION 
        SELECT * FROM ( 
            SELECT ID_NUM AS NEW_ID, FAMILY_ID, BIRTH_DT, WAKE_TM, COL1, RANK () OVER ( PARTITION BY ID_NUM ORDER BY BIRTH_DT ASC , WAKE_TM ASC )  AS QUALIFIER_ELEMENT_1737
            FROM TABLE10 
            WHERE NAME ILIKE 'Jason' AND BIRTH_DT >= DATEADD ( DAY, -60, CURRENT_DATE() )  AND  UPPER ( TRIM ( ID_NUM ) ) = LOWER ( TRIM ( ID_NUM ) ) )
        WHERE QUALIFIER_ELEMENT_1737 = 1
);

In many cases, the first step of rebuilding a data warehouse in the cloud is to migrate the existing tables and data as-is. Most companies using Teradata have accumulated thousands of tables in their Teradata schemas, so a practical approach is to migrate tables in waves. It is both important and difficult, however, to decide which tables are used most and have the greatest impact. The framework can help with this analysis by generating usage numbers: collect all of the SQL statements executed in a single day from the Teradata query log table, loop through and parse each statement, and add the results to the com.myuan.transsql.TableStats class. The code snippet looks like this:

for (s <- SQLList) {
  if (s != null) {
    // Parse each logged statement and accumulate its table references.
    val c: BaseCommand = QueryParser.getCommand(s)
    c.parse(s)
    if (c.getStats() != null)
      TableStats.addTo(c.getStats().getTables())
  }
}
TableStats.print()

The print() method reports two perspectives: the number of times each table is referenced across the whole list, and a sorted list keyed by the number of occurrences. The output for the sample SQL above looks like this:

${SCHEMA2}.table4 --> 1 
table1            --> 2 
table3            --> 1 
---------------------------------------------------------------------- 
2--> TABLE1 
1--> ${SCHEMA2}.TABLE4|TABLE3

The framework's performance depends on the number of SQL statements in the input and on the complexity of those statements. During testing, my input was 1.9 million SQL statements in a 500MB file. The test harness parsed all of the SQL statements and created four output files by calling the three methods on BaseCommand and one method on TableStats. With an 8GB heap, the total execution time was around 2.5 minutes.
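A harness along those lines might look like the following sketch; the input file name, the one-statement-per-line layout, and the output file names are my assumptions, not part of the framework:

import java.io.PrintWriter
import scala.io.Source

// Hypothetical harness: stream statements from a file (one per line, an
// assumed input layout) and write the three per-statement views.
val deps  = new PrintWriter("dependencies.txt")
val stats = new PrintWriter("stats.txt")
val ansi  = new PrintWriter("converted.sql")

for (s <- Source.fromFile("queries.sql").getLines() if s.trim.nonEmpty) {
  val c: BaseCommand = QueryParser.getCommand(s)
  c.parse(s)
  deps.println(c.showDependency())
  stats.println(c.showStats())
  ansi.println(c.toSQL())
  if (c.getStats() != null) TableStats.addTo(c.getStats().getTables())
}
Seq(deps, stats, ansi).foreach(_.close())

TableStats.print()  // the fourth output: aggregate reference counts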
