Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Azure Data Lakes and U-SQL SELECT Transformation Rowsets

DZone's Guide to

Azure Data Lakes and U-SQL SELECT Transformation Rowsets

See how you can use multiple transformations together with U-SQL SELECT Transformation Rowets and how that opens up options for you with Azure data lakes.

· Database Zone ·
Free Resource

Discover Tarantool's unique features which include powerful stored procedures, SQL support, smart cache, and the speed of 1 million ACID transactions on a single CPU core!

In this article, we are going to discuss the basic U-SQL SELECT query transformation rowset technique. I hope it will be informative.

U-SQL SELECT Query Transformation Rowsets

In a previous article, we retrieved data from a “SearchLog.tsv” file to “SearchLog-scalar-variables.csv”. It was just a simple file-to-file movement of data.

Now we have to think about some extra operation on the rowset data before storing it in the final destination. For example, some filtration, grouping, aggregate function in numerical values, etc. 

When we perform such an operation with a rowset before moving it to the destination, that is called SELECT Transformation Rowsets.

Example

We are taking the same example that we used in the previous article to simply copy data from a file named “SearchLog.tsv” and store it into “SearchLog-scalar-variables.csv”.

What we are doing in the transformation part is just filtering the rowset by region. We are using a boolean expression in the WHERE clause of the SELECT statement.

@searchlog =
    EXTRACT UserId          int,
            Start           DateTime,
            Region          string,
            Query           string,
            Duration        int?,
            Urls            string,
            ClickedUrls     string
    FROM "/Samples/Data/SearchLog.tsv"
    USING Extractors.Tsv();
@rs1 =
    SELECT Start, Region, Duration
    FROM @searchlog
WHERE Region == "en-gb";
OUTPUT @rs1  
    TO "/output/SearchLog-transform-rowsets.csv"
    USING Outputters.Csv();


The U-SQL part of the transformation part is:

Image title

The WHERE clause contains a C# boolean expression. We can use the C# expression language to do our own expressions and functions. We can even perform more complex filtering by combining them with logical conjunctions (ANDs) and disjunctions (ORs).

Using More Than One Transformation Together

In the previous example, we use a single transformation by using a WHERE clause with the region.

We can again use the previous filtered rowset for the new transformation. We can also do it multiple times.

Look at the following example. First, we use transformation to filter the region. Then, we use the DateTime.Parse() method and a conjunction.

@searchlog =
    EXTRACT UserId          int,
            Start           DateTime,
            Region          string,
            Query           string,
            Duration        int?,
            Urls            string,
            ClickedUrls     string
    FROM "/Samples/Data/SearchLog.tsv"
    USING Extractors.Tsv();
@rs1 =
    SELECT Start, Region, Duration
    FROM @searchlog
WHERE Region == "en-gb";
@rs1 =
    SELECT Start, Region, Duration
    FROM @rs1
    WHERE Start >= DateTime.Parse("2012/02/16")
          AND Start <= DateTime.Parse("2012/02/17");
OUTPUT @rs1  
    TO "/output/SearchLog-transform-datetime.csv"
    USING Outputters.Csv();


The second query is operating on the result of the first rowset, which creates a composite of the two filters. We can also reuse a variable name, and the names are scoped lexically.

Hope you like it!

Discover Tarantool's unique features such as powerful stored procedures, SQL support, smart cache, and the speed of 1 million ACID transactions on a single CPU.

Topics:
database ,microsoft azure ,data lake ,u-sql ,select transformation rowsets ,tutorial ,data transformation

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}