
Azure Data Lake With U-SQL: Using C# Code Behind


In this article, we'll discuss how to use C# code behind a U-SQL script, as I've noticed that a lot of U-SQL scripts rely on C# code-behind files.




Why use C# at all, when we can already create functions, stored procedures, and so on in U-SQL? The answer is simple: we want to use the power of C# and its libraries.

For example, if we need a complex scalar-valued function, it is quite easy to write in C# using the built-in math library.
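As an illustration of such a scalar function, here is a minimal sketch that uses System.Math to compute the population standard deviation of three marks. The class name MarkStatistics and the method name GetStandardDeviation are hypothetical and are not part of this article's case study; only Math.Pow and Math.Sqrt come from the .NET library.

```csharp
using System;

namespace TestApplication
{
    // Hypothetical helper class, used for illustration only.
    public static class MarkStatistics
    {
        // Returns the population standard deviation of the three marks.
        public static double GetStandardDeviation(int marks_1, int marks_2, int marks_3)
        {
            // Divide by 3.0 so the mean is computed in floating point.
            double mean = (marks_1 + marks_2 + marks_3) / 3.0;

            // Sum of squared deviations from the mean.
            double sumOfSquares = Math.Pow(marks_1 - mean, 2)
                                + Math.Pow(marks_2 - mean, 2)
                                + Math.Pow(marks_3 - mean, 2);

            return Math.Sqrt(sumOfSquares / 3.0);
        }
    }
}
```

Assuming it were placed in a code-behind file, a function like this would be called from a U-SQL SELECT by its fully qualified name, in the same way as any other C# scalar function.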

Case Study

To understand C# code behind in Azure Data Lake, we don't need a complex example. Here, we have a CSV file with the following columns: “StudentID”, “StudentName”, “Marks1”, “Marks2”, and “Marks3”.

We are going to read the records from this CSV file and write the result to another CSV file.

Along the way, we do a small transformation: we add “Marks1”, “Marks2”, and “Marks3” to produce a “TotalMarks” value for each student.

We use C# code to create a function named GetTotalMarks. It takes the three marks as input and returns their total.

C# Code Behind

using Microsoft.Analytics.Interfaces;
using Microsoft.Analytics.Types.Sql;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

namespace TestApplication
{
    public static class StudentRecor
    {
        // Returns the sum of the three marks.
        public static Double GetTotalMarks(int marks_1, int marks_2, int marks_3)
        {
            return marks_1 + marks_2 + marks_3;
        }
    }
}

U-SQL Script

@searchlog =
    EXTRACT StudentID     int,
            StudentName   string,
            Marks1        int,
            Marks2        int,
            Marks3        int
    FROM "C:/Users/Joydeep/AppData/Local/USQLDataRoot/Input-1/StudentRecords.csv"
    USING Extractors.Csv();

@filtering =
    SELECT StudentID,
           TestApplication.StudentRecor.GetTotalMarks(Marks1, Marks2, Marks3) AS TotalMarks
    FROM @searchlog;

OUTPUT @filtering 
    TO "C:/Users/Joydeep/AppData/Local/USQLDataRoot/output/Output-1/StudentResult.csv"
    USING Outputters.Csv();

Note how the function is called in the U-SQL script. It uses the fully qualified name:

<Namespace Name>.<Class Name>.<Function Name>
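The same fully qualified name also works from plain C#, which gives a quick way to check the function locally before submitting a U-SQL job. The sketch below repeats the class from the code-behind file without the U-SQL-specific using directives so that it compiles on its own; the LocalCheck namespace, the Program class, and the sample marks are assumptions made for illustration.

```csharp
using System;

namespace TestApplication
{
    // Same class and function as in the code-behind file above.
    public static class StudentRecor
    {
        public static Double GetTotalMarks(int marks_1, int marks_2, int marks_3)
        {
            return marks_1 + marks_2 + marks_3;
        }
    }
}

// Hypothetical console harness, not part of the U-SQL project.
namespace LocalCheck
{
    public static class Program
    {
        public static void Main()
        {
            // <Namespace Name>.<Class Name>.<Function Name>
            double total = TestApplication.StudentRecor.GetTotalMarks(70, 80, 90);
            Console.WriteLine(total); // prints 240
        }
    }
}
```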

Job Graph

[Image: job graph]

Output File

[Image: output file]

Hope this helps!


data lakes, big data, c#, u-sql, azure

Published at DZone with permission of

