Handling Simple Denormalized Data From Talend

Many systems store data in a denormalized form, and data integration tools are able to handle it. This post uses Talend to show how to process a simple denormalized dataset file.

Data integration is the combination of technical and business processes used to combine data from disparate sources into meaningful and valuable information. Some systems store data in a denormalized form, and data integration tools are able to handle it. In this post, Talend is used to demonstrate handling a simple denormalized dataset file.

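For example, suppose a system stores state data with the following schema: [field1];[[field2.1],[field2.2]]. The fields map to [StateID];[[StateName],[PostCode]]: a semicolon separates StateID from a nested group, and a comma separates StateName from PostCode inside that group. Here is the sample file, states.csv: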

StateID;StateName,PostCode
1;Alabama,009234
2;Alaska,009235
3;Arizona,009236
4;Arkansas,009237
5;California,009244
6;Colorado,009245
7;Connecticut,009214
8;Delaware,009278
9;Florida,0092897
10;Georgia,009247
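
To make the two-level layout concrete, here is a minimal Python sketch of how one record could be split by hand. It is only an illustration of the file format, not Talend's generated code, and the function name parse_state_line is a hypothetical helper introduced for this example.

```python
def parse_state_line(line):
    """Split one denormalized record into (state_id, state_name, post_code)."""
    # First level: ';' separates StateID from the nested group.
    state_id, nested = line.strip().split(";", 1)
    # Second level: ',' separates StateName from PostCode inside the group.
    state_name, post_code = nested.split(",", 1)
    return state_id, state_name, post_code


record = parse_state_line("1;Alabama,009234")
print(record)
```

Every value is kept as a string here, so the leading zeros in PostCode are preserved.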

Start Development in Talend Studio

  1. Drop the following components from the Palette onto the design workspace: tFileInputFullRow, tExtractDelimitedFields, and tLogRow.
  2. Connect them in that order using Row > Main links.
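
The three-component flow above can be sketched in plain Python as a chain of generators. This is a rough analogy under stated assumptions, not the code Talend produces: read_full_rows, extract_delimited_fields, and log_rows are hypothetical stand-ins for tFileInputFullRow, tExtractDelimitedFields, and tLogRow, the "|" output separator is an assumption, and splitting on both delimiters with one regular expression is a shortcut chosen for the sketch.

```python
import io
import re

# A two-record excerpt of states.csv, held in memory for the example.
SAMPLE = """StateID;StateName,PostCode
1;Alabama,009234
2;Alaska,009235
"""


def read_full_rows(stream, header=1):
    """Yield each line as one unsplit string, skipping `header` leading lines."""
    for index, line in enumerate(stream):
        if index >= header and line.strip():
            yield line.rstrip("\r\n")


def extract_delimited_fields(rows, pattern=r"[;,]"):
    """Split each full row into its fields on either ';' or ','."""
    for row in rows:
        yield re.split(pattern, row)


def log_rows(records, separator="|"):
    """Print each record on one line and return the printed lines."""
    lines = [separator.join(fields) for fields in records]
    for line in lines:
        print(line)
    return lines


logged = log_rows(extract_delimited_fields(read_full_rows(io.StringIO(SAMPLE))))
```

Each stage consumes the output of the previous one, which mirrors how rows travel along the Main links in the job.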

image

Configuring the Components

  1. Double-click the tFileInputFullRow component to open its Basic settings view. Add the file path and skip the header line.

    image

    Update the schema as shown below:

    image

  2. Double-click the tExtractDelimitedFields component to open its Basic settings view. Edit the schema to define the output columns (StateID, StateName, and PostCode in this example).

    image

  3. Double-click the tLogRow component to open its Basic settings view. Edit the schema.

    image
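
When editing the schema, the column types matter as much as the column names. The sketch below shows the idea in Python under two assumptions that are not stated in the original: StateID is treated as an integer, and PostCode is kept as a string so that its leading zeros survive. SCHEMA and apply_schema are hypothetical names used only for this illustration.

```python
# Hypothetical schema: column name paired with the type used to convert it.
SCHEMA = [("StateID", int), ("StateName", str), ("PostCode", str)]


def apply_schema(fields, schema=SCHEMA):
    """Convert a list of raw string fields into a typed record."""
    if len(fields) != len(schema):
        raise ValueError(
            f"expected {len(schema)} fields, got {len(fields)}: {fields!r}"
        )
    return {name: cast(value) for (name, cast), value in zip(schema, fields)}


row = apply_schema(["1", "Alabama", "009234"])
print(row)
```

Had PostCode been declared as an integer, the value 009234 would have been reduced to 9234, which is why a string type is the safer choice for codes of this kind.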

Running

  1. Save the job and press F6 to run it:

image

And that's it!

Topics:
big data, talend, tutorial, denormalized data, data analytics, data integration

Published at DZone with permission of
