DZone

Mule 4: Processing Multibyte Characters in Fixed-Width Flat Files

A custom solution to handle multibyte characters in fixed-width flat files in MuleSoft. DataWeave can process several different types of data formats.

By Ravneet Bhardwaj · Mar. 19, 21 · Tutorial

Introduction

DataWeave can process several different data formats, such as flat files, copybooks, and fixed-width files. For most of these formats, you can import a schema that describes the input structure, which gives you access to valuable metadata at design time.

Problem Statement

These schemas currently work only with certain single-byte character encodings. Many characters used in other languages, such as accented letters in Spanish and French or Japanese kana and kanji, take more than one byte in UTF-8, unlike basic English (ASCII) text, where 1 character = 1 byte.

MuleSoft does not handle multibyte characters in fixed-width flat files. While processing, the Mule runtime treats each character in the file as occupying 1 (one) byte.

As a result, a flat file that contains any multibyte character cannot be parsed correctly.

Solution Brief

Preprocess flat files that contain multibyte characters so that each character occupies as many positions as it has bytes, ensuring the file is parsed correctly in MuleSoft.

Optionally, convert the data back to its original form when exporting the file to external systems.

Prerequisites

Anypoint Studio 7.6, Mule Runtime 4.3.0, Knowledge of Flat File Schemas.

Problem

For demonstration, I am using the following Flat File Definition (.ffd):

```yaml
form: FIXEDWIDTH
id: 'flatfile'
name: 'flatfile'
values:
- { name: 'Id', usage: M, type: String, length: 2 }
- { name: 'FirstName', usage: M, type: String, length: 10 }
- { name: 'LastName', usage: M, type: String, length: 10 }
- { name: 'City', usage: M, type: String, length: 10 }
- { name: 'State', usage: M, type: String, length: 10 }
- { name: 'Country', usage: M, type: String, length: 10 }
```

According to the schema definition, FirstName is 10 (ten) positions long. When an ERP system creates a fixed-width flat file, it writes each line in bytes, not characters. For example, Japanese characters take 3 (three) bytes each in UTF-8. Each one appears as a single character in the file, but the ERP system counts it as 3 positions.

```text
日 -> 3 bytes
本 -> 3 bytes
語 -> 3 bytes
日本語 -> 9 bytes
ABC -> 3 bytes (English letters: 1 character = 1 byte)
```
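
The byte counts above can be checked with a small Java program. This is a sketch that assumes UTF-8 encoding; the class name ByteWidths is hypothetical. It contrasts String.length(), which counts UTF-16 characters, with the number of bytes each string occupies in UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class ByteWidths {

    // Number of bytes the text occupies when encoded as UTF-8 (assumed file encoding)
    public static int utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // "ñ" is an extra sample (not in the table) showing a 2-byte Spanish letter
        String[] samples = {"日", "本", "語", "日本語", "ABC", "ñ"};
        for (String s : samples) {
            // length() counts characters (UTF-16 code units), not bytes
            System.out.println(s + " -> " + s.length() + " character(s), "
                    + utf8Bytes(s) + " bytes");
        }
    }
}
```

For 日本語 it reports 3 characters but 9 bytes, while ABC is 3 of each; ñ takes 2 bytes, which is why accented Latin text is affected as well.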



The following process flow is used to process the fixed-width flat file:

[Figure: Fixed-width file processing flow]

[Figure: MIME Type > Add Parameter]

The fixed-width flat file to be processed is:

```text
1 Ravneet   Bhardwaj  Gurugram  Haryana   India     
2 日本語 Bhardwaj  Gurugram  Haryana   India     
```



Here, the first line contains only single-byte characters, so MuleSoft handles it without any problems. In the second line, however, FirstName contains Japanese characters. These three characters occupy only 3 character positions but take up 9 bytes; together with the following space, the field is 10 bytes wide, matching the FFD schema's FirstName length of 10 (ten), yet it is only 4 characters long. While processing, MuleSoft will throw an exception.


Because MuleSoft treats every character as a single byte, the exception states that the expected size is not met for a particular field, in this case FirstName.
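
To make the mismatch concrete, the sketch below slices the second record using the schema's field lengths (2, 10, 10, 10, 10, 10), first by character positions and then by UTF-8 byte positions. FieldSlicer and its methods are hypothetical names for illustration, and UTF-8 is assumed; the character-based slicing only approximates a character-oriented parser and is not MuleSoft's actual implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class FieldSlicer {

    // Field lengths from the FFD schema: Id, FirstName, LastName, City, State, Country
    static final int[] LENGTHS = {2, 10, 10, 10, 10, 10};

    // Slice by character positions; null marks a field the line is too short to hold
    public static List<String> sliceByChars(String line) {
        List<String> fields = new ArrayList<>();
        int pos = 0;
        for (int len : LENGTHS) {
            fields.add(pos + len <= line.length() ? line.substring(pos, pos + len) : null);
            pos += len;
        }
        return fields;
    }

    // Slice by UTF-8 byte positions, which is how the ERP system laid out the record
    public static List<String> sliceByBytes(String line) {
        byte[] bytes = line.getBytes(StandardCharsets.UTF_8);
        List<String> fields = new ArrayList<>();
        int pos = 0;
        for (int len : LENGTHS) {
            fields.add(pos + len <= bytes.length
                    ? new String(bytes, pos, len, StandardCharsets.UTF_8)
                    : null);
            pos += len;
        }
        return fields;
    }

    public static void main(String[] args) {
        // Second record of the sample file, built field by field to keep the widths visible
        String line = "2 " + "日本語 " + "Bhardwaj  " + "Gurugram  " + "Haryana   " + "India     ";
        System.out.println("By characters: " + sliceByChars(line));
        System.out.println("By bytes:      " + sliceByBytes(line));
    }
}
```

Sliced by characters, FirstName becomes "日本語 Bhardw", every later field is shifted, and the 46-character line is too short to hold Country. Sliced by bytes, every field lines up with the schema.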

Solution

The solution is a custom Java utility that reads the flat file line by line, character by character. Whenever it encounters a multibyte character, it appends extra spaces immediately after that character.

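The number of spaces added depends on the size of the multibyte character: a character that takes n bytes is followed by n - 1 spaces, so it occupies n positions, matching the byte count used by the ERP system.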
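
As a worked example of this rule, the following sketch pads one field value and checks that its character count then equals its UTF-8 byte count. PaddingRule and pad are hypothetical names, and UTF-8 is assumed.

```java
import java.nio.charset.StandardCharsets;

public class PaddingRule {

    // Appends (n - 1) spaces after every character that takes n bytes in UTF-8
    public static String pad(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            out.appendCodePoint(cp);
            int n = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8).length;
            for (int k = 1; k < n; k++) {
                out.append(' ');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String firstName = "日本語 ";   // 4 characters, 10 bytes
        String padded = pad(firstName);
        System.out.println("[" + padded + "] length = " + padded.length());
        // After padding, the character count matches the byte count the ERP system used
        System.out.println(padded.length() == firstName.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

Each kanji gains two spaces, so "日本語 " becomes a 10-character string that fits the schema's FirstName length of 10, while ASCII text passes through unchanged.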

MyUtils.java

```java
package com.ravneet.utility;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class MyUtils {

    /**
     * Preprocessor: reads the flat file line by line and appends (n - 1) spaces
     * after every character that occupies n bytes in UTF-8, so that each
     * character takes up as many positions as it has bytes.
     */
    public static String convertflatfile(Object input) {
        StringBuilder builder = new StringBuilder();

        // Decode explicitly as UTF-8 rather than with the platform default charset;
        // try-with-resources closes the reader when done
        try (BufferedReader bufferedReader = new BufferedReader(
                new InputStreamReader((InputStream) input, StandardCharsets.UTF_8))) {

            String line;
            while ((line = bufferedReader.readLine()) != null) {
                // Iterate by code point so characters outside the BMP are not split in two
                line.codePoints().forEach(cp -> {
                    builder.appendCodePoint(cp);
                    int byteLength = utf8Length(cp);
                    for (int k = 1; k < byteLength; k++) {
                        builder.append(' ');
                    }
                });
                builder.append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException("Failed to read flat file", e);
        }

        return builder.toString();
    }

    /**
     * Post-processor: removes the padding spaces that convertflatfile added
     * after each multibyte character.
     */
    public static String removingSpace(String input) {
        StringBuilder builder = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            int cp = input.codePointAt(i);
            builder.appendCodePoint(cp);
            i += Character.charCount(cp);

            // Skip up to (n - 1) following spaces, where n is the character's UTF-8 byte length
            int padding = utf8Length(cp) - 1;
            while (padding > 0 && i < input.length() && input.charAt(i) == ' ') {
                i++;
                padding--;
            }
        }
        return builder.toString();
    }

    // Number of bytes the given code point occupies when encoded as UTF-8
    private static int utf8Length(int codePoint) {
        return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8).length;
    }
}
```

The Java utility above has two methods: convertflatfile and removingSpace.

  • convertflatfile: a preprocessor that reads the flat file and pads each multibyte character with extra spaces, so that its width in characters matches its width in bytes.
  • removingSpace: a post-processor that removes the extra spaces added by the preprocessor.

Below is the modified process flow:

[Figure: Modified process flow]

When reading the file, select the following streaming strategy property:

[Figure: Streaming strategy property]

The Set Payload component contains the FFD schema details.

[Figure: Set Payload component]

In post-processing, the extra spaces are removed:

[Figure: Spaces removed in post-processing]

[Figure: Output]


Conclusion

This solution enables MuleSoft to parse and process fixed-width flat files that contain multibyte characters. It is an interim workaround until MuleSoft provides out-of-the-box support for this problem.
