How to Implement the File Upload Process in a REST API

Learn how to perform system integration in a REST API through file sharing, using resources provided by Mule components.

Thiago Santana

Aug. 31, 17 · Tutorial

Introduction - An Implementation Challenge

File sharing is one of the most elementary ways to perform system integration. In web applications, "upload" is the process by which a user sends data or files from a local computer to a remote one.

Sometimes we need to expose an upload operation in our REST API that lets a client transmit:

Binary files (of any kind).
The meta information related to each file (e.g., name, content type, size, and so on).
Perhaps some additional information needed for business logic processing.

But we would like all this information to arrive on the server in the same request. Sounds like a challenge, doesn't it?

This article demonstrates one strategy, among several existing ones, for implementing this integration scenario using resources provided by Mule components.

SOAP Legacy - Identifying Reusable Concepts

For context, this strategy was inspired by the architecture of a legacy project that I needed to maintain. That project transmitted binary files through SOAP Web Services without using MTOM.

With MTOM, files are transmitted as MIME attachments to the payload (similar to sending emails that contain attachments [see fig.1]). When we do not intend to use MTOM with SOAP, the protocol implementation instead converts the content of the file to be transmitted into a text string in Base64 format.

[fig1]

After the conversion, the string is placed inside an XML element that is part of the Body of the payload transmitted to the remote server [see fig2].

[fig2]

The advantage of this strategy is that the element travels in the same payload as other information, such as the data needed to process business rules or, better still, the meta information of the file. Since all the necessary information is in the same payload, reading and parsing the message is simplified. Everything is handled by the service in a single request, which saves processing and means the service does not have to maintain or manage transactions across requests. In other words, it stays stateless.

This is very desirable because it conforms to one of the great features of the HTTP protocol (which was designed to be stateless) and allows the transaction to be completely processed in a single request, since the service never has to ask the consumer for additional information. We can also run part of the processing asynchronously inside the service, without prolonging the client's response time or risking a timeout. Finally, since no follow-up requests are needed for the transaction in progress, server resources are saved.

Building on the gains identified in the legacy solution, I developed a way of porting this architecture to a REST API, with significant improvements.
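The single-payload idea can be sketched in a few lines of Python. This is only an illustration, not the legacy project's code: the element names (uploadRequest, fileName, contentType, fileSize, fileContent) and both function names are assumptions made for the example.

```python
import base64
import xml.etree.ElementTree as ET
from typing import Tuple


def build_upload_body(name: str, content_type: str, data: bytes) -> str:
    """Return an XML fragment carrying file metadata and Base64 content."""
    root = ET.Element("uploadRequest")  # hypothetical element names throughout
    ET.SubElement(root, "fileName").text = name
    ET.SubElement(root, "contentType").text = content_type
    ET.SubElement(root, "fileSize").text = str(len(data))
    # Base64 turns arbitrary bytes into ASCII text that is safe inside XML.
    ET.SubElement(root, "fileContent").text = base64.b64encode(data).decode("ascii")
    return ET.tostring(root, encoding="unicode")


def read_upload_body(xml_text: str) -> Tuple[str, bytes]:
    """Recover the file name and the original bytes from the fragment."""
    root = ET.fromstring(xml_text)
    return root.findtext("fileName"), base64.b64decode(root.findtext("fileContent"))
```

The receiver gets the metadata and the file bytes from one parse of one message, which is the property the REST version tries to preserve.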

Implementation - Porting the Solution to REST API

When defining any type of API, we should consider that consumers of any interface usually expect availability, simplicity, and stability. Defining a RAML contract is one way to establish guidelines that favor building a REST API that offers simplicity and stability from its initial version. Just as we do with WSDLs, when we define and publish a RAML, we give our consumers a better chance to prepare for the API, identify difficulties, and suggest improvements for future versions of the contract. Using ready-made infrastructure, such as the MuleSoft API Design cloud platform, the consumer can also test this RAML by generating an endpoint with mock information (provided in the RAML itself) and exercising the interface from the consuming side. All this happens before we have deployed even a minimal implementation to a server.

To implement the above-proposed scenario, we define the RAML contract containing the POST operation for the resource "file," something like:

For a resource like this, we can have the following HTTP request example:

Below I explain the flows that make up the Mule project of the REST API, which runs on Mule Server 3.8.1 EE.

Main Flow:

It consists of an HTTP inbound endpoint configured to handle requests on port 8081, an APIkit Router, and a Reference Exception Strategy.

Post Flow:

The APIkit Router sends HTTP POST requests to this flow. It consists of two Flow References (to the buildFileData and filesSplitter flows), a DataWeave transformer that prepares the response returned to the consumer, and a Logger that writes that response to the console.

Build FilesData Flow:

This flow declares the variables FilesContent (a Java HashMap) and FilesData (a simple String). A ForEach traverses the inbound attachments of the current message (a multipart/form-data request) to separate the fileContent[] parts from the filesData part. There is also a component that removes the rootMessage variable generated during the ForEach iteration and, lastly, a DataWeave transformer that merges the original FilesData payload with the FilesContent data. The merge is done on the server so that the consumer does not have to embed binary data in the JSON. This DataWeave also discards any fileContent[] part that is not declared as used in a "fileContentIndex" field of the received FilesData JSON, thus filtering the information and avoiding the use of invalid data.
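The merge and filter step can be illustrated outside Mule with a short Python sketch. It is not the DataWeave script from the project. It assumes that filesData is a JSON array of objects, that "fileContentIndex" is a zero-based position in the list of fileContent[] parts, and that entries pointing at a missing part are skipped; the function name and the "fileName" field are hypothetical.

```python
import base64
import json


def merge_files_data(files_data_json: str, file_contents: list) -> list:
    """Attach Base64 content to each filesData entry by its fileContentIndex."""
    entries = json.loads(files_data_json)
    merged = []
    for entry in entries:
        index = entry["fileContentIndex"]
        if not 0 <= index < len(file_contents):
            continue  # assumption: skip entries that reference a part that was not sent
        enriched = dict(entry)
        # Binary parts become Base64 text so they can live inside the JSON payload.
        enriched["fileContent"] = base64.b64encode(file_contents[index]).decode("ascii")
        merged.append(enriched)
    # Parts never referenced by any entry are simply left out of the result.
    return merged
```

Because the loop is driven by the filesData entries rather than by the uploaded parts, any part that no entry references never reaches the later flows.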

Files Splitter Flow:

This flow converts the received FilesData JSON to a Java object. The object is then split into parts, each representing one file, and each part is sent to the outbound VM endpoint so that every file is processed separately. The Collection Aggregator component then gathers the enriched payloads back into a single structure.
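As a rough analogy only, the split, process, and aggregate pattern looks like this in Python. A thread pool stands in for the VM queue and the list returned at the end stands in for the aggregated collection; the function name and parameters are hypothetical, and nothing here reproduces Mule's actual components.

```python
from concurrent.futures import ThreadPoolExecutor


def split_process_aggregate(files, process_one, max_workers=4):
    """Process each file record separately and gather the results in one list."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each record is handed to a worker on its own; Executor.map
        # returns the results in the same order as the inputs.
        return list(pool.map(process_one, files))
```

For example, `split_process_aggregate(records, upload)` would call a hypothetical `upload` function once per record and return the enriched records together.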

Upload File Flow:

As suggested by the documentation, we can use an inbound VM endpoint to handle requests originating from a Message Splitter. Within this flow, each file is preprocessed before being sent to Amazon S3 by the Create Object operation. The preprocessing consists of converting the Base64 string in the "fileContent" field of the payload into a byte array. The result of this conversion (performed here by the base64-decoder component) is stored in a variable.

The Create Object component is then supplied with this variable along with the file and bucket names. As this operation doesn't give back a usable response, we add a call to the Get Object component, which lets us retrieve data about the file that was just stored.

Finally, with the help of a DataWeave transformer, the data received at the beginning of this flow is enriched with the FileSize and HttpURI values returned in the response of the S3 Get Object operation.
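The decode, store, read back, and enrich sequence can be sketched as follows. The InMemoryStore class is a hypothetical stand-in written for this example; it is not the Amazon S3 API or the Mule S3 connector. The output keys "fileSize" and "httpUri", and the choice to drop "fileContent" from the result, are also assumptions.

```python
import base64


class InMemoryStore:
    """Hypothetical stand-in for an object store; not the Amazon S3 API."""

    def __init__(self, base_uri):
        self.base_uri = base_uri
        self.objects = {}

    def create_object(self, bucket, key, body):
        # Like the operation described above, this returns nothing usable.
        self.objects[(bucket, key)] = body

    def get_object(self, bucket, key):
        body = self.objects[(bucket, key)]
        return {"contentLength": len(body),
                "httpUri": f"{self.base_uri}/{bucket}/{key}"}


def upload_file(record, store, bucket):
    """Decode one file record, store it, and enrich it with stored-object data."""
    content = base64.b64decode(record["fileContent"])  # Base64 text -> bytes
    store.create_object(bucket, record["fileName"], content)
    stored = store.get_object(bucket, record["fileName"])
    # Assumption: the Base64 content is not echoed back to the consumer.
    enriched = {k: v for k, v in record.items() if k != "fileContent"}
    enriched["fileSize"] = stored["contentLength"]
    enriched["httpUri"] = stored["httpUri"]
    return enriched
```

The second call exists only because the store operation returns nothing to build the response from, which mirrors the reason given above for adding Get Object.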

Exception Mapping Flow:

Mule generates this flow automatically when we create a project and provide a RAML file to APIkit. It contains the exception mappings that APIkit handles automatically.

Testing the Solution

Most simple REST API tests, especially those involving GET operations, can easily be done with command-line utilities such as cURL. Stress tests can be generated with the help of JMeter. Tests can also be performed with newer versions of SoapUI.

Because it offers several important conveniences, I opted to perform the tests with Postman (an extension for the Google Chrome browser).

We can submit more than one fileContent[] as a form parameter, but only the indexes declared as used in the JSON passed in the "filesData" form parameter will be processed by the API; all others will be ignored. If no exception occurs, we receive a JSON response containing the processed data and HTTP status 201.
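For readers who prefer a scripted test, here is a minimal sketch that builds such a multipart/form-data body using only the Python standard library. The form parameter names filesData and fileContent[] come from the description above; the function name, the per-part content types, and the shape of the filesData JSON are assumptions.

```python
import json
import uuid


def build_multipart(files_data, file_parts, boundary=None):
    """Return (content_type_header, body_bytes) for a multipart/form-data POST."""
    boundary = boundary or uuid.uuid4().hex
    lines = []
    # First part: the JSON that describes which file parts are in use.
    lines.append(f"--{boundary}".encode())
    lines.append(b'Content-Disposition: form-data; name="filesData"')
    lines.append(b"Content-Type: application/json")
    lines.append(b"")
    lines.append(json.dumps(files_data).encode("utf-8"))
    # One part per binary file, all under the same form parameter name.
    for filename, content in file_parts:
        lines.append(f"--{boundary}".encode())
        lines.append(
            f'Content-Disposition: form-data; name="fileContent[]"; '
            f'filename="{filename}"'.encode()
        )
        lines.append(b"Content-Type: application/octet-stream")
        lines.append(b"")
        lines.append(content)
    lines.append(f"--{boundary}--".encode())
    lines.append(b"")
    return f"multipart/form-data; boundary={boundary}", b"\r\n".join(lines)
```

The returned header value and body can be passed to any HTTP client, for instance `urllib.request.Request(url, data=body, headers={"Content-Type": content_type}, method="POST")`, where `url` is the address of the deployed API (port 8081 in this project; the resource path depends on the RAML).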