The Seven Most Popular APIs in Big Data – Part 1

Part One: An Overview of the Current Situation

Only a few years ago, nearly everyone relied exclusively on SQL to tackle Big Data needs, but as the demand for speed and space has increased, so have our options. There are now a number of new data systems, mostly built around NoSQL, each developed to best serve a specific area.
In this post, we'll look at seven APIs in particular and explore how these systems can be optimized for speed and memory capacity.

The Seven Most Popular APIs in Big Data

1. SQL: The name may suggest that this API is no longer relevant in the new data world, but most NoSQL implementations support a major subset of SQL. It provides a rich set of query and data management operations and is often the least common denominator of many data management systems.

SQL Code Example
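
As a minimal sketch of the query and data management operations described above, the following uses Python's built-in `sqlite3` module with an in-memory database. The `trades` table, its columns, and the rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database; the table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (symbol TEXT, price REAL, qty INTEGER)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [("ACME", 10.0, 5), ("ACME", 12.0, 3), ("INIT", 7.5, 10)],
)

# One declarative statement groups, aggregates and orders the data.
rows = conn.execute(
    "SELECT symbol, SUM(qty) AS total_qty, MAX(price) AS max_price "
    "FROM trades GROUP BY symbol ORDER BY symbol"
).fetchall()
conn.close()

for symbol, total_qty, max_price in rows:
    print(symbol, total_qty, max_price)
```

The same `SELECT ... GROUP BY` statement would be accepted, with minor dialect differences, by many stores that expose a SQL subset.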

2. Document: The document API allows users to write records with different field structures to the same logical table without any need for schema evolution (which is why the document API is often described as “schema-less”). It's one of the most popular APIs for web applications that use the JSON data model.

Document API-MongoDB
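
The caption above refers to MongoDB; the sketch below does not use the MongoDB driver. It only illustrates the schema-less idea in plain Python, assuming a collection is a list of JSON-like dictionaries. The `find` helper and the sample documents are hypothetical.

```python
import json

# A "collection" is a plain list of JSON-like documents; no schema is declared.
users = []
users.append({"name": "Ann", "email": "ann@example.com"})
# A second document with a different set of fields goes into the same collection.
users.append({"name": "Bob", "tags": ["admin"], "address": {"city": "Oslo"}})


def find(collection, **criteria):
    """Return documents whose top-level fields equal all given criteria."""
    return [
        doc
        for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


matches = find(users, name="Bob")
print(json.dumps(matches, sort_keys=True))
```

Neither document had to declare its fields up front, which is the property the term “schema-less” refers to.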

3. Object Graph: This navigation API is most suitable for hierarchical data structures (e.g., social graphs).

Neo4J API Example
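
The caption above refers to Neo4j; the sketch below does not use Neo4j or its query language. It only illustrates graph navigation in plain Python, assuming the graph is stored as adjacency lists. The `follows` data and the `reachable_within` helper are hypothetical.

```python
from collections import deque

# Adjacency lists: each person maps to the people they follow (made-up data).
follows = {
    "ann": ["bob", "cat"],
    "bob": ["dan"],
    "cat": ["dan", "eve"],
    "dan": [],
    "eve": ["ann"],
}


def reachable_within(graph, start, max_hops):
    """Breadth-first navigation: map each node reachable from `start`
    in at most `max_hops` steps to its hop distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    del seen[start]  # the start node itself is not a result
    return seen


print(reachable_within(follows, "ann", 2))
```

A query such as "friends of friends" becomes a bounded traversal from one node rather than a series of joins.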

4. Tuple API: The tuple API is one of the most common APIs for messaging and stream processing use cases. It represents a simple data structure that maps to a flat data object. Queries often reuse the same tuple structure that was used to write the data instance, meaning that the tuple acts as a “mask” that indicates which instance type and matching fields are to be selected.

JavaSpace code example
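
The caption above refers to JavaSpaces; the sketch below is not the JavaSpaces API. It is a simplified Python illustration of the “mask” idea, assuming that a `None` field in a template acts as a wildcard and that only entries of exactly the template's type match. `TupleSpace`, `Order`, and `Quote` are hypothetical names.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Order:
    symbol: Optional[str] = None
    side: Optional[str] = None
    qty: Optional[int] = None


@dataclass
class Quote:
    symbol: Optional[str] = None
    price: Optional[float] = None


class TupleSpace:
    """Tiny in-memory space: entries are written as-is and read by template."""

    def __init__(self):
        self._entries = []

    def write(self, entry):
        self._entries.append(entry)

    def read_all(self, template):
        """Return entries of the template's type whose fields equal
        every non-None field of the template."""
        return [e for e in self._entries if self._matches(e, template)]

    @staticmethod
    def _matches(entry, template):
        if type(entry) is not type(template):
            return False
        return all(
            getattr(template, f.name) is None
            or getattr(template, f.name) == getattr(entry, f.name)
            for f in fields(template)
        )


space = TupleSpace()
space.write(Order("ACME", "buy", 5))
space.write(Order("INIT", "sell", 2))
space.write(Quote("ACME", 10.0))

# The template doubles as the query: type Order, symbol "ACME", anything else.
print(space.read_all(Order(symbol="ACME")))
```

The same class is used to write and to query, so no separate query language is involved.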

5. Key/Value: Key/Value is the simplest form of data structure. As the name suggests, it consists of a single index per data object. This API is the most popular for caching and is often used as the underlying data structure of more advanced data management solutions.

Key Value Example
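
As a minimal sketch of the caching use case, the following keeps objects under a single key and evicts the least recently used entry once a fixed capacity is exceeded. The `LruCache` class, its eviction policy, and the sample keys are assumptions made for illustration, not part of any particular product.

```python
from collections import OrderedDict


class LruCache:
    """Key/value cache with a fixed capacity and least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)  # mark as most recently used
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # a read also counts as a use
        return self._items[key]


cache = LruCache(capacity=2)
cache.put("user:1", {"name": "Ann"})
cache.put("user:2", {"name": "Bob"})
cache.get("user:1")                   # touch user:1 so user:2 is now oldest
cache.put("user:3", {"name": "Cat"})  # exceeds capacity, evicts user:2
print(cache.get("user:2"))
```

Every operation goes through the key alone; there is no way to query by the fields of the stored value.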

6. Stream Based: This event processing model is best suited to scenarios that require continuous updates. It's a popular API for real-time analytics, which explains why it's increasingly used in Big Data systems that rely heavily on incremental updates but don't require locking a large set of data.

Stream Based Example
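
As a minimal sketch of incremental updates, the following generator consumes events one at a time and emits an updated per-key average after each one. The event shape `(key, value)`, the `running_average` function, and the sample data are assumptions for illustration.

```python
from collections import defaultdict


def running_average(events):
    """Consume (key, value) events one by one and yield the updated average
    for that key after each event. Only a count and a sum are kept per key,
    so no large data set is stored or locked."""
    count = defaultdict(int)
    total = defaultdict(float)
    for key, value in events:
        count[key] += 1
        total[key] += value
        yield key, total[key] / count[key]


events = [("ACME", 10.0), ("INIT", 4.0), ("ACME", 14.0), ("INIT", 8.0)]
updates = list(running_average(events))
for key, average in updates:
    print(key, average)
```

Because `running_average` is a generator, it would work the same way on an unbounded source, such as a socket or a message queue iterator, as on this finite list.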

7. Map/Reduce: This API is used to perform aggregation on distributed data. The Map/Reduce model breaks an aggregation operation into two or more phases. Map executes the aggregation on each data node, and Reduce takes the sub-aggregations from all nodes and reduces them into one consolidated result. Operations such as calculating a maximum or an average are examples of the Map/Reduce model.

Map Reduce Example
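
As a minimal sketch of the two phases, the following runs in a single process and uses inner lists to stand in for the data held by separate nodes. The `nodes` data and the `map_phase` and `reduce_phase` names are hypothetical, and no particular Map/Reduce framework is assumed.

```python
from functools import reduce

# Each inner list stands in for the values held by one data node (made-up data).
nodes = [[3.0, 9.0, 6.0], [12.0, 2.0], [7.0]]


def map_phase(values):
    """Runs on one node: produce a partial aggregate (sum, count, max)."""
    return sum(values), len(values), max(values)


def reduce_phase(a, b):
    """Merge two partial aggregates into one."""
    return a[0] + b[0], a[1] + b[1], max(a[2], b[2])


partials = [map_phase(values) for values in nodes]
total, count, maximum = reduce(reduce_phase, partials)
average = total / count

print(maximum, average)
```

The partial result carries a sum and a count rather than a per-node average, because averaging the node averages would give the wrong answer when nodes hold different numbers of values.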

Which System Is Best for Me?  

There is no “one-size-fits-all” approach when it comes to data management systems. Because most of today's data management systems expose an API that is tied to the data model in which the data is stored, we can't write data with one API and read it with another.
This means that if you want to use the same data for a different purpose, you need to maintain a copy of that data for each use case, matching its API and data store. As such, a typical application ends up combining several data management solutions with complex data flows between them, as illustrated below:

More Realistic Scenario

But does it really have to be that complex?

Stay tuned for “The Seven Most Popular APIs in Big Data—Part Two” to find out!


Published at DZone with permission of

Opinions expressed by DZone contributors are their own.
