Over a million developers have joined DZone.

MapDB: The Agile Java Data Engine

· Big Data Zone

Hortonworks DataFlow is an integrated platform that makes data ingestion fast, easy, and secure. Download the white paper now.  Brought to you in partnership with Hortonworks

MapDB is a pure Java database, specifically designed for the Java developer. The fundamental concept for MapDB is very clever yet natural to use: provide a reliable, full-featured and “tune-able” database engine using the Java Collections API.

MapDB 1.0 has just been released, this is the culmination of years of research and development to get the project to this point. Jan Kotek, the primary developer for MapDB, also worked on predecessor projects (JDBM), starting MapDB as an entire from-scratch rewrite. Jan’s expertise and dedication to low-level debugging has yielded excellent results, producting an easy-to-use database for Java with comparable performance to many C-based engines.

What sets MapDB apart is the “map” concept. The idea is to leverage the totally natural Java Collections API – so familiar to Java developers that most of them literally use it daily in their work. For most database interactions with a Java application, some sort of translator is required. There are many Object-Relational Mapping (ORM) tools to name just one category of such components. The goal has always been in the direction of making it natural to code in objects in the Java language, and translate them to a specific database syntax (such as SQL). However, such efforts have always come up short, adding complexity for both the application developer and the data architect.

When using MapDB there is no object “translation layer” – developers just access data in familiar structures like Maps, Sets, Queues, etc.  There is no change in syntax from typical Java coding, other than a brief initialization syntax and transaction management. A developer can literally transform memory-limited maps into a high-speed persistent store in seconds (typically changing just one line of code).

A MapDB Example

Here is a simple MapDB example, showing how easy and intuitive it is to use in a Java application:

// Initialize a MapDB database
DB db = DBMaker.newFileDB(new File("testdb"))
// Create a Map:
Map<String,String> myMap = db.getTreeMap(“testmap”);

// Work with the Map using the normal Map API.
myMap.put(“key1”, “value1”);
myMap.put(“key2”, “value2”);

String value = myMap.get(“key1”);

That’s all you need to do, now you have a file-backed Map of virtually any size.

Note the “builder-style” initialization syntax, enabling MapDB as the agile database choice for Java. There are many builder options that let you tune your database for the specific requirements at hand. Just a small subset of options include:

  • In-memory implementation
  • Enable transactions
  • Configurable caching

This means that you can configure your database just for what you need, effectively making MapDB serve the job of many other databases. MapDB comes with a set of powerful configuration options, and you can even extend the product to make your own data implementations if necessary.

Another very powerful feature is that MapDB utilizes some of the advanced Java Collections variants, such as ConcurrentNavigableMap. With this type of Map you can go beyond simple key-value semantics, as it is also a sorted Map allowing you to access data in order, and find values near a key. Not many people are aware of this extension to the Collections API, but it is extremely powerful and allows you to do a lot with your MapDB database (I will cover more of these capabilities in a future article).

The Agile Aspect of MapDB

When I first met Jan and started talking with him about MapDB he said something that made a very important impression: If you know what data structure you want, MapDB allows you to tailor the structure and database characteristics to your exact application needs. In other words, the schema and ways you can structure your data is very flexible. The configuration of the physical data store is just as flexible, making a perfect combination for meeting almost any database need.

They key to this capability is inherent in MapDB’s architecture, and how it translates to the MapDB API itself. Here is a simple diagram of the MapDB architecture:

MapDB Architecture

As you can see from the diagram, there are 3 tiers in MapDB:

  • Collections API: This is the familiar Java Collections API that every Java developer uses for maintaining application state. It has a simple builder-style extension to allow you to control the exact characteristics of a given database (including its internal format or record structure).
  • Engine: The Engine is the real key to MapDB, this is where the records for a database – including their internal structure, concurrency control, transactional semantics – are controlled. MapDB ships with several engines already, and it is straightforward to add your own Engine if needed for specialized data handling.
  • Volume: This is the physical storage layer (e.g., on-disk or in-memory). MapDB has a few standard Volume implementations, and they should suffice for most projects.

The main point is that the development API is completely distinct from the Engine implementation (the heart of MapDB), and both are separate from the actual physical storage layer. This offers a very agile approach, allowing developers to exactly control what type of internal structure is needed for a given database, and what the actual data structure looks like from the top-level Collections API.

To make things even more extensible and agile, MapDB uses a concept of Engine Wrappers. An Engine Wrapper allows adding additional features and options on top of a specific engine layer. For example, if the standard Map engine is utilized for creating a B-Tree backed Map, it is feasible to enable (or disable) caching support. This caching feature is done through an Engine Wrapper, and that is what shows up in the builder-style API used to configure a given database. While a whole article could be written just about this, the point here is that this adds to MapDB’s inherent agile nature.

By way of example, here is how you configure a pure in-memory database, without transactional capabilities:

// Initialize an in-memory MapDB database 
// without transactions
DB db = DBMaker.newMemoryDB()

// Create a Map:
Map<String,String> myMap = db.getTreeMap(“testmap”);

// Work with the Map using the normal Map API.
myMap.put(“key1”, “value1”);
myMap.put(“key2”, “value2”);

String value = myMap.get(“key1”);

That’s it! All that was needed was to change the DBMaker call to add the new options, everything else works exactly the same as in the example shown earlier.

Agile Data Model

In addition to customizing the features and performance characteristics of a given database instance, MapDB allows you to create an agile data model, with a schema exactly matching your application requirements.

This is probably similar to how you write your code when creating standard Java in-memory structures.  For example, let’s say you need to lookup a Person object by username, or by personID. Simply create a Person object and two Maps to meet your needs:

public class Person {

private Integer personID;
private String username;

// Setters and getters go here


// Create a Map of Person by username.
Map<String,Person> personByUsernameMap = ...

// Create a Map of Person by personID.
Map<Integer,Person> personByPersonIDMap = ...

This is a very trivial example, but now you can easily write to both maps for each new Person instance, and subsequently retrieve a Person by either key.

Another interesting concept with MapDB data structures are some key extensions to the normal Java Collections API. A common requirement in applications is to have a Map with a key/value, and in addition to finding the value for a key to be able to perform the inverse: lookup the key for a given value. We can easily do this using the MapDB extension for bi-directional maps:

// Create a primary map
HTreeMap<Long,String> map = DBMaker.newTempHashMap();

// Create the inverse mapping for primary map
NavigableSet<Fun.Tuple2<String, Long>> inverseMapping = 
  new TreeSet<Fun.Tuple2<String, Long>>();

// Bind the inverse mapping to primary map, so it is auto-updated each time the primary map gets a new key/value
Bind.mapInverse(map, inverseMapping);


// Now find a key by a given value.
Long keyValue = Fun.filter(inverseMapping.get(“value2”);

MapDB supports many constructs for the interaction of Maps or other collections, allowing you to create a schema of related structures that can automatically be kept in sync. This avoids a lot of scanning of structures, makes coding fast and convenient, and can keep things very fast.

Wrapping it up

I have shown a very brief introduction on MapDB and how the product works. As you can see its strengths are its use of the natural Java Collections API, the agile nature of the engine itself, and the support for virtually any type of data model or schema that your application needs.  MapDB is freely available for any use under the Apache 2.0 license.

To learn more, check out: www.mapdb.org.

Learn how you can modernize your data warehouse with Apache Hadoop. View an on-demand webinar now. Brought to you in partnership with Hortonworks.


The best of DZone straight to your inbox.

Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}