Should You Use DynamoDB? (Part 1)
Learn in what cases DynamoDB can be used efficiently and what pitfalls to avoid so that you can answer the question, Should I use DynamoDB in my next project?
Selecting the right technology for a new project is always stressful. We need to find something that fits all existing requirements, does not restrict further growth, delivers the necessary performance, does not impose a heavy operational burden, and so on. It’s only natural that selecting a database can be tricky.
In this article, I would like to describe the DynamoDB database created by AWS. My goal is to give you enough knowledge to answer a simple question: “Should I use DynamoDB in my next project?” I will describe the cases in which DynamoDB can be used efficiently and the pitfalls to avoid. I hope this will make your decision easier.
This series will start with a general overview of DynamoDB. Then, I will show how to structure data for DynamoDB and what options you have for working with it. The series will finish with a rundown of some advanced features of DynamoDB.
What Is DynamoDB?
Let’s start with what AWS DynamoDB is. DynamoDB is a NoSQL, key-value/document-oriented database. As a key-value database, it allows you to store an item under an ID and then get that item back. As a document-oriented database, it allows you to store complex nested documents.
DynamoDB is a serverless database, meaning that when you work with it, you do not need to worry about individual machines. In fact, there is no way to even find out how many machines Amazon is using to serve your data. Instead of working with individual servers, you specify how many read and write requests your database should process.
On the one hand, this allows Amazon to provide predictable low latency, which according to many sources is less than 10 ms when a request comes from an EC2 host in the same AWS region. The latency can be even lower (<1 ms) if you enable caching for DynamoDB (more on this in a later section).
On the other hand, specifying the number of requests instead of a number of servers allows you to concentrate on the business value of the database and not on the implementation details. If you have an estimate of how many requests you need to process, all you need to do is to specify a number in AWS console or perform an API request. This can’t be simpler.
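To make "specify a number" more concrete: provisioned throughput is expressed in capacity units. One read capacity unit covers one strongly consistent read per second of an item up to 4 KB (or two eventually consistent reads), and one write capacity unit covers one write per second of an item up to 1 KB. The following is a rough sketch of that arithmetic; the class and method names are made up for illustration and are not part of any AWS SDK.

```java
// Rough capacity estimate for a provisioned DynamoDB table.
// CapacityEstimate and its methods are illustrative names, not an AWS API.
final class CapacityEstimate {
    private static final int READ_UNIT_BYTES = 4 * 1024; // one read unit covers up to 4 KB
    private static final int WRITE_UNIT_BYTES = 1024;    // one write unit covers up to 1 KB

    /** Read capacity units needed for the given request rate and item size. */
    static long readUnits(long readsPerSecond, int itemSizeBytes, boolean stronglyConsistent) {
        long strongUnits = readsPerSecond * ceilDiv(itemSizeBytes, READ_UNIT_BYTES);
        // An eventually consistent read costs half of a strongly consistent one.
        return stronglyConsistent ? strongUnits : ceilDiv(strongUnits, 2);
    }

    /** Write capacity units needed for the given request rate and item size. */
    static long writeUnits(long writesPerSecond, int itemSizeBytes) {
        return writesPerSecond * ceilDiv(itemSizeBytes, WRITE_UNIT_BYTES);
    }

    private static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    public static void main(String[] args) {
        // 100 reads/s of 6 KB items: each read is rounded up to 2 units.
        System.out.println(readUnits(100, 6 * 1024, true));  // 200
        System.out.println(readUnits(100, 6 * 1024, false)); // 100
        // 50 writes/s of 2.5 KB items: each write is rounded up to 3 units.
        System.out.println(writeUnits(50, 2560));            // 150
    }
}
```

Note that item size is rounded up per request, so many small items can be cheaper to read than a few large ones.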
The price that you pay for using DynamoDB is primarily determined by the provisioned capacity of your DynamoDB tables. The higher it is, the higher the monthly bill. DynamoDB also provides a small amount of so-called “burst capacity,” which can be used for a short period of time to read or write more data than your provisioned capacity allows. If you’ve consumed it and still read or write more data, DynamoDB will return a ProvisionedThroughputExceededException, and you need either to retry the operation or to provision more capacity.
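The "retry an operation" option is usually implemented as exponential backoff with jitter. The AWS SDKs already retry throttled requests on their own, so the sketch below is only meant to show the idea. `ThrottledException` is a hypothetical stand-in for the SDK's `ProvisionedThroughputExceededException`, and `BackoffRetry` is an illustrative helper, not a library class.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

// Hypothetical stand-in for the SDK's ProvisionedThroughputExceededException.
class ThrottledException extends RuntimeException {}

final class BackoffRetry {
    /**
     * Runs the operation and retries it when it is throttled, waiting a random
     * time between 0 and base * 2^(attempt - 1) milliseconds before each retry.
     */
    static <T> T call(Supplier<T> operation, int maxAttempts, long baseDelayMillis) {
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.get();
            } catch (ThrottledException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up: time to provision more capacity
                }
                long cap = baseDelayMillis << (attempt - 1); // base, 2*base, 4*base, ...
                try {
                    Thread.sleep(ThreadLocalRandom.current().nextLong(cap + 1));
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    throw e;
                }
            }
        }
    }
}
```

A call such as `BackoffRetry.call(() -> readItem(), 5, 50)` (where `readItem` is whatever request you are making) would try up to five times before surfacing the exception.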
In addition to these major features, there are a few other reasons why you might consider DynamoDB:
- Massive scale: Just like other AWS services, DynamoDB can work on a massive scale. Companies such as Airbnb, Lyft, and Duolingo are using DynamoDB in production.
- Low operational overhead: You still need to do some operational tasks, such as ensuring that you have enough provisioned capacity, but most of the operational load is taken by the AWS team.
- Reliable: Despite a few outages, DynamoDB has a proven track record as a rock-solid database solution. Also, all data that is written to DynamoDB is replicated to three different locations.
- Schemaless: Like many other NoSQL databases, DynamoDB does not impose a strict schema, which allows more flexibility.
- Simple API: The DynamoDB API is very straightforward. Overall, it has fewer than twenty methods, and only a handful of them are related to writing and reading data.
- Autoscaling: It is pretty straightforward to scale a DynamoDB table up or down. All you need to do is to enable autoscaling on a particular table, and AWS will automatically increase or decrease provisioned capacity depending on the current load. Alternatively, you can perform the UpdateTable API call and change the provisioned capacity.
- Integration with other AWS services: DynamoDB is one of the core AWS services and has good integration with other services. You can use it together with CloudSearch to enable full-text search, perform data analytics with AWS EMR, back up data with AWS Data Pipeline, etc.
Data Model in DynamoDB
Now, let’s take a look at how to store data in DynamoDB. Data in DynamoDB is separated into tables. When you create a table, you need to decide on the key type that your table will have. DynamoDB has two types of keys, and once you select a key type for a table, you can’t change it:
- Simple key: In this case, you need to identify which attribute in the table contains the key. This key is called a partition key. With this key type, DynamoDB does not give you a lot of flexibility: the only operations that you can do efficiently are storing an item under a key and getting an item back by its key.
- Composite key: In this case, you need to specify two key attributes, called a partition key and a sort key. As in the previous case, you can get an item by key, but you can also query this data in a more elaborate way. For example, you can get all items with the same partition key, sort the results by the value of the sort key, filter items using the value of the sort key, etc. The partition key/sort key pair must be unique for each item.
Let’s take a look at some examples of using simple and composite keys. For example, if we want to store a table with users in our database we could use a table with a simple key and store it like this:
With this table, the only operation that we can perform efficiently is to get a user by ID.
Composite keys allow more flexibility. For example, we could define a table with a composite key that stores forum messages and select the user ID as the partition key and the timestamp as the sort key:
This structure would allow performing more complex queries like:
- Get all forum posts written by a user with a specified ID.
- Get all forum posts written by a specified user sorted by time (we can do this because we have the sort key).
- Get all forum posts written by a specified user in a specified time interval (we can do this because we can put a condition on the sort key).
Notice that in this case, you can only query data for a specified partition key. If you want to search for items across partition keys, you need to use the scan operation. It finds all items in a table that match a specified filter expression. This operation is less restrictive than getting an item by key or querying a single partition, but it does not exploit any knowledge about where data is stored in DynamoDB and ends up scanning the whole table.
You should try to use the scan operation as little as possible. While it allows much more flexibility, it is significantly slower and consumes more provisioned capacity. If you are extensively relying on scanning big tables, you won’t achieve high performance and will have to provision more capacity and hence pay more money.
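The difference between a query and a scan can be illustrated with a small in-memory model of the forum messages table. This is only a sketch of the access pattern, not how DynamoDB is implemented; `ForumMessagesModel` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory model of a table with a composite key (illustrative only).
final class ForumMessagesModel {
    // partition key (user ID) -> items ordered by sort key (timestamp)
    private final Map<Integer, TreeMap<Long, String>> partitions = new HashMap<>();

    void put(int userId, long timestamp, String message) {
        partitions.computeIfAbsent(userId, id -> new TreeMap<>()).put(timestamp, message);
    }

    /** Query: touches a single partition and returns its items sorted by timestamp. */
    List<String> query(int userId, long fromInclusive, long toInclusive) {
        TreeMap<Long, String> partition = partitions.getOrDefault(userId, new TreeMap<>());
        return new ArrayList<>(partition.subMap(fromInclusive, true, toInclusive, true).values());
    }

    /** Scan: has to visit every item in every partition and filter afterwards. */
    List<String> scan(long fromInclusive, long toInclusive) {
        List<String> matches = new ArrayList<>();
        for (TreeMap<Long, String> partition : partitions.values()) {
            for (Map.Entry<Long, String> item : partition.entrySet()) {
                if (item.getKey() >= fromInclusive && item.getKey() <= toInclusive) {
                    matches.add(item.getValue());
                }
            }
        }
        return matches;
    }
}
```

In this model, `query` only does work proportional to the items it returns from one partition, while `scan` always walks the entire data set regardless of how few items match.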
Consistency in DynamoDB
As with many other NoSQL databases, you can select a consistency level when you perform operations with DynamoDB. DynamoDB stores three copies of each item, and when you write data to DynamoDB, it only acknowledges the write after two of the three copies have been updated. The third copy is updated later.
When you read data from DynamoDB, you have two options. You can either use strong consistency (in this case, DynamoDB will read data from two copies and return the latest data) or you can select eventual consistency (and in this case, DynamoDB will only read data from one copy at random and may return stale data).
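The toy model below follows the description above: three copies, writes acknowledged after two, and reads that consult either one copy or two. It mirrors the article's explanation rather than DynamoDB's actual internals, and `ReplicatedItem` is a hypothetical class. It shows why reading two copies is enough: any two of the three copies must include at least one that took part in the last acknowledged write.

```java
// Toy model of one item replicated three times (illustrative only).
final class ReplicatedItem {
    private final long[] versions = new long[3];
    private final String[] values = new String[3];
    private long nextVersion = 1;

    /** Writes to two replicas and returns; the third stays stale until sync(). */
    void write(String value, int firstReplica, int secondReplica) {
        long version = nextVersion++;
        store(firstReplica, version, value);
        store(secondReplica, version, value);
    }

    /** Stands in for background replication: copies the newest version everywhere. */
    void sync() {
        int newest = 0;
        for (int i = 1; i < 3; i++) {
            if (versions[i] > versions[newest]) newest = i;
        }
        for (int i = 0; i < 3; i++) {
            store(i, versions[newest], values[newest]);
        }
    }

    /** Eventually consistent read: asks a single replica, which may be stale. */
    String readEventual(int replica) {
        return values[replica];
    }

    /** Strongly consistent read: asks two replicas and keeps the newer answer. */
    String readStrong(int replicaA, int replicaB) {
        return versions[replicaA] >= versions[replicaB] ? values[replicaA] : values[replicaB];
    }

    private void store(int replica, long version, String value) {
        versions[replica] = version;
        values[replica] = value;
    }
}
```

If a write lands on replicas 1 and 2, an eventual read from replica 0 can still return the previous value, while a strong read over any pair of replicas returns the new one.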
Indexes
The simple key and composite key models are quite restrictive and are not enough to support complex use cases. To help with that, DynamoDB supports two index types:
- Local secondary index: This is very similar to a composite key and is used to define an additional sort order or to filter items by different criteria. The main difference from a composite key is that a partition key/sort key pair must be unique, whereas a partition key/secondary index key pair does not have to be.
- Global secondary index: This allows using a different partition key for your data. You can use a global secondary index if you want to get an item from a table by one of two IDs. For example, a book in an online shop can have two IDs: ISBN-10 and ISBN-13. As with regular tables, global secondary indexes can have simple and composite keys.
Internally, a global secondary index is simply a copy of your original data in a separate DynamoDB table with a different key. When an item is written into a table with a global secondary index, DynamoDB copies the data to the index in the background. Because of this, a global secondary index is always eventually consistent with its table: a read from the index may briefly return stale data.
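A minimal sketch of this behavior, using the book example: the base table is keyed by ISBN-13, and a second map keyed by ISBN-10 plays the role of the global secondary index. `BookTableModel` and `propagate()` are hypothetical names used only for this illustration.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Toy model of a table with a global secondary index (illustrative only).
final class BookTableModel {
    private final Map<String, String> titleByIsbn13 = new HashMap<>(); // base table
    private final Map<String, String> titleByIsbn10 = new HashMap<>(); // index copy
    private final Queue<String[]> pendingIndexUpdates = new ArrayDeque<>();

    /** The base table is updated immediately; the index update is only queued. */
    void put(String isbn13, String isbn10, String title) {
        titleByIsbn13.put(isbn13, title);
        pendingIndexUpdates.add(new String[] {isbn10, title});
    }

    /** Stands in for the background process that copies writes into the index. */
    void propagate() {
        String[] update;
        while ((update = pendingIndexUpdates.poll()) != null) {
            titleByIsbn10.put(update[0], update[1]);
        }
    }

    String getByIsbn13(String isbn13) {
        return titleByIsbn13.get(isbn13);
    }

    /** May return null or an old title until propagate() has run. */
    String getByIsbn10(String isbn10) {
        return titleByIsbn10.get(isbn10);
    }
}
```

Right after `put`, a lookup by ISBN-13 sees the new item while a lookup by ISBN-10 may not, which is the window of staleness the paragraph above describes.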
With DynamoDB, you can create up to five local secondary indexes per table. The limit for global secondary indexes was also five at the time of writing; the default quota has since been raised to 20 per table.
Complex Data
All examples so far presented data in table format, but DynamoDB also supports complex data types. In addition to simple data types like numbers and strings, DynamoDB supports these types:
- Nested object: A value of an attribute in DynamoDB can be a complex nested object
- Set: A set of numbers, strings, or binary values
- List: An untyped list that can contain any values
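In the low-level API, every attribute value travels with a type tag: `S` for strings, `N` for numbers (sent as strings), `M` for nested objects, `L` for lists, and `SS`/`NS` for string and number sets, among others. The sketch below converts plain Java values into that tagged shape to show how nested data is represented. `TypedAttributes` is a hypothetical helper, it skips binary, boolean, and null values, and it assumes sets are non-empty and homogeneous.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Converts plain Java values into DynamoDB-style type-tagged maps (illustrative only).
final class TypedAttributes {
    static Map<String, Object> encode(Object value) {
        if (value instanceof String) return tag("S", value);
        if (value instanceof Number) return tag("N", value.toString()); // numbers travel as strings
        if (value instanceof Map) {
            Map<String, Object> nested = new LinkedHashMap<>();
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                nested.put(entry.getKey().toString(), encode(entry.getValue()));
            }
            return tag("M", nested);
        }
        if (value instanceof List) {
            List<Object> elements = new ArrayList<>();
            for (Object element : (List<?>) value) elements.add(encode(element)); // lists are untyped
            return tag("L", elements);
        }
        if (value instanceof Set) {
            Set<?> set = (Set<?>) value;
            boolean numeric = !set.isEmpty() && set.iterator().next() instanceof Number;
            Set<String> members = new TreeSet<>(); // sorted as strings only for stable output
            for (Object member : set) members.add(member.toString());
            return tag(numeric ? "NS" : "SS", members);
        }
        throw new IllegalArgumentException("Unsupported value: " + value);
    }

    private static Map<String, Object> tag(String type, Object payload) {
        Map<String, Object> tagged = new LinkedHashMap<>();
        tagged.put(type, payload);
        return tagged;
    }
}
```

List elements keep their own tags because a list can mix types, whereas set members are bare values because the whole set shares one type.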
Programmatic Access
If you want to access DynamoDB, you have two main options: low-level API and DynamoDB mapper. All communication with DynamoDB is performed via HTTP. To read data, there are just four methods:
- GetItem: Get a single item by id from a database
- BatchGetItem: Get several items by id in one call
- Query: Query a composite key or an index
- Scan: Scan through a table
And there are just four methods to change data in DynamoDB:
- PutItem: Write a new item to a table
- BatchWriteItem: Write multiple items to a table
- UpdateItem: Update some fields in a specified item
- DeleteItem: Remove an item by ID
All methods work on a table level. There are no methods that work across different tables.
The low-level API is a thin wrapper over these HTTP methods. It is verbose and cumbersome to use. For example, to get a single item from a DynamoDB table, you need to write this much code:
import java.util.HashMap;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.GetItemRequest;
import com.amazonaws.services.dynamodbv2.model.GetItemResult;
// Create DynamoDB client
AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard().build();
// Create a composite key
HashMap<String, AttributeValue> key = new HashMap<>();
key.put("UserId", new AttributeValue().withN("1"));
key.put("Timestamp", new AttributeValue().withN("1498928631"));
// Create a request object
GetItemRequest request = new GetItemRequest()
    .withTableName("ForumMessages")
    .withKey(key);
// Perform API request
GetItemResult result = client.getItem(request);
// Get attribute from the result item (getItem() returns null if no item matches the key)
AttributeValue messageAttribute = result.getItem().get("Message");
// Get string value
String message = messageAttribute.getS();
The code is pretty straightforward. First, we create a key to get an item by a key, then as with other AWS API methods, we create a request object and perform a request. The example is in Java, but API clients for AWS exist for other languages such as Python, .NET platform, and JavaScript.
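Now, you may be wondering if there is a library that will help to avoid this massive amount of boilerplate code. In fact, AWS implemented a high-level library for this called DynamoDB mapper. To use it, you first need to define the structure of your data, similarly to how you define it with ORM frameworks:
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBAttribute;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBHashKey;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBTable;
// Specify table name
@DynamoDBTable(tableName = "ForumUsers")
public class User {
    private int userId;
    private String name;
    // "UserId" attribute is a key
    @DynamoDBHashKey(attributeName = "UserId")
    public int getUserId() {
        return userId;
    }
    public void setUserId(int userId) { this.userId = userId; }
    // Map this value to the "Name" attribute
    @DynamoDBAttribute(attributeName = "Name")
    public String getName() {
        return name;
    }
    public void setName(String name) { this.name = name; }
}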
Now accessing data in DynamoDB is much simpler. All we need to do is to create an instance of the DynamoDBMapper and call the load method:
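import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapper;
// Create DynamoDB client (just as in the previous example)
AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard().build();
// Create DynamoDB mapper
DynamoDBMapper mapper = new DynamoDBMapper(client);
// Create a key instance
User key = new User();
key.setUserId(1);
// Get item by key
User result = mapper.load(key);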
Notice that now, to specify a key of an item, we can use a Java POJO class and not a HashMap instance, as in the previous case.
Unless you need to access some nitty-gritty details of DynamoDB or need to implement a custom way of doing things, I recommend using DynamoDB mapper and only falling back to the low-level API when necessary. The examples here were provided in Java, but similar higher-level libraries exist for other platforms such as .NET and Python.
You can also use unofficial libraries to access DynamoDB. For example, if you are using Spring, you can consider using the Amazon DynamoDB module for Spring Data.
Stay tuned for next time, when we'll talk about advanced features of DynamoDB!
Published at DZone with permission of Ivan Mushketyk, DZone MVB. See the original article here.