API consumption is what makes API traffic data important. Without insight into how your APIs are being consumed, you cannot provide analytics on your customers or on API usage.
API traffic data has a few characteristics: high frequency, payload sizes, data structure, tables, volume, and objects. Persisting API traffic is important because most services have some sort of rate limiting and different billing tiers for customers based on their usage. There is usually a threshold for alerting and scaling, so if a customer is making more API calls than they should, you can partition them. You also want to be able to provide analytics to your internal organization for the different APIs and resources you are exposing. If your customers are exposing an API, it is a good bet that they are integrating with other services as well, which lets them slice and dice their own data together with the data they are consuming from your product.
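As a rough illustration of why persisted traffic matters for tiers and alerting, the sketch below counts calls per customer and compares each count with a tier limit. The `Tier` class, the 80% alert ratio, and the `customer_id` field are assumptions made for this example, not part of any particular product.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """Hypothetical billing tier; the numbers are illustrative only."""
    name: str
    monthly_limit: int         # API calls included in the tier
    alert_ratio: float = 0.8   # assumed: alert at 80% of the limit


def usage_status(calls: int, tier: Tier) -> str:
    """Classify a customer's call count against their tier."""
    if calls > tier.monthly_limit:
        return "over_limit"
    if calls >= tier.monthly_limit * tier.alert_ratio:
        return "alert"
    return "ok"


def summarize(events, tiers):
    """Count persisted traffic events per customer and classify each one.

    `events` is an iterable of dicts with a "customer_id" key;
    `tiers` maps customer_id to that customer's Tier.
    """
    counts = Counter(event["customer_id"] for event in events)
    return {cid: usage_status(n, tiers[cid]) for cid, n in counts.items()}
```

In practice the counting would more likely happen in the datastore itself; the point is only that none of this is possible unless the traffic has been persisted.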
Typical API request payloads differ from response payloads, and that difference can change what your persistence strategy needs to be. With request payloads, you usually have moderate-to-large payload sizes. About 80% of API requests are GET requests, and POST requests can also be large. The response payloads are also typically large. The data structures are nondeterministic, meaning that when you run analytics on the data, the table can be large and the shape of the responses you are slicing and dicing can vary.
Once you have established that persisting the data is the best option, you will most likely look to a SQL database, since you are probably already using one. The question is whether a NoSQL database should be considered. SQL is more than 40 years old and is the primary interface for RDBMS; with both commercial and open-source implementations, it is a strong candidate.
Non-relational storage, now known as NoSQL, has existed since the 1960s and was the primary storage mechanism before SQL gained popularity. It has gained more traction in the last 10 years.
In the comparison of the two technologies below, the main factors in the decision-making process are data growth, online versus archived data, search filter flexibility, search performance, and clustering and sharding. When deciding which is best for you, consider your current inbound API traffic, your data-retention policy, your estimated data growth, and whether your customers need heavy slicing and dicing.
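A back-of-the-envelope calculation can make these factors concrete. The sketch below estimates how much data stays online for a given retention window and how daily traffic grows under a compounding monthly rate. The function names and the example numbers are assumptions chosen for illustration; substitute your own measurements.

```python
def projected_daily_requests(current_per_day: float,
                             monthly_growth: float,
                             months: int) -> float:
    """Project daily request volume assuming compound monthly growth.

    monthly_growth is a fraction, e.g. 0.10 for 10% per month.
    """
    return current_per_day * (1 + monthly_growth) ** months


def retained_storage_gb(requests_per_day: float,
                        avg_record_bytes: float,
                        retention_days: int) -> float:
    """Estimate the online data size (in GB, 10^9 bytes) once the
    retention window is full, ignoring indexes and replication."""
    return requests_per_day * avg_record_bytes * retention_days / 1e9


if __name__ == "__main__":
    # Assumed inputs: 1M requests/day, 2 KB stored per request, 30-day retention.
    today = retained_storage_gb(1_000_000, 2_000, 30)
    in_a_year = retained_storage_gb(
        projected_daily_requests(1_000_000, 0.10, 12), 2_000, 30
    )
    print(f"online data today: {today:.0f} GB")
    print(f"online data in 12 months at 10% monthly growth: {in_a_year:.0f} GB")
```

If the projected figure stays comfortably within what a single database server can hold, that points toward the "manageable data sizes" case; if not, horizontal scaling becomes a more pressing concern.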
|SQL||NoSQL|
|Relational model with data organized in a tabular structure.||Different models: document, graph, key-value.|
|Pre-defined schema definition.||Dynamic schema definition.|
|Typically vertically scalable (higher-cost VMs).||Horizontally scalable (lower-cost VMs).|
|Powerful and standardized query interface.||Query interface varies by provider.|
|Most implementations are ACID-compliant.||Follows the CAP theorem (Consistency, Availability, Partition tolerance).|
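The schema row of this comparison is the one that most affects variable API payloads, so a small sketch may help. It uses Python's built-in `sqlite3` module for the pre-defined schema side and a plain list of dictionaries as a stand-in for a document store. The `api_calls` table and its columns are hypothetical, and a real NoSQL product would have its own client API.

```python
import sqlite3

# Pre-defined schema: the columns are fixed when the table is created.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE api_calls ("
    "customer_id TEXT NOT NULL, method TEXT NOT NULL, "
    "path TEXT NOT NULL, status INTEGER NOT NULL)"
)
conn.execute(
    "INSERT INTO api_calls VALUES (?, ?, ?, ?)",
    ("acme", "GET", "/v1/orders", 200),
)


def insert_with_extra_field() -> bool:
    """Try to store a field the schema does not know about."""
    try:
        conn.execute(
            "INSERT INTO api_calls "
            "(customer_id, method, path, status, response_body) "
            "VALUES (?, ?, ?, ?, ?)",
            ("acme", "GET", "/v1/orders", 200, '{"items": []}'),
        )
        return True
    except sqlite3.OperationalError:
        # Rejected: the table has no response_body column. Storing it
        # would first require a schema change (ALTER TABLE).
        return False


# Dynamic schema (stand-in for a document store): each record can
# carry whatever structure the request or response happened to have.
documents = [
    {"customer_id": "acme", "method": "GET", "path": "/v1/orders",
     "status": 200, "response": {"items": [], "next_page": None}},
    {"customer_id": "acme", "method": "POST", "path": "/v1/orders",
     "status": 201, "request": {"sku": "A-1", "qty": 2}},
]


def find_by_status(docs, status):
    """Filter documents on a field they all share."""
    return [d for d in docs if d.get("status") == status]
```

The trade-off runs both ways: the fixed schema rejects unexpected fields but guarantees every row has the same shape for querying, while the document list accepts anything and leaves it to each query to cope with missing fields.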
You can use a SQL datastore when you have:
- Manageable data sizes.
- Short time-based or small size-based retention policies.
- Low usage frequency.
- Lightweight analytics.
- A need for a standardized query interface.
You can use a NoSQL datastore when:
- Scale and volume are important.
- Deep analytics are required.
- Fast queries are paramount.
- You can live with a non-standard query interface.