How to Properly Leverage Elasticsearch and User Behavior Analytics for API Security

How to set up Elasticsearch and Kibana for User Behavior Analytics (UBA) in API Security Monitoring — Accurately identify API security vulnerabilities.

By Derric Gilling · Aug. 06, 20 · Analysis

Kibana and the rest of the ELK stack (Elasticsearch, Logstash, Kibana) are great for parsing and visualizing API logs for a variety of use cases. As an open-source project, it’s free to get started, though you still need to factor in compute and storage costs, which are not cheap for analytics. One use case for Kibana that has grown recently is analysis and forensics for API security, a growing concern for engineering leaders and CISOs as companies expose more and more APIs to customers, partners, single-page apps, and mobile apps. This can be done by instrumenting applications to log all API traffic to Elasticsearch. However, a naive implementation would only store raw API logs and calls, which is not sufficient for API security use cases.

Why API logging is a naive approach to API security

Raw API logs only contain the information needed to execute a single action. Usually the HTTP headers, IP address, request body, and other information are logged for later analysis. Monitoring can be added by purchasing a license for Elasticsearch X-Pack. The issue is that security incidents cannot always be detected by looking at API calls in isolation. Instead, hackers can perform elaborate behavioral flows that exercise your API in unintended ways.

Let’s take a simple pagination attack as an example. A pagination attack is when a hacker paginates through a resource like /items or /users to scrape your data without detection. Maybe the info is already public and low risk, such as items listed on an e-commerce platform. However, the resource could also contain PII or other sensitive information, such as /users, without being correctly protected. In this case, a hacker could write a simple script to dump all the users stored in your database like so:

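import sys
from random import randint
from time import sleep
import requests

skip = 0
while True:
    # Request the next page of 10 users, using the API token passed on the command line
    response = requests.get('https://api.acmeinc.com/users?take=10&skip=' + str(skip),
                            headers={'Authorization': 'Bearer ' + sys.argv[1]})
    print("Fetched 10 users")
    sleep(randint(100,1000))  # random pause between calls
    skip += 10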

A couple of things to note:

  1. The hacker waits a random time between calls to avoid hitting rate limits
  2. Since the frontend app only fetches 10 users at a time, the hacker also fetches only 10 at a time to avoid raising suspicion

There is absolutely nothing in a single API call that can distinguish these bad requests from legitimate ones. Instead, your API security and monitoring solution needs to examine user behavior holistically. This means examining together all the API calls made by a single user or API key, an approach called User Behavior Analytics, or UBA.

How to implement User Behavior Analytics in Kibana and Elasticsearch

To implement User Behavior Analytics in Kibana and Elasticsearch, we need to flip our time-centric data model around to one that is user-centric. Normally, API logs are stored as a time series, using the event time or request time as the date to organize data around. By doing so, older logs can easily be marked read-only, moved to smaller infrastructure, or retired based on retention policies. In addition, it makes search fast when you’re only querying a limited time range.

Tagging API logs with user id

In order to convert this to a user-centric model, we need to tag each event with user identifying information such as a tenant id, a user id, or similar. Because the majority of APIs are secured by some sort of OAuth or API Key, it’s fairly easy to map the API key to a permanent identifier like user id either directly or by maintaining this mapping in a key/value store like Redis. This way your logs might look like so:

Request Time          Verb  Route   User Id
2020-08-02T02:14:48Z  GET   /items  1234
2020-08-02T02:15:49Z  GET   /items  1234
2020-08-03T02:16:19Z  GET   /users  6789
2020-08-03T02:24:49Z  GET   /users  1234
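
As a minimal sketch of the tagging step, the snippet below adds a user id to each log event before it is indexed. It assumes a plain dictionary stands in for the key/value store (in production this might be Redis); the names API_KEY_TO_USER_ID and tag_with_user_id, the sample API keys, and the event field names are all hypothetical.

```python
# Hypothetical mapping of API keys to permanent user ids.
# A dict stands in here for a key/value store such as Redis.
API_KEY_TO_USER_ID = {
    "key-abc": "1234",
    "key-xyz": "6789",
}


def tag_with_user_id(event: dict, api_key: str) -> dict:
    """Return a copy of the log event with a user_id field added.

    Unknown API keys are tagged with None so they can be reviewed separately.
    """
    tagged = dict(event)  # copy so the original event is not mutated
    tagged["user_id"] = API_KEY_TO_USER_ID.get(api_key)
    return tagged


event = {"request_time": "2020-08-02T02:14:48Z", "verb": "GET", "route": "/items"}
print(tag_with_user_id(event, "key-abc"))
```

Doing the lookup at logging time keeps the raw API key out of the stored event, which is one reason to tag with a permanent identifier rather than the key itself.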

Grouping related API logs together

Now that you have tagged all API logs with a user id, you will need to run a map/reduce job to group all of a user’s events together and calculate metrics for each user. Unfortunately, log aggregation pipelines like Logstash and Fluentd can only enrich a single event at a time, so you will need a custom application that can run map/reduce jobs on a distributed compute framework like Spark or Hadoop.

Once you group by user id, you’ll want to store a few items in the “user profile” such as:

  1. Id and demographics of the user
  2. The raw events this user has made
  3. Summary metrics, like the number of API keys this user has used or the amount of data this user has downloaded
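
The grouping logic can be sketched in a single process before moving it to a distributed framework. The example below is only an illustration of the map and reduce steps, not a Spark or Hadoop job; the function name build_user_profiles and the api_key and response_bytes fields are assumptions made for the example.

```python
from collections import defaultdict

# Sample tagged events; api_key and response_bytes are hypothetical fields.
events = [
    {"user_id": "1234", "route": "/items", "api_key": "key-abc", "response_bytes": 2048},
    {"user_id": "1234", "route": "/items", "api_key": "key-abc", "response_bytes": 4096},
    {"user_id": "6789", "route": "/users", "api_key": "key-xyz", "response_bytes": 512},
    {"user_id": "1234", "route": "/users", "api_key": "key-def", "response_bytes": 1024},
]


def build_user_profiles(events):
    """Group events by user_id and compute summary metrics per user."""
    grouped = defaultdict(list)
    for event in events:  # "map" step: key each event by its user id
        grouped[event["user_id"]].append(event)

    profiles = {}
    for user_id, user_events in grouped.items():  # "reduce" step: summarize each group
        profiles[user_id] = {
            "user_id": user_id,
            "events": user_events,
            "num_api_keys": len({e["api_key"] for e in user_events}),
            "bytes_downloaded": sum(e["response_bytes"] for e in user_events),
        }
    return profiles


profiles = build_user_profiles(events)
print(profiles["1234"]["num_api_keys"], profiles["1234"]["bytes_downloaded"])
```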

Storing the user profiles

Even though you are grouping by user id, storing all of a user’s events in a single database entity would be a no-go, as it gives up the flexibility of time-series data stores and creates problems including:

  1. Fat entities that contain too much data
  2. Cannot retire old data
  3. Queries become slow due to amount of data touched

To fix this, we can overlay our original time-series architecture with this user-centric approach creating a two-level data model.

User Id  Start Time            End Time              Number of Logins  Number of Users Touched  Number of API Keys  Events
1234     2020-08-02T00:00:00Z  2020-08-02T23:59:59Z  2                 250,223                  1                   []
6789     2020-08-03T00:00:00Z  2020-08-03T23:59:59Z  13                232                      12                  []
1234     2020-08-03T00:00:00Z  2020-08-03T23:59:59Z  0                 323,997                  0                   []

In this case, we are creating a new “user profile” every day that contains the relevant security metrics along with raw events.
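
One way to picture the two-level model is to bucket events by user id and calendar day, producing one profile per bucket. The sketch below assumes UTC timestamps in the format shown in the log table earlier; the function name daily_profiles and the num_events field are hypothetical, and the security metrics themselves are left out for brevity.

```python
from collections import defaultdict
from datetime import datetime

events = [
    {"request_time": "2020-08-02T02:14:48Z", "verb": "GET", "route": "/items", "user_id": "1234"},
    {"request_time": "2020-08-02T02:15:49Z", "verb": "GET", "route": "/items", "user_id": "1234"},
    {"request_time": "2020-08-03T02:16:19Z", "verb": "GET", "route": "/users", "user_id": "6789"},
    {"request_time": "2020-08-03T02:24:49Z", "verb": "GET", "route": "/users", "user_id": "1234"},
]


def daily_profiles(events):
    """Build one profile per (user id, UTC day) that holds that day's raw events."""
    buckets = defaultdict(list)
    for event in events:
        ts = datetime.strptime(event["request_time"], "%Y-%m-%dT%H:%M:%SZ")
        buckets[(event["user_id"], ts.date())].append(event)

    profiles = []
    for user_id, day in sorted(buckets):  # sort keys for a stable output order
        day_events = buckets[(user_id, day)]
        profiles.append({
            "user_id": user_id,
            "start_time": f"{day.isoformat()}T00:00:00Z",
            "end_time": f"{day.isoformat()}T23:59:59Z",
            "num_events": len(day_events),
            "events": day_events,
        })
    return profiles


for profile in daily_profiles(events):
    print(profile["user_id"], profile["start_time"], profile["num_events"])
```

Because each profile covers a fixed time window, old profiles can still be retired by date, just like the original time-series logs.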

Detecting API security vulnerabilities

Now that we have reorganized our API data to be user-centric, it becomes far easier to distinguish bad actors from good users, whether through visual inspection, static alert rules, or advanced anomaly detection.

In the example profiles above, the typical user (6789) touched or accessed only 232 users and 12 items. This clearly looks like standard interactive traffic. On the other hand, the bad actor (1234) touched or downloaded over 250,000 items per day over the last two days. In addition, this user accessed the API without any corresponding logins on the second day. You can now build infrastructure to detect this programmatically, for example by alerting when any user “touched over 10,000 items in a single day.” API security and monitoring solutions like Moesif already have this functionality built in.
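
A static alert rule of this kind can be sketched in a few lines. The example below uses the login and touched-record counts from the profile table above; the field names, the find_alerts function, and the second rule (API access on a day with no logins) are assumptions for illustration, not how any particular product implements alerting.

```python
# Daily profiles using the values from the table above; field names are hypothetical.
profiles = [
    {"user_id": "1234", "day": "2020-08-02", "num_logins": 2, "num_users_touched": 250223},
    {"user_id": "6789", "day": "2020-08-03", "num_logins": 13, "num_users_touched": 232},
    {"user_id": "1234", "day": "2020-08-03", "num_logins": 0, "num_users_touched": 323997},
]

TOUCH_THRESHOLD = 10_000  # alert when a user touches more records than this in one day


def find_alerts(profiles, threshold=TOUCH_THRESHOLD):
    """Return (user_id, day, reason) tuples for profiles that break a static rule."""
    alerts = []
    for profile in profiles:
        if profile["num_users_touched"] > threshold:
            alerts.append((profile["user_id"], profile["day"], "touched too many records"))
        if profile["num_logins"] == 0 and profile["num_users_touched"] > 0:
            alerts.append((profile["user_id"], profile["day"], "API access without login"))
    return alerts


for alert in find_alerts(profiles):
    print(alert)
```

Static thresholds like this are easy to reason about but need tuning per API; anomaly detection can replace the fixed threshold once enough history is available.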

How long to retain API logs for API security

Unlike API logs kept for debugging purposes, these entities should be stored for at least a year, as most security experts recommend, since most breach studies show that the time to detect a data breach is over 200 days. If you’re only retaining your API data for a couple of days or weeks to keep costs down, you lose access to valuable forensics data needed for auditing and postmortem review. Treat your API data like your database backups: you never know when you might need them, and you should regularly test your system to ensure the right data is captured. Naive decision-making places too much emphasis on reducing storage and compute costs without considering how much risk that exposes the company to.

Storing data for a year complicates your GDPR and CCPA compliance, since API logs can contain PII and personal data. Luckily, GDPR and CCPA include exemptions for collecting and storing logs without consent for the legitimate purpose of detecting and preventing fraud and unauthorized system access, and ensuring the security of your APIs. In addition, since you have already tied all API logs to individual users, handling GDPR requests such as the right to erasure or the right to access is a breeze.
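
As a rough illustration of why user-tagged logs simplify erasure requests, the sketch below builds a query body that matches every document belonging to one user. It assumes the documents carry a user_id field mapped as a keyword; the field name and the erasure_request_body helper are hypothetical. A body like this could be sent to Elasticsearch’s delete-by-query endpoint, but the snippet only constructs and prints it.

```python
import json


def erasure_request_body(user_id: str) -> dict:
    """Build a query body matching all documents tagged with the given user id.

    Assumes a hypothetical user_id field indexed as an exact-match keyword.
    """
    return {"query": {"term": {"user_id": user_id}}}


print(json.dumps(erasure_request_body("1234")))
```

The same term query can serve a right-to-access request by using it in a search instead of a delete.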


Published at DZone with permission of Derric Gilling. See the original article here.

Opinions expressed by DZone contributors are their own.
