
How to Best Monitor GraphQL APIs


Since its release in 2015, GraphQL has become the go-to alternative to REST. It finally gives frontend developers the flexibility they have craved for so long.

Gone are the days of begging backend developers for single-purpose endpoints. Now a query can define all the data that is needed and request it in one go, cutting latency down considerably.

With REST, things were considerably simpler, especially monitoring. The backend team could measure every endpoint and see what was happening right off the bat.

With GraphQL, this isn't the case. There's often only one endpoint, so measuring per endpoint isn't possible. So where are the new places to hook into the system? In this article, let's find out how to monitor a GraphQL API.

GraphQL Architecture

To get an idea of where the interesting points are in our system, let’s look into potential architectures.

A simple GraphQL system consists of three parts:

  1. A schema that defines all of the data types
  2. A GraphQL engine that uses the schema to route every part of a query to a resolver
  3. One or more resolvers, the functions that get called by the GraphQL engine

The GraphQL backend starts by parsing the schema, which tells the server which resolver handles which type.

When a query is sent to the GraphQL endpoint, it gets parsed by the engine, and for every requested type in the query, the engine calls the corresponding resolver to satisfy the request.
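The three parts can be sketched in a few lines. This is an illustrative toy, not a real GraphQL engine; the field names (`user`, `address`) and the resolver map standing in for the parsed schema are assumptions for the example.

```typescript
// Toy sketch of the three parts: a resolver map standing in for the parsed
// schema, a minimal "engine" that dispatches each requested type to its
// resolver, and the resolvers themselves. Names are illustrative only.
type FieldResolver = (id: string) => unknown;

// 1. + 3. The "schema": which resolver handles which type.
const resolverMap: Record<string, FieldResolver> = {
  user: (id) => ({ id, name: "Ada" }),
  address: (id) => ({ id, street: "1 Main St" }),
};

// 2. The "engine": for every requested type, call its resolver.
function execute(fields: string[], id: string): Record<string, unknown> {
  const result: Record<string, unknown> = {};
  for (const field of fields) {
    result[field] = resolverMap[field](id);
  }
  return result;
}

// One query fetches both types in a single round trip to the API.
const data = execute(["user", "address"], "42");
```

Note that the engine resolves `user` and `address` independently, which is exactly where the performance question in the next paragraphs comes from.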

As we can imagine, this naive approach only delivers good performance for simple queries.

Sometimes parts of the query are interconnected in our data sources (a data source being something like a database or third-party API). For example, if we load a user account and its address, they could be two types in the GraphQL schema but only one record in the data source. If we request both at the same time, we wouldn't expect the server to make two requests to the data source.

To solve this problem, people started to use a pattern called the data-loader.

A data-loader is another layer in our GraphQL API that resides between our resolvers and data-source.

In the simple setup, the resolvers would access the data-source directly. In the more complex iteration, the resolvers would tell a data-loader what they need, and this loader would access the data-source for them.

Why does this help?

The data-loader can wait until all resolvers have been called and consolidate access to the data source.

If someone wants to load the user account and address, it's only one request to the data-source now.

The idea is that while a resolver only knows about its own requirements, the data-loader knows what all resolvers want and can optimize the access.
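The batching idea can be sketched without any library: collect the keys every resolver asks for during the current tick, then issue one consolidated request. This is a minimal sketch of the pattern, not the real `dataloader` package; `fetchUsersBatch` is a hypothetical stand-in for a single batched data-source query.

```typescript
// Minimal data-loader sketch: resolvers call load(), the loader collects
// keys for the current microtask tick, then makes ONE batched request to
// the (fake) data source. fetchUsersBatch is an illustrative stand-in.
let batchCalls = 0;

async function fetchUsersBatch(ids: string[]): Promise<string[]> {
  batchCalls++; // pretend this is one query: ... WHERE id IN (...)
  return ids.map((id) => `user:${id}`);
}

class UserLoader {
  private keys: string[] = [];
  private pending: Array<(v: string) => void> = [];

  load(id: string): Promise<string> {
    return new Promise((resolve) => {
      this.keys.push(id);
      this.pending.push(resolve);
      // Flush once per tick, after all resolvers have queued their keys.
      if (this.keys.length === 1) queueMicrotask(() => this.flush());
    });
  }

  private async flush(): Promise<void> {
    const keys = this.keys;
    const pending = this.pending;
    this.keys = [];
    this.pending = [];
    const rows = await fetchUsersBatch(keys);
    rows.forEach((row, i) => pending[i](row));
  }
}
```

Two resolvers calling `loader.load("1")` and `loader.load("2")` in the same tick now trigger a single data-source request instead of two.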

Monitoring GraphQL

As we can see, depending on our architecture, there can be multiple places in which we can monitor our GraphQL API:

  1. HTTP endpoint
    • For all the traffic that hits our API
  2. GraphQL query
    • For each specific query
  3. GraphQL resolver or data-loader
    • For each access to the data source
  4. Tracing
    • Following each query through the resolvers and data-loaders it affects

1. HTTP Endpoint

The HTTP endpoint is what we monitored for a REST API. In the GraphQL world there is often only one, so monitoring on this level only gives us information about the overall status of the API.

This isn’t bad. At least it gives us a starting point. If everything is fine here (low latency, low error rates, no customer complaints, all green), then we can save time and money by only looking at these metrics.
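Recording those endpoint-level metrics can be as simple as wrapping the single `/graphql` handler. The handler signature below is a hypothetical, framework-agnostic sketch, not tied to any particular server library.

```typescript
// Sketch: wrap the single /graphql handler to record request count, error
// count, and total latency for the endpoint as a whole. The Handler type
// is an illustrative stand-in for whatever your framework uses.
type Handler = (body: string) => Promise<string>;

const metrics = { requests: 0, errors: 0, totalMs: 0 };

function withEndpointMetrics(handler: Handler): Handler {
  return async (body) => {
    const start = Date.now();
    metrics.requests++;
    try {
      return await handler(body);
    } catch (err) {
      metrics.errors++; // error rate = errors / requests
      throw err;
    } finally {
      metrics.totalMs += Date.now() - start; // mean latency = totalMs / requests
    }
  };
}
```

From `metrics` we can derive exactly the overall numbers mentioned above: mean latency and error rate for the whole API.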

But if something's off, then we'll need to dig deeper.

2. GraphQL Query

The next obvious step would be to look at each query, which might be good enough for APIs that have static usage patterns.

If we use our API only with our own clients, it’s often clear that the queries won’t change that often. However, if our API is available to different customers with different requirements, things won't be that simple.

We may then find hundreds of (slightly) different queries that all run slowly for one reason or another.

One way to mitigate this issue is to pick the most common queries and monitor them synthetically. This means we define a set of query and variable combinations, then run them from test clients to check how long they take. That way, we reduce the risk of shipping significant performance regressions when we update. Persisted queries can help with this, since they capture the most-used queries.
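A synthetic check runner could look like the following sketch. The `runQuery` callback is a hypothetical stand-in for an HTTP call to the API, and the latency budgets are arbitrary example values.

```typescript
// Sketch of synthetic monitoring: run a fixed set of common query/variable
// combinations from a test client and flag any that exceed a latency budget.
// runQuery is an illustrative stand-in for an HTTP call to the GraphQL API.
interface SyntheticCheck {
  name: string;
  query: string;
  variables: Record<string, unknown>;
  budgetMs: number;
}

async function runSyntheticChecks(
  checks: SyntheticCheck[],
  runQuery: (q: string, v: Record<string, unknown>) => Promise<unknown>,
): Promise<string[]> {
  const regressions: string[] = [];
  for (const check of checks) {
    const start = Date.now();
    await runQuery(check.query, check.variables);
    // Anything over budget is a candidate performance regression.
    if (Date.now() - start > check.budgetMs) regressions.push(check.name);
  }
  return regressions;
}
```

Run from CI or a scheduled job, a non-empty result list is the signal to dig into the queries by name before customers notice.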

If that doesn't resolve our issues, then we need to take another step.

3. Resolvers & Data-Loaders

The best place to monitor what’s happening is often where the rubber hits the road. If we look at the places in our backend that access the data-source, we can get a better perspective of what's going on.

Is the type of data source we chose simply wrong for the access pattern? Do we need a different type of database?

Is our data-source type okay, but our requests to it need improvement? Do we need something like a data-loader if we aren't already using one?

Do we send requests to external APIs that are too slow? Can we replicate that data closer to our backend?

All of these questions can be answered when we see what data is retrieved in our backend.

There's an ancillary benefit to the data-loader here. The resolvers only let us monitor what one resolver does; the data-loader lets us see what all resolvers do in one request, and additionally lets us solve inter-resolver problems after we discover them.
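Per-resolver measurement is a small wrapper, analogous to the endpoint wrapper but keyed by resolver name. Again a hedged sketch: the resolver signature is illustrative, not a specific library's API.

```typescript
// Sketch: wrap each resolver so every call is counted and timed under the
// resolver's name, giving per-resolver visibility into data-source access.
type AnyResolver = (args: Record<string, unknown>) => Promise<unknown>;

const resolverStats: Record<string, { calls: number; totalMs: number }> = {};

function instrument(name: string, resolver: AnyResolver): AnyResolver {
  return async (args) => {
    const stat = (resolverStats[name] ??= { calls: 0, totalMs: 0 });
    const start = Date.now();
    try {
      return await resolver(args);
    } finally {
      stat.calls++;
      stat.totalMs += Date.now() - start;
    }
  };
}
```

Wrapping a data-loader's batch function the same way answers the questions above at the level where the data is actually fetched.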

4. Tracing The Whole Stack

This is the supreme discipline of monitoring. It involves tagging the query with a tracing ID and passing this ID down to the resolvers, then on to the data-loaders, or maybe even to the data source itself. This allows us to use the tracing ID when logging timings and errors, so we can consolidate them later to get a big-picture view.
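The propagation mechanism can be sketched as a context object carried down the stack, with each layer recording a named span against the shared trace ID. All names here are hypothetical; real systems would use a tracing standard such as OpenTelemetry instead.

```typescript
// Sketch of trace propagation: the query handler creates a trace ID and a
// span list, then passes the context down so resolvers and data-loaders can
// record their timings under the same ID. All names are illustrative.
interface TraceContext {
  traceId: string;
  spans: { name: string; ms: number }[];
}

function span<T>(ctx: TraceContext, name: string, fn: () => T): T {
  const start = Date.now();
  try {
    return fn();
  } finally {
    ctx.spans.push({ name, ms: Date.now() - start });
  }
}

function handleQuery(query: string): TraceContext {
  const ctx: TraceContext = {
    traceId: Math.random().toString(36).slice(2),
    spans: [],
  };
  span(ctx, "parse", () => query.trim());
  // The resolver span encloses the data-loader span it triggers.
  span(ctx, "resolver:user", () =>
    span(ctx, "dataloader:users", () => ({ name: "Ada" })),
  );
  return ctx; // ctx.spans, keyed by ctx.traceId, gives the big picture
}
```

Consolidating all spans that share a `traceId` reconstructs exactly the per-query picture described above: parse time, resolver time, and data-loading time, separately.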

The idea is that while measuring a query we might learn how long it took to resolve, but the actual data loading happens down the line in the resolvers and/or data-loaders, not while parsing the query.

We don’t work with queries anymore when loading the data since one of the core ideas of GraphQL is decoupling the queries from the actual data loading. But it's still great to see what happens in the background when someone sends a query.

Conclusion

Understanding how the backend for a GraphQL API is structured gives us more actionable ideas on where to monitor.

Things have certainly become a bit more cumbersome than with REST APIs, but there is nothing magical going on in a GraphQL API—it’s just code we can hook into for different purposes, like monitoring.

If we gain visibility into our production system, questions about caching and error handling also become clearer.


Published at DZone with permission of Kay Ploesser . See the original article here.

