DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Because the DevOps movement has redefined engineering responsibilities, SREs now have to become stewards of observability strategy.

Apache Cassandra combines the benefits of major NoSQL databases to support data management needs not covered by traditional RDBMS vendors.

The software you build is only as secure as the code that powers it. Learn how malicious code creeps into your software supply chain.

Generative AI has transformed nearly every industry. How can you leverage GenAI to improve your productivity and efficiency?

Related

  • Reliability Models and Metrics for Test Engineering
  • Data-Driven Decision-Making in Product Management: The Key to Success
  • Decoding the Confusion Matrix: A Comprehensive Guide to Classification Model Evaluation
  • Using A/B Testing To Make Data-Driven Product Decisions

Trending

  • Scaling Microservices With Docker and Kubernetes on Production
  • Rust, WASM, and Edge: Next-Level Performance
  • Chat With Your Knowledge Base: A Hands-On Java and LangChain4j Guide
  • Monolith: The Good, The Bad and The Ugly
  1. DZone
  2. Data Engineering
  3. Data
  4. Monitoring with DataDog

Monitoring with DataDog

By 
Giorgio Sironi user avatar
Giorgio Sironi
·
Apr. 10, 13 · Interview
Likes (1)
Comment
Save
Tweet
Share
16.0K Views

Join the DZone community and get the full member experience.

Join For Free

Recently I found myself sending more and more business metrics to Datadog, a Software as a Service solution that promises to collect all your data points and build business metrics, displaying them as graphs and triggering alerts whenever they get to critically low (or high) levels.

The goals

The more your automated tests raises their level of abstraction, the more they become oriented to external quality (what the customer wants and does) instead of internal quality (low coupling, high cohesion of the software design). The largest end-to-end tests that we have in place at Onebip connect several different projects on an integration server and run everything from the creation of a purchase or subscription to its renewal and termination (events that would happen months after creation).

However, even end-to-end tests cannot guarantee that our applications work against external resources, such as merchants, mobile carrier, and ISPs. The only way to catch integration problems is monitoring. These problems, like a mobile carrier experiencing an outage, may be due to our errors or to external conditions; but they should nevertheless be discovered as early as possible.

The infrastructure

Datadog is the only data-collection service that passed the stress tests of SLL, our solution architect. It ships as an UDP server that you pay basing on the number of machines you want to run it on; for example, a preproduction and a production server are a common choice to start out. The server collects data locally and periodically uploads it to Datadog in bursts, where you can access it via a web application or via APIs in case you want to call it from your build.

The UDP protocol is aligned with the goals of metric collections: a silent server that decouples the sending of metrics from the rest of the business logic:

  • UDP packets are just lost if no process is there listening to them, no errors are raised if the server crashes or is not running or installed for some reason for instance in development machines).
  • The monitoring code, which you write, should be decoupled and asynchronous as much as possible. The part that talks over the network is already externalized in the DataDog server, but you don't want the user to wait because you have to send some strange number.

So the internal part (sending via UDP) is performed in Listener objects that implement the Observer pattern. These object still have to be wrapped in all-encompassing try/catch constructs so that any errors in the monitoring part never influence the business logic. Againg, you don't want a payment to fail because of an exception in how monitoring DateTime objects are built.

For PHP we built a SilentListener class to wrap all of our object:

class SilentListener
{
  private $wrapped;

  public function __construct($wrapped)
  {
      $this->wrapped = $wrapped;
  }

  public function __call($method, $args)
  {
      try {
          call_user_func_array(array($this->wrapped, $method), $args);
      } catch (Exception $e) {
          $this->log($e);
      }
  }
}SLL


An example

In some countries, we receive payments through mobile-originated messages (MO), a fancy word for saying SMS sent by the end user. So a simple way to monitor if we are receiving payment or if the server is exploded is to upload a metric counting them every time we receive one (pseudo-JSON format to show you the data):

{
  counter: 1
}

However, we can be more precise than this: an external outage or an integration problem may happen to a lower level than the whole application. For example, MOs can be delayed in Argentina, by a single carrier, while the rest of the world is still working fine.

So our data points look like this:

{
  counter: 1,
  tags: {
  country: "IT",
  carrier: "Vodafone",
  merchant: "Tasty Cookies, Inc.",
  }
}

and in turn graphs on DataDog or calls to its API can set up filters so that we can, if necessary, view only the data related to any combination of country, carrier and merchant.

The nice thing, SLL says, is that you just start send data from production and only after you have data points available you build a graph or an alert system basing on what appears to be the most important tags. For example, a big merchant may benefit from some dedicated monitoring, while minor countries such as Vietnam should be monitored as a whole since their traffic is by far lower than that of the others.

Data (computing) mobile app SaaS Metric (unit) UDP (Networking) Testing Object (computer science) Web application Data collection Merchant

Opinions expressed by DZone contributors are their own.

Related

  • Reliability Models and Metrics for Test Engineering
  • Data-Driven Decision-Making in Product Management: The Key to Success
  • Decoding the Confusion Matrix: A Comprehensive Guide to Classification Model Evaluation
  • Using A/B Testing To Make Data-Driven Product Decisions

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!