Building + Testing Open Source Monitoring Tools

Let's look at some of the processes and motivations that go into building open source tooling and one team's approach to testing their OSS — as well as thoughts on the wider OSS community.

By Mercedes Coyle · Sep. 17, 2018

At Monitorama 2018, I shared some of the processes and lessons I've learned from developing a product for people other than myself to consume. After spending six years on call, I now build the software that wakes people up in the night — a.k.a. infrastructure and tooling for systems monitoring and performance analysis. As someone who's been there, I'm conscientious about building quality software that people delight in using. Even if that software's job is to wake them up, at the very least it can do so for a good reason.

In this post, I’ll recap that talk, sharing the motivations behind building open source software, our approach to testing, and the critical role community plays in building better software.

Motivations for Building Open Source Software

First, a definition: open source is code that is publicly available, with permission to use, modify, and share. I'd extend that definition to include a license; anyone can put code out in public, but for others to actually be able to use it, you need to tell them how they may use it, how to contribute, and where to find it.

What motivates people to work on open source software? For many, it’s the fact that much of our daily work relies on it. Even the tools we pay for are built on open source technology. Personally, I’m also motivated by the desire to build and maintain tools that make people’s jobs easier. As an on-call veteran, I don’t want to needlessly wake someone up in the middle of the night because I didn’t write something properly. Finally, there is something very rewarding about seeing your code run in an infrastructure you don’t control — to see that others are using (and benefitting from) the tools you created.

Testing OSS: Making Sure Code Is Performant

Building open source software is all well and good, but it's also important to make sure it's working as it should. We released (and open sourced) Sensu 2.0 as an alpha earlier this year, rewriting our product in Go on top of etcd. We still use a pub/sub model, but it's now embedded in the backend on top of etcd. You run just one binary, and it's faster to set up and start using than v1. Our alpha was far from polished or perfect, but we wanted to get it into the hands of our users to hear about pain points, what people were missing, and what they liked.
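To make the etcd-backed pub/sub idea concrete, here's a minimal sketch (not Sensu's actual internals) of publish and subscribe built on etcd's watch API with the official Go client; the endpoint, key prefix, and message format are assumptions for illustration.

```go
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Connect to a local etcd node (endpoint is an assumption for this sketch).
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// Subscriber: watch a key prefix; every Put under it behaves like a
	// published message delivered to all watchers.
	go func() {
		for resp := range cli.Watch(context.Background(), "events/", clientv3.WithPrefix()) {
			for _, ev := range resp.Events {
				fmt.Printf("received %s = %s\n", ev.Kv.Key, ev.Kv.Value)
			}
		}
	}()

	// Publisher: a plain write to the watched prefix.
	if _, err := cli.Put(context.Background(), "events/check-result", `{"status": 0}`); err != nil {
		panic(err)
	}

	time.Sleep(time.Second) // give the watcher a moment to receive before exiting
}
```

Because every watcher of a prefix sees every write under it, the etcd cluster can stand in for a message bus without running a separate broker, which is what lets everything ship as a single binary.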

Once we had an alpha — and some information from our users — we started planning what QA testing would look like. Since we're not the primary consumers of our software and don't run it on internal infrastructure, it's tricky to know whether it's working. Our tests during development were limited to unit, integration, and end-to-end testing, which have their own challenges and can be brittle (more on this later). These were useful for determining whether code worked during development, but because we weren't testing in real-world environments, they told us little about feature need, usability, or system behavior. Usability testing is also tough when you're not a core user of your product, especially with such a malleable framework. With so much flexibility in our system, we weren't going to be able to test every possibility.
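For a sense of what that development-time testing looks like in Go, unit tests are typically table-driven; the function under test below is hypothetical, standing in for real check-processing code.

```go
package check

import "testing"

// severityFromStatus is a hypothetical helper mapping a check's exit
// status to a severity label, standing in for real code under test.
func severityFromStatus(status int) string {
	switch status {
	case 0:
		return "ok"
	case 1:
		return "warning"
	default:
		return "critical"
	}
}

// TestSeverityFromStatus exercises the mapping across a table of cases,
// the idiomatic shape for Go unit tests.
func TestSeverityFromStatus(t *testing.T) {
	cases := []struct {
		status int
		want   string
	}{
		{0, "ok"},
		{1, "warning"},
		{2, "critical"},
		{127, "critical"},
	}
	for _, c := range cases {
		if got := severityFromStatus(c.status); got != c.want {
			t.Errorf("severityFromStatus(%d) = %q, want %q", c.status, got, c.want)
		}
	}
}
```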

When testing software, we want to keep the following questions in mind:

  • Does the software behave as we expect?
  • Does it solve a problem or need?
  • Can we find the bugs before users do?

In order to answer those questions, we needed to do different types of testing to benchmark how our system was performing:

  • Code quality. We used Code Climate's Velocity to track PR size, time to merge, and complexity risk, and to alert us when PRs got too big. Our goal was to keep PRs under 200 lines of code, which we achieved about 50% of the time. Compare that to this time last year, though, and it's a completely different story.

[Chart: pull request metrics. Source: Velocity by Code Climate]

Although this is a neat metric, it's not the be-all and end-all; it can't determine whether your codebase is maintaining consistent quality, because feature and bug-fix cycles change over time.

  • Code analysis, or our build success ratio.

[Chart: CI build success ratio over time]

As you can see, we were wrestling with getting a stable CI test suite. And if we're having trouble with this, our users will too, since unstable builds keep us from getting releases out in a timely fashion. We discovered that our end-to-end testing strategy was brittle, and we had to rework some of our build tests to be more reliable. Again, more on this later.

  • User testing, or what I like to call mob QA. You can't possibly cover every scenario in software testing; you need actual usage to uncover bugs.

We got together on a Zoom call, figured out which features we were trying to test, and determined acceptance criteria. Everyone picked one or two features they hadn't worked on, and we set about breaking things left and right. We uncovered some non-major software bugs and — more importantly — deepened our understanding of what the code base and features looked like.

  • Load testing. In order to be confident in recommending usage and developing new features, we needed some data points for the types of load our customers would be experiencing.

Our plan was to set up a single VM running sensu-backend in Google Cloud Platform and run a Kubernetes cluster with five agents per pod. We assumed that GKE would autoscale until we had the number of agents we needed (10,000), but in actuality, we ended up load testing GKE! Because we were launching so many pods at once, the API throttled us, and we had to rethink our load-testing strategy. We took the simpler approach of writing a script in Go to spin up 10,000 agents and connecting them to a single backend (a sketch of this approach follows).
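The script itself isn't shown in this post, but a minimal sketch of that simpler approach might look like the following: one goroutine per simulated agent, each holding a connection open to the backend so it registers as a live session. Sensu agents connect to the backend over WebSocket; the URL, port, connection count, and ramp-up delay here are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"time"

	"github.com/gorilla/websocket"
)

const (
	backendURL = "ws://localhost:8081/" // illustrative; a real agent speaks Sensu's transport protocol
	numAgents  = 10000
)

func main() {
	var wg sync.WaitGroup
	for i := 0; i < numAgents; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			// Each fake "agent" just holds its connection open so the
			// backend sees numAgents concurrent sessions.
			conn, _, err := websocket.DefaultDialer.Dial(backendURL, nil)
			if err != nil {
				fmt.Printf("agent %d failed to connect: %v\n", id, err)
				return
			}
			defer conn.Close()
			time.Sleep(10 * time.Minute) // hold the connection for the duration of the test
		}(i)
		time.Sleep(time.Millisecond) // ramp up gradually to avoid a thundering herd
	}
	wg.Wait()
}
```

Compared with scheduling thousands of pods, goroutines are cheap enough that a single VM can generate the whole connection load, which sidesteps the GKE API throttling entirely.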

These tests helped uncover several interesting bugs, including performance issues (involving an etcd autocompaction bug), a weird UTC timing snafu, and a check scheduling failure. I cover these bugs — and how we fixed them — in detail in my talk.
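On the brittle end-to-end tests mentioned above: one common de-flaking pattern (a general technique, not necessarily the exact rework described here) is to poll for an eventual condition rather than asserting immediately after an action. A sketch in Go:

```go
package e2e

import (
	"fmt"
	"time"
)

// eventually polls cond until it returns true or the timeout elapses,
// instead of asserting once after a fixed sleep. An end-to-end test
// would call it like:
//
//	if err := eventually(agentIsRegistered, 30*time.Second, time.Second); err != nil {
//		t.Fatal(err)
//	}
//
// where agentIsRegistered is whatever observable state the test cares
// about (hypothetical here).
func eventually(cond func() bool, timeout, interval time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		if cond() {
			return nil
		}
		time.Sleep(interval)
	}
	return fmt.Errorf("condition not met within %s", timeout)
}
```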

I’d mentioned earlier that some of our testing revealed gaps (or bugs) in our understanding; I can’t emphasize enough the importance of documentation. I love this quote from Jordan Sissel: “If a new user has a bad time, it’s a bug.” I’ve had many bad times with docs that seemed like bugs but were actually just bad documentation. We wanted to avoid this with Sensu 2.0, and decided to polish our docs for the Beta release. We even created a doc for how to write docs, with recommendations on what a proper guide should include:

  • Why you’d want to use this feature
  • What it is
  • A quick-and-dirty way to get set up
  • A reference guide to the feature’s API

A documented guide for writing documentation makes it easier to tell consistent stories about how you'd accomplish something with a particular feature. Now, our docs are organized, searchable, and open source — users can contribute when they see issues.

The Importance of Community (and How We Learn From It)

Now that Sensu 2.0 is in the wild, we rely on the Sensu Community to use it and give feedback. We engage with our Community in a number of ways, including our Community Slack, accelerated feedback program, and Community Chats.

We’re also exploring experience reports, which we learned about from the Go community, which include questions like, “How did you expect this feature to work, how did it actually work, and what are you doing with it in your particular environment?”

Our Community continues to inform our decisions and deepen our understanding of how our product is used. Building open source software is ultimately very rewarding — both personally and professionally — and comes with the added benefit of having a large community of users who can help make it better.

Published at DZone with permission of Mercedes Coyle, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
