The Randomly Failing Test

The test was flawless, but once in a blue moon, we'd get a non-reproducible failure. Why? The reason was pretty obvious once we looked at it.

Oren Eini · Apr. 30, 17 · Opinion


We made a low-level change to how RavenDB writes to the journal. It was verified by multiple code reviews, a whole battery of tests, and production abuse. And yet, once in a blue moon, we'd get a test failure. It was utterly non-reproducible and happened only once every week or two (out of hundreds or thousands of test runs). That was worrying, because this test checked the behavior of RavenDB when it crashed midway through a transaction, which is kind of an important scenario for us.

It took a long while to finally figure out what was going on there. The first thing we ruled out was non-reproducibility caused by threading: the test was single-threaded, and nothing could inject anything into the code.

The format of the test was something like this (sketched in code after the list):

  • Write 1,000 random fixed-size values to the database.
  • Close the database.
  • Corrupt the last page of the journal.
  • Start the database again and verify that none of the values from the last transaction are in the database.
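
In code, the test is roughly the following. This is a minimal sketch; open_db, last_journal_path, db.contains, and PAGE_SIZE are hypothetical stand-ins, not RavenDB's actual API:

```python
import os

PAGE_SIZE = 4096   # assumed journal page size
VALUES = 1000      # random fixed-size values, as in the test
BATCH = 100        # values per transaction (illustrative)

def crash_recovery_test(db_path):
    # Write 1,000 random fixed-size values, in batched transactions.
    db = open_db(db_path)
    batches = [[os.urandom(100) for _ in range(BATCH)]
               for _ in range(VALUES // BATCH)]
    for batch in batches:
        with db.transaction() as tx:
            for value in batch:
                tx.write(value)
    db.close()

    # Simulate a crash mid-transaction: overwrite the last page of
    # the most recent journal file with garbage.
    with open(last_journal_path(db_path), "r+b") as f:
        f.seek(-PAGE_SIZE, os.SEEK_END)
        f.write(os.urandom(PAGE_SIZE))

    # On restart, recovery must discard the corrupted (last)
    # transaction: none of its values may be visible.
    db = open_db(db_path)
    assert all(not db.contains(v) for v in batches[-1])
```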

So far, awesome. So why would it fail?

The underlying reason was an obvious one, once we looked at it. The only thing that differed from run to run was the output of the random calls. But we were writing fixed-size buffers, so that shouldn't have changed anything; the data itself was meaningless.

As it turned out, the data was not quite meaningless. As part of the commit process, we compress the data before writing it to the journal, and different patterns of random bytes have different compression characteristics. In other words, a buffer of 100 random bytes may compress to 90 bytes or to 102 bytes. And that mattered. If a test run got enough random input to spill over into a new journal file, we would still corrupt the last page of that journal; but since that journal was new, its last page hadn't been used yet, so the transaction wouldn't be corrupted. The data would still be in the database, effectively failing the test.
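
A quick way to see the effect, with zlib standing in for the journal's compressor (an assumption; the post doesn't name the algorithm): fixed-size buffers of random data compress to different lengths depending on their byte patterns.

```python
import random
import zlib

rng = random.Random()

def random_value(size=100):
    # Draw bytes from a 16-symbol alphabet so the buffer is somewhat
    # compressible; how well it compresses depends on the pattern.
    return bytes(rng.getrandbits(4) for _ in range(size))

sizes = {len(zlib.compress(random_value())) for _ in range(1000)}
print(sorted(sizes))
# Prints several distinct lengths for the same 100-byte input size,
# so the space a transaction takes in the journal (and therefore
# when a new journal file is needed) varies from run to run.
```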


Published at DZone with permission of Oren Eini, DZone MVB. See the original article here.
