Lessons from Supporting Production Code

Until I started working on the uSwitch energy website around 8 months ago, I had not really supported a production system, so I learnt some interesting lessons in my time there.

Look at the new code first

We had our application wired up to Airbrake, so whenever a user action resulted in an exception being thrown we received a report with the stack trace, the environment variables and the page the user was on.

When trying to work out what had happened, I initially started from scratch: I worked backwards from the source and tried to construct a scenario in my head of what the user might have done to trigger that error.

After doing this a few times it became clear that if a user was experiencing a problem, it was probably because of some new code that we’d just introduced.

We therefore tweaked our bug hunting algorithm to check recently changed code first, and only after ruling that out did we work back from first principles.
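The "recent changes first" ordering can be sketched in a few lines. This is only an illustration, not the process we actually automated: the function name `triage_order`, the seven-day window and the idea of feeding in a map of last-commit dates (for example, built from version control history) are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def triage_order(stack_files, last_changed, now, recent_window=timedelta(days=7)):
    """Return the files from a stack trace, recently changed ones first.

    stack_files:  file paths in stack-trace order (may contain repeats).
    last_changed: dict mapping file path -> datetime of its last change
                  (hypothetical input, e.g. derived from commit history).
    """
    seen = set()
    recent, older = [], []
    for path in stack_files:
        if path in seen:
            continue  # inspect each file only once
        seen.add(path)
        changed = last_changed.get(path)
        if changed is not None and now - changed <= recent_window:
            recent.append(path)
        else:
            older.append(path)
    # Newest change first; files outside the window keep stack-trace order.
    recent.sort(key=lambda p: last_changed[p], reverse=True)
    return recent + older
```

Files outside the window are not discarded: they are simply pushed to the back, which mirrors falling back to first principles only once the new code has been ruled out.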

It may never have worked

Sometimes it became clear that new code wasn’t to blame, yet it seemed implausible that the error could actually have happened.

There was a tendency to assume that the user must have been deliberately trying to break the application, but it soon became clear that they had simply hit a code path that had not been hit before.

Even if you’ve tested a system extensively, users still find paths through the code that haven’t failed before, so it seems best to just assume that this will happen at some stage.

Log all the things

As I mentioned earlier, we were using a 3rd party service to collect errors and other helpful information, which was really useful for finding the root cause of problems.

The type of logging that you need varies by product. For a product like neo4j, for example, as well as logging exceptions we also log system information and memory settings.
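As a rough sketch of what logging an exception together with system information might look like, here is a small Python example. It is not how Airbrake or neo4j do it: the helper names `system_context` and `log_exception` are made up for illustration, and the fields collected (platform, interpreter version, CPU count, recursion limit) are stand-ins for whatever system and memory settings matter for your product.

```python
import logging
import os
import platform
import sys


def system_context():
    """Collect environment details worth attaching to every error report.

    The chosen fields are illustrative assumptions, not a fixed list.
    """
    return {
        "platform": platform.platform(),
        "python": platform.python_version(),
        "cpu_count": os.cpu_count(),
        "recursion_limit": sys.getrecursionlimit(),
    }


def log_exception(logger, exc):
    """Log an exception with its stack trace plus the system context."""
    logger.error(
        "Unhandled error: %s | context=%s",
        exc,
        system_context(),
        # Passing the exception details makes logging append the traceback.
        exc_info=(type(exc), exc, exc.__traceback__),
    )
```

A caller would typically wrap the top-level request or job handler in a `try`/`except` and pass the caught exception to `log_exception`, so every report carries the same context.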

Obviously I’m quite new to this type of work, so I’m sure others will have useful bits of advice to share as well.

Published at DZone with permission of Mark Needham, DZone MVB.
