Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Don't Bug Me: How to Debug and Fix Varnish

DZone's Guide to

Don't Bug Me: How to Debug and Fix Varnish

Your new Varnish Cache instance is doing great, but now you need to debug something that's popped up. Will it be a simple matter or did you just open up Pandora's Box?

· Performance Zone
Free Resource

Discover 50 of the latest mobile performance statistics with the Ultimate Guide to Digital Experience Monitoring, brought to you in partnership with Catchpoint.

Varnish Cache is quite easy to install and make it run, but is it easy to debug as well? The answer is, “meh.” It can go either way.

Yes, because the Varnish tools you have at your disposal (in particular Varnishlog and Varnishstat) will give you a lot of information regarding your Varnish instance, and with some patience, you’ll be able to understand what is going on. At the same time, no, because if the strange behavior or crash is triggered by a peculiar scenario, then it will be difficult to reproduce and likewise difficult to understand and fix.

With these conflicting thoughts on debugging in mind, I’d like to share some of the insights from my experience working in Varnish Software support.

There are four different types of possible problems, starting with the easiest to solve:

1. Functional

This has to do with when a part of your VCL is never reached or a wrong logic has been defined and you don’t understand what Varnish is actually doing and why it is not following your will.

This issue is quite easy to debug and fix. In most cases, simply testing your VCL before pushing it in production will do the trick. I suggest you have a look at the varnishtest tool.

2. Operational

In cases of low hit-rates or high response times, you are facing a slightly more complicated issue for debugging. Complications arise because it not only might be caused by your VCL, but your backends and/or Varnish itself might be involved, as well.

Varnishlog is usually the most suitable tool for such a scenario. It is very verbose and you should have no problem understanding which transaction does what. That is, if you want to understand why your hit-rate is so low (assuming you are not streaming; if you are streaming, then your hit-rate is expected to be not “brilliant”), you can either check whether your Varnish is seeing a lot of “passes” or a lot of “misses.” To do so, the query language available for the Varnishlog tool is the proper way to go.

For example, ‘varnishlog -q “VCL_call eq PASS”’ shows only “pass” transactions. If you have a lot of these, it will lead to a low hit-rate. Fallback to point one to solve this issue.

3. Capacity

Is your cache storage too small for your data set? Do you have the wrong number of threads?
Varnishstat has more than 300 counters that give you a very good indication of the health of your instance since startup time.

Important counters I always check whenever I’m on support include:

  • MAIN.cache_hit: Number of cache hits.

  • MAIN.cache_miss: Number of cache misses.

  • MAIN.n_lru_nuked: Number of LRU nuked objects from the cache. If this number is very big, then your cache storage is too small for your data set.

  • SMA.s0.xxx: SMA.s0 represent counters and gauges related to the actual use of your cache storage. Keeping those under control is suggested, so you’ll be able to check how the memory is used by Varnish and if you have allocated too much or too little storage for it.

And now, the most time-consuming type of error to debug...

4. Stability

Varnish is crashing and you have no idea why. Get yourself a coffee, relax, take your time, and get your hands dirty.

Things to check:

  • varnishadm panic.show: If Varnish has crashed this command, it will output the panic backlog:

"Assert error in CNT_Request(), cache/cache_req_fsm.c line 820:
  Condition(req->req_step == R_STP_LOOKUP || req->req_step == R_STP_RECV) not true.
errno = 2 (No such file or directory)
thread = (cache-worker)
version = varnish-4.1.3 revision 5e3b6d2
ident = Linux,3.14.32-xxxx-grs-ipv6-64,x86_64,-junix,-smalloc,-smalloc,-hcritbit,epoll
Backtrace:
  0x434025: varnishd() [0x434025]
  0x438779: varnishd(CNT_Request+0x1bd9) [0x438779]
  0x44fc6f: varnishd(HTTP1_Session+0xef) [0x44fc6f]
  0x43a5ad: varnishd(SES_Proto_Req+0x5d) [0x43a5ad]
  0x44a522: varnishd() [0x44a522]
  0x44a98b: varnishd() [0x44a98b]
  0x74c77a6fb0a4: libpthread.so.0(+0x80a4) [0x74c77a6fb0a4]
  0x74c77a43087d: libc.so.6(clone+0x6d) [0x74c77a43087d]
 req = 0x74c6e4c06020 {
  vxid = 0, step = R_STP_RESTART,
  req_body = R_BODY_INIT,
  restarts = 0, esi_level = 0,
  sp = 0x74c715f8b220 {
    fd = 79, vxid = 26927013,
    client = X.X.X.X 27212,
    t_open = 1475250171.316174,
    t_idle = 1475250171.645251,
    step = S_STP_H1PROC,
  },”

If the panic is not present, then you’ll have to go through /var/log/ and look for anything Varnishrelated. I usually grep for “die and/or kill” within the output to check whether Varnish has indeed crashed or been killed.

Once you have found the panic or any other odd behavior, you can go through the open Varnish Cache issues and research whether it is a known open or solved issue.

If it is a solved issue, then all you need to do is upgrade to the latest Varnish Cache or the Varnish Cache Plus version. If it is not solved or still unknown, you can get in touch with Varnish Software support.

Is your APM strategy broken? This ebook explores the latest in Gartner research to help you learn how to close the end-user experience gap in APM, brought to you in partnership with Catchpoint.

Topics:
varnish ,cache ,performance ,debugging ,tutorial

Published at DZone with permission of Arianna Aondio, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

THE DZONE NEWSLETTER

Dev Resources & Solutions Straight to Your Inbox

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

X

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}