Who Called Git? An Unusual Debugging Story
Who Called Git? An Unusual Debugging Story
I usually don't write articles about support ticket work, but this was a ticket worth blogging about. It was an unusual bug that I fixed in an unusual way.
Join the DZone community and get the full member experience.Join For Free
Container Monitoring and Management eBook: Read about the new realities of containerization.
I don’t usually talk about the support ticket work that I do. Most tickets are so specific it’s hard to write generalized articles. When I get a type of ticket that is worth blogging about, I usually find a better place to write about it, like devcenter docs, or I look for a way to push a fix to an upstream open-source library. Today, I got an unusual bug, and I fixed it in a fairly unusual (for me) way. Thought you might be interested.
The customer was complaining that this output was coming out when they would run commands.
$ heroku run rails console fatal: Not a git repository (or any parent up to mount point /app) Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
Originally, they thought that the issue may have been coming from our
heroku CLI since the issue wasn’t happening locally.
BTW, did you know you can open up a shell session to Heroku using
heroku run bash? I’ve been telling people this for about five years and it’s still surprising to them, very useful for debugging.
Anyway, after a little investigation, it turns out that they get that output any time you run any app commands on the server.
$ heroku run bash ~$ rake assets:precompile fatal: Not a git repository (or any parent up to mount point /app) Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set). # ...
The problem isn’t that traumatic, but it’s annoying. Also, it’s best practice to run with as few warnings as possible. On an unrelated support ticket: a customer of mine was getting a warning that told them exactly how to fix their problem, but they didn’t notice it because they were regularly getting five or six other warnings on deploy so when a new one was added they didn’t see it and didn’t take action.
The customer gave application access so I was able to reproduce the issue pretty easily. That error output is what you would get if you try to run a
git command when there isn’t a git repo.
$ cd /tmp $ git checkout foo fatal: Not a git repository (or any of the parent directories): .git
If you didn’t know where the message came from, a quick Google would point
gitas being the source of the error. But why was this only happening on Heroku?
One of the ways that we make scaling so fast is that we package your app in a zipped file and (roughly) all we have to do to get it to run is put it on a dyno and unpack it. One of the limiting factors is how fast we can transfer the zipped file (we call it a slug), so we recommend keeping the size of your repo low. To help with this we strip out the
.git directory of your app since it only adds extra disk space and makes scaling, restarting, or migrating to other dyno types slower. The error from git is telling us that we’re trying to run a git command with no git repository. This makes sense since we strip the
So, we know why the issue is happening (no git repo on the dyno plus we’re running a Git command). But we don’t know where the issue is happening, i.e., what is triggering that Git code.
My first instinct was to grep their app for
~ $ grep -R git app/* config/* lib/*
I got a few hits but they were mostly
di"git" strings, no smoking guns.
Expanding the search to all directories gave me way too many hits. 5210 references to the string
git to be exact. It would take too long to go through all of them.
We can get a bit more specific if we assume we are shelling out searching for
"\git"`. However, there are still 195 entries, which is still quite a bit.
While this specific problem isn’t that common, this general type of problem is. When one program calls another one, we usually want to know where in the first program that the second was being called. In a perfect world we would be able to raise an exception in the second program and get a backtrace from the first, however, this doesn’t always work. In our case the call to git was erroring and returning a non-zero exit code, however, the parent program was ignoring it.
The only option I had was to attempt to get more information. I figured maybe if I could find out what arguments are passed to
git I could better find the problem.
To do this I wrote an executable to
bin/git and since
bin/ is first on the path, the OS will pick it over the orignial
$ heroku run bash ~$ cat <<EOT > bin/git #!/usr/bin/env ruby raise ARGV.inspect EOT
Don’t be alarmed, we aren’t modifying your production code, a
heroku run instance spins up a new dyno that isn’t connected to your running app. You can do whatever you want to do to this instance’s disk and it won’t affect your production website.
You also have to make sure it’s executable:
~$ chmod +x bin/git
Now, we can see that our command is preferred:
~ $ which git /app/bin/git
Now, when we run any command against the app, we get a new error output:
/app/bin/git:4:in `<main>': ["rev-parse", "--short", "HEAD"] (RuntimeError)
This gives us more information. The output means that git is being called with
git rev-parse --short HEAD. A quick grep for this gives us one and only one match:
~ $ grep -R "git rev-parse --short" .../gems/raven-ruby/lib/raven/configuration.rb: self.release = `git rev-parse --short HEAD`.strip rescue nil
This was coming from the raven gem which is a sentry client. They are trying to determine the SHA of the latest “release” to report to the sentry server, but they weren’t checking to see if a
.git folder was present first.
So that’s where our error message was coming from. It turns out this was already fixed on a more recent version of raven where the check was added.
After the customer upgraded to a more recent version of
raven and the error went away!
Writing a custom executable to output debugging info isn’t a common technique and, likely, you’ll never use it. I wanted to write this because it was an interesting case, maybe there is a better way to find this info, but this one worked just fine.
We don’t always go on such deep dives for customer issues, but if we get a ton of similar error reports or find an issue that is intriguing enough, sometimes we can dig in, and sometimes it’s fun. One debugging technique I do use ALL the time that would have potentially solved this issue is to update gems. Most people run with really old dependencies. Usually, I recommend this when there is some related culprit. For example, if
rake assets:precompile is failing, I suggest upgrading any gem with “sprockets” or “asset” in the name. Often bugs are fixed by the larger community but you won’t take advantage of those fixes unless you upgrade to a version with that fix.
If you don’t have a smoking gun of where the problem is coming from, you can run
bundle outdated. This is an example of the command run against the rubygems.org repo:
$ bundle outdated Fetching gem metadata from https://rubygems.org/........ Fetching version metadata from https://rubygems.org/.. Fetching dependency metadata from https://rubygems.org/. Resolving dependencies.... Outdated gems included in the bundle: * psych (newest 2.1.0, installed 2.0.17, requested ~> 2.0.12) in group "default" * rack (newest 2.0.1, installed 1.6.4) in group "default" * rails (newest 5.0.0, installed 18.104.22.168, requested ~> 4.2.7) in group "default" * actionmailer (newest 5.0.0, installed 22.214.171.124) * actionpack (newest 5.0.0, installed 126.96.36.199) * actionview (newest 5.0.0, installed 188.8.131.52) * activejob (newest 5.0.0, installed 184.108.40.206) * activemodel (newest 5.0.0, installed 220.127.116.11) * activerecord (newest 5.0.0, installed 18.104.22.168) * activesupport (newest 5.0.0, installed 22.214.171.124) * arel (newest 7.0.0, installed 6.0.3) * rails-dom-testing (newest 2.0.1, installed 1.0.7) * railties (newest 5.0.0, installed 126.96.36.199)
Thanks to Nate for reminding me about this command at Rubyconf. Also, coincidentally, he happens to be the person who made the fix in the Sentry client.
Opinions expressed by DZone contributors are their own.