Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Beyond Beginning Git: Working Tree, Index and HEAD

DZone's Guide to

Beyond Beginning Git: Working Tree, Index and HEAD

Once you get past the first few commands, you learn Git by solving the next problem. This article describes a common problem that helps us learn a little about how Git works.

· Agile Zone
Free Resource

The single app analytics solutions to take your web and mobile apps to the next level.  Try today!

Every team using Git has that person who is an expert; the person that everyone asks when some strange thing goes wrong. That person usually got where they are one command at a time, starting with just a little more knowledge and then building on it by helping other people. At least that was my experience. I'm hoping to help people jumpstart this process with some articles that present common problems I've seen where the solution comes in handy lots of times.

This article is not intended for people who are brand new at Git. If that's you, there are lots of articles and books out there to help you, but I hope you'll consider reading through Conversational Git, my short open source ebook.

Adding with Dry Run

It's a common problem, especially with the first commit to a repository, to end up with files that shouldn't be staged. Of course, the.gitignore file is the way to prevent this; Git will respect patterns it's told to ignore when adding a directory tree. But sometimes the right patterns aren't in .gitignore yet.

The best thing, when adding a directory that might have undesired files is to use -n to dry run the add command first. For example, if we have a directory code with code.c and code.o, and we haven't committed yet, we might see this status:

$ git status
On branch master

Initial commit

Untracked files:
  (use "git add <file>..." to include in what will be committed)

    code/

    nothing added to commit but untracked files present (use "git add" to track)

If we run git add code but we forgot to exclude object files first, we will stage both source and object files:

$ git add code
$ git status
 On branch master

 Initial commit

 Changes to be committed:
   (use "git rm --cached <file>..." to unstage)

    new file:   code/code.c
    new file:   code/code.o

If we use -n or --dry-run we would find out the issue in advance:

$ git add -n code
add 'code/code.c'
add 'code/code.o'

Then we could fix it:

$ echo '*.o' > .gitignore
$ git add -n code
add 'code/code.c'

Unstaging Files

But what happens if we forget to check first? There is a fix, right there in the Git output:

(use "git rm --cached <file>..." to unstage)

So the mistake and the fix look like this:

$ git add code
$ git status
On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

    new file:   code/code.c
    new file:   code/code.o

$ git rm --cached code/code.o
rm 'code/code.o'
$ git status
On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

    new file:   code/code.c

Untracked files:
  (use "git add <file>..." to include in what will be committed)

    code/code.o

That seems great, but don't get used to that command. Instead, keep reading.

HEAD, Index, and Working Tree

If you've read tutorials on "unstaging" files, they generally suggest using reset HEAD to unstage a file. So why does Git tell us to use rm --cached in this case? We don't want to delete the file, just avoid committing it. The answer is, we can't do reset HEAD because we don't have a HEAD yet.

First, a brief word about how Git keeps track of files that have been committed or are ready to commit. The area where we view and edit files, Git calls the "working tree". When we run git add, whether it's on a brand new file or just one we've changed, Git doesn't just mark the file for commit, it saves a copy from the working tree to an area it calls the "index". You can verify this by creating a file, adding it, editing it, then doing git status. The file will show up both as added and modified.

So when we've accidentally added a file we don't want committed, what we really want Git to do is remove it from the index. That's exactly what git rm --cached does.

"HEAD", meanwhile, is Git's name for the latest commit on the current branch (the "tip"). For our brand-new repository, there is no "HEAD" yet, so it really doesn't make any sense.

But usually we do want reset HEAD. In fact, if reset HEAD is the right answer, Git will even tell us so:

( After an initial commit )

$ git add .gitignore code
$ git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

    new file:   .gitignore
    modified:   code/code.c

In this case, reset HEAD will update the index to match "HEAD". (Updating the index, but not the working tree, is the default for reset.) That means the working tree still has the changes, but the index has no change since the latest commit, which is exactly what we want.

For the added file, we could choose to use git rm --cached or git reset HEAD, because either removes the file from the index. But for the modified file, we don't want to use rm --cached, because that will tell Git to stage the file for deletion! This is true because Git will now mark that file as removed in the index, when it exists in HEAD. So there's now a pending deletion if we do git commit.

This is a pretty confusing situation. The confusion comes in because the term "add" is overloaded for both adding new files and for staging modifications to existing files. Really, it would be best if we could use reset as the universal command for unstaging files. And with a modern version of Git, we can. If we use git reset in a case where there is no HEAD, it will just remove the file from the index, which is exactly what we want.

So our previous example becomes:

$ git status
On branch master

Initial commit

Untracked files:
  (use "git add <file>..." to include in what will be committed)

    code/

nothing added to commit but untracked files present (use "git add" to track)
$ git add code
$ git status
On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

    new file:   code/code.c
    new file:   code/code.o

$ git reset code
$ git status
On branch master

Initial commit

Untracked files:
  (use "git add <file>..." to include in what will be committed)

    code/

So the only thing left is for the suggestion message to reflect the new capability of reset, so we don't have to be confused. But at least we picked up a little more detail about Git along the way.

CA App Experience Analytics, a whole new level of visibility. Learn more.

Topics:
git ,agile ,scm

Opinions expressed by DZone contributors are their own.

THE DZONE NEWSLETTER

Dev Resources & Solutions Straight to Your Inbox

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

X

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}