
WTF Is a Thread?

Ever wonder what exactly a thread is? Not every developer has had the fortune to dive too deep into the world of concurrency. In this post, we take a look at one of the fundamental units of concurrency: the thread.

What exactly is a thread? Many developers have been exposed to threads and processes over their careers without actually knowing how they work. Knowing more about our tools makes us better developers. To answer “WTF is a Thread?”, I took an operating systems course at Georgia Tech and then put my own spin on the information:

  • Video is 6 minutes and 22 seconds.

If you prefer text explanations, here’s the synopsis:

What is a thread? Before we can talk about a thread we have to understand a process. A process is an instance of an executing program. At its core, a process is essentially a blob of memory: it has code that needs to be executed, data such as variables, a set of register values, and a stack to keep track of the order of execution. The code is compiled into instructions that can run on our CPU.

Here’s an example of a program in C.

#include <stdio.h>

int main() {
  int i;
  for(i=0; i < 10; i+=1) {
      printf("hello world\n");
  }
  return 0;
}

When we compile this, it’s going to turn into instructions that are gonna look like this:

$ gcc -o hello hello.c
$ objdump -d hello
100000f40:  55  pushq %rbp
100000f41:  48 89 e5  movq  %rsp, %rbp
100000f44:  48 83 ec 10   subq  $16, %rsp
100000f48:  c7 45 fc 00 00 00 00  movl  $0, -4(%rbp)
100000f4f:  c7 45 f8 00 00 00 00  movl  $0, -8(%rbp)
100000f56:  83 7d f8 0a   cmpl  $10, -8(%rbp)
100000f5a:  0f 8d 1f 00 00 00   jge 31 <_main+3F>
100000f60:  48 8d 3d 43 00 00 00  leaq  67(%rip), %rdi
100000f67:  b0 00   movb  $0, %al
100000f69:  e8 1a 00 00 00  callq 26
100000f6e:  89 45 f4  movl  %eax, -12(%rbp)
100000f71:  8b 45 f8  movl  -8(%rbp), %eax
100000f74:  83 c0 01  addl  $1, %eax
100000f77:  89 45 f8  movl  %eax, -8(%rbp)
100000f7a:  e9 d7 ff ff ff  jmp -41 <_main+16>
100000f7f:  8b 45 fc  movl  -4(%rbp), %eax
100000f82:  48 83 c4 10   addq  $16, %rsp
100000f86:  5d  popq  %rbp
100000f87:  c3  retq

Thanks to Julia Evans for some inspiration in the post What is the Stack?

As it runs, instructions are pulled out of memory and executed on the CPU. There are two really important parts of a process. The first is the program counter, which keeps track of where the process is currently executing. Imagine the program counter as an arrow pointing at the next instruction to run.

As the program runs, this location is kept by the hardware in a register on the CPU. When the program pauses or is preempted by another program, we have to store that counter somewhere.

There’s also a thing called the stack. It keeps track of how deep we are in nested function calls. Here is another example program:

#include <stdio.h>

int increment(int x) {
  return x + 1;
}

int main() {
  int i = 0;
  printf("Incremented to: %i\n", increment(i)); // <==== HERE
  return 0;
}

When this executes, the main function is added to the stack. The increment function is called and it is added to the stack. The increment function will eventually return.

#include <stdio.h>

int increment(int x) {
  return x + 1; // <==== RETURN HERE TO MAIN
}

int main() {
  int i = 0;
  printf("Incremented to: %i\n", increment(i));
  return 0;
}

Whenever we return, where does it go? The program looks at the stack to see where it was executing before the call and resumes from there. The last entry on the stack before increment was main, so that is where it goes.
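The call-and-return bookkeeping above can be sketched with an explicit stack. This is only a toy model: the `call_stack` struct and the `call` and `ret` helpers are hypothetical names made up for illustration. A real stack also holds return addresses, arguments, and local variables rather than function names.

```c
#include <stddef.h>

#define MAX_DEPTH 16

/* Hypothetical model of a call stack: each entry names the function
 * that is executing at that depth. */
struct call_stack {
  const char *frames[MAX_DEPTH];
  int depth;
};

/* Calling a function pushes a frame. Returns 0 on success, -1 if full. */
int call(struct call_stack *s, const char *function_name) {
  if (s->depth >= MAX_DEPTH) {
    return -1;
  }
  s->frames[s->depth] = function_name;
  s->depth += 1;
  return 0;
}

/* Returning pops the top frame and reports which function resumes,
 * or NULL once the last frame has returned. */
const char *ret(struct call_stack *s) {
  if (s->depth > 0) {
    s->depth -= 1;
  }
  if (s->depth == 0) {
    return NULL;
  }
  return s->frames[s->depth - 1];
}
```

Pushing "main", then "increment", and then calling `ret` once hands back the "main" entry, which mirrors how execution resumes in main after increment returns.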

At this point, we’ve got roughly everything we need to start and stop a program, including the program counter and the stack pointer. We need a way to store all of this data, and so it goes into a struct in the operating system called the process control block or PCB.

The PCB includes process state, process number, program counter, registers, memory limits, list of open files, signal mask, and CPU scheduling information, in addition to some other things. Fun fact: since a process is entirely described by a PCB, whenever we wanna fork a process, essentially all we have to do is copy a parent process control block to a child process control block. Neato!
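To make the PCB a little more concrete, here is a rough sketch of what such a struct could look like, using the fields listed above. Every name, type, and size here is an assumption for illustration. Real kernels track far more (Linux's equivalent is `task_struct`), and a real fork also has to deal with the process's memory, not just this record.

```c
#include <stdint.h>

#define MAX_OPEN_FILES 8
#define NUM_REGISTERS 16

/* Hypothetical, heavily simplified process states. */
enum process_state {
  STATE_NEW, STATE_READY, STATE_RUNNING, STATE_WAITING, STATE_TERMINATED
};

/* Hypothetical PCB holding the fields listed above. */
struct pcb {
  enum process_state state;
  int process_number;
  uintptr_t program_counter;           /* where to resume execution */
  uintptr_t registers[NUM_REGISTERS];  /* saved CPU registers */
  uintptr_t memory_base;               /* memory limits */
  uintptr_t memory_limit;
  int open_files[MAX_OPEN_FILES];      /* open file descriptors */
  uint64_t signal_mask;
  int scheduling_priority;             /* CPU scheduling information */
};

/* Sketch of the "fork is mostly a copy" idea: duplicate the parent's
 * PCB, then patch the fields that must differ in the child. */
struct pcb fork_pcb(const struct pcb *parent, int child_process_number) {
  struct pcb child = *parent;          /* copy every field */
  child.process_number = child_process_number;
  child.state = STATE_READY;           /* child waits to be scheduled */
  return child;
}
```

The child starts with the same program counter, registers, and open files as the parent, which is why both processes appear to continue from the same point after a fork.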

A process can execute two types of tasks. The first type is a CPU-bound task, where the program spends the majority of its time actually executing code. An example would be processing data or calculating prime numbers. The second type is an IO task (input/output task). An example of an IO task would be reading a file from disk, making a database call, or making a network call, for example, to the Heroku API.
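As a rough illustration of the two kinds of work, the sketch below pairs a CPU-bound function with an IO-bound stand-in. The names `count_primes` and `simulated_io_wait` are made up for this example, and `nanosleep` is only pretending to be a slow disk or network call. It assumes a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* CPU-bound: the processor is busy the whole time doing trial division.
 * Counts the primes that are smaller than limit. */
int count_primes(int limit) {
  int count = 0;
  for (int n = 2; n < limit; n += 1) {
    int is_prime = 1;
    for (int d = 2; d * d <= n; d += 1) {
      if (n % d == 0) {
        is_prime = 0;
        break;
      }
    }
    count += is_prime;
  }
  return count;
}

/* IO-bound stand-in: the process is blocked while it waits and uses
 * almost no CPU. Returns 0 on success, -1 if the sleep was interrupted. */
int simulated_io_wait(long milliseconds) {
  struct timespec delay;
  delay.tv_sec = milliseconds / 1000;
  delay.tv_nsec = (milliseconds % 1000) * 1000000L;
  return nanosleep(&delay, NULL);
}
```

Timing a call to each with `clock()`, which measures CPU time rather than wall-clock time, should show the first function burning CPU while the second barely registers.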

Whenever we’re doing an IO heavy task the process doesn’t do much. It makes the request and then it just sits there and waits for data to come back. This matters because we pay for our CPU time. We don’t want our program to not be doing anything. We want to maximize our CPU utilization. How exactly can we do that?

If we have one CPU heavy process and one CPU then we’re set.

The CPU is working at maximum speed. But what happens when we start to see IO?

In this example, our program is just sitting there sleeping, and our CPU is not doing anything for a third of the time. We’re no longer using our CPU efficiently. We can fix this by adding an additional process.

In this example, while one of our processes is sleeping, the other process is running. Aside from the context switch we are pretty much using our CPU 100% of the time. What are the pros of maximizing CPU with multiple processes?

Whenever we’re using multiple processes to utilize our CPU then, well, we get better CPU utilization. So that’s good.

What are the cons? Unfortunately, to do this, we need a lot of processes. Processes take up memory, which is a finite resource. Loading and unloading a PCB is expensive. Sharing data between processes is also hard: you can map the same memory into different processes, but it’s really difficult and most people don’t do it. We can also share data via sockets, but that’s very limiting and also fairly difficult.

Wouldn’t it be cool if there was some kind of really lightweight process that uses less memory, something with a shorter context switch time? Well, we have it: it’s called a thread.
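Here is a small sketch of why sharing data between processes takes extra work, assuming a POSIX system. The helper name `fork_and_increment` is made up for this example. After `fork()` the child works on its own copy of `counter`, so the only thing the parent learns is what the child deliberately sends back, in this case through its exit status.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Ordinary global data. After fork() each process has its own copy. */
static int counter = 0;

/* Forks a child that increments its copy of counter and reports the
 * result through its exit status. Stores what the child saw in
 * *child_value and returns the parent's own counter, or -1 on error. */
int fork_and_increment(int *child_value) {
  pid_t pid = fork();
  if (pid < 0) {
    return -1;
  }
  if (pid == 0) {
    counter += 1;       /* changes only the child's copy */
    _exit(counter);     /* exit status carries one small integer back */
  }
  int status = 0;
  if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
    return -1;
  }
  *child_value = WEXITSTATUS(status);
  return counter;       /* the parent's copy was never touched */
}
```

The child sees 1 while the parent still sees 0. Anything richer than a small exit code needs a pipe, a socket, or shared memory, which is the difficulty described above.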

Previously, we looked at a process and everything we need to run a program. With a thread, we can share code and we can also share data. Really, with a thread, all we need is a new stack and a new set of registers.

This is an example of what a single-threaded process would look like.

It’s a process and inside of it we have a thread, and that thread has its own registers and a stack. And if we want to have multiple threads in our process, then we don’t have to duplicate that code or the data, we just need new registers and new stacks for each of our threads.
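A minimal sketch of that idea with POSIX threads is below. The function and variable names are assumptions for illustration. Both threads run the same `worker` code and write into the same global array, while each one keeps its `local` variable on its own stack. Depending on the toolchain, it may need to be built with `-pthread`.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 2

/* Shared data: one copy, visible to every thread in the process. */
static int results[NUM_THREADS];

/* Each thread runs this same code on its own stack, so `local` is
 * private to the thread even though `results` is shared. */
static void *worker(void *arg) {
  int index = *(int *)arg;
  int local = (index + 1) * 10;
  results[index] = local;   /* each thread writes a different slot */
  return NULL;
}

/* Starts the threads, waits for them, and returns the sum of what
 * they wrote, or -1 if a thread could not be created. */
int run_threads(void) {
  pthread_t threads[NUM_THREADS];
  int indexes[NUM_THREADS];
  int started = 0;

  for (; started < NUM_THREADS; started += 1) {
    indexes[started] = started;
    if (pthread_create(&threads[started], NULL, worker,
                       &indexes[started]) != 0) {
      break;
    }
  }
  int sum = 0;
  for (int i = 0; i < started; i += 1) {
    pthread_join(threads[i], NULL);   /* wait before reading results */
    sum += results[i];
  }
  return started == NUM_THREADS ? sum : -1;
}
```

No copying or message passing is needed to get the results back: the main thread simply reads the array once the workers have been joined.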

All right, what are the pros of using multiple threads? Well, like before, we get much better CPU utilization. Unlike before, we can reuse existing process memory, which means that we don’t have to duplicate code, and we don’t have to duplicate data. This means that we’re not gonna run out of memory, or at least not nearly as fast. It also means that switching between threads takes less time than switching between processes. We can also reuse cached values, such as memory lookups.

All right, now there has to be a downside to this, and the downside actually looks a lot like the upside. Shared code and shared data inside of our thread makes things faster, and it makes things lightweight. Unfortunately, it means multi-threaded code is much harder to work with.

With shared code and data, we now have to consider that other threads will be accessing our data. We have to introduce new constructs like mutexes and condition variables. The core value-add of threads is having that shared code and data. That’s basically all a thread is: just a process with shared resources. Unfortunately, by doing that, we also add a lot of complexity.
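As a hedged sketch of what one of those constructs looks like in practice, here is a shared counter protected by a POSIX mutex. The names and the thread and iteration counts are arbitrary choices for this example. Without the lock, two threads can read the same old value and one of the increments gets lost.

```c
#include <pthread.h>
#include <stddef.h>

#define COUNTER_THREADS 4
#define INCREMENTS_PER_THREAD 100000

/* Shared by every thread, so access has to be coordinated. */
static long shared_counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *add_to_counter(void *arg) {
  (void)arg;  /* unused */
  for (int i = 0; i < INCREMENTS_PER_THREAD; i += 1) {
    pthread_mutex_lock(&counter_lock);
    shared_counter += 1;    /* read-modify-write, safe only under the lock */
    pthread_mutex_unlock(&counter_lock);
  }
  return NULL;
}

/* Runs the threads to completion and returns the final count,
 * or -1 if a thread could not be created. */
long count_with_mutex(void) {
  pthread_t threads[COUNTER_THREADS];
  int started = 0;

  shared_counter = 0;
  for (; started < COUNTER_THREADS; started += 1) {
    if (pthread_create(&threads[started], NULL, add_to_counter, NULL) != 0) {
      break;
    }
  }
  for (int i = 0; i < started; i += 1) {
    pthread_join(threads[i], NULL);
  }
  return started == COUNTER_THREADS ? shared_counter : -1;
}
```

With the lock in place the result is always 4 × 100000. Removing the lock and unlock calls turns this into a data race, and the total will usually come up short.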

Hopefully today you understand better what exactly a thread is: it’s just a process with shared code and data. Threads are not mythical. Threads are a tool in our toolbox. Unfortunately, since a thread is so simple it would be hard to make them easier to use without turning them into a process.

If you’re interested in this kind of “how do components of operating systems work” I recommend this operating system course on Udacity. It’s basically the same content as a Georgia Tech Online Masters degree, but minus the homework assignments, tests, TAs, and office hours. If you prefer books, I like the Operating System Concepts “dinosaur book” though it’s not as approachable as the Udacity course.

The other major way of dealing with IO overhead in a program is an “evented” model, which is what JavaScript currently uses, although there have been talks about adding threads to JS. Evented code is not a magic bullet, and has its own pros and cons. It is a tool like a process or a thread. While there is overlap in what each can accomplish, there are cases where each can be useful. As this post is about threads, I’m not going to dig into event-driven concurrency.

thread ,concurrency ,performance

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.