Golang Internals (Part 1): Main Concepts and Project Structure

[This blog post is written by Sergey Matyukevich. Other parts of the series: Part 1 | Part 2 | Part 3]

For those who are familiar with the basics of Go and would like to get a deeper insight into its internals, we're starting a series of blog posts. Today’s article is dedicated to the structure of Go source code and some details on the Go compiler. After reading this, you should be able to answer the following questions:

  1. What is the structure of Go source code?
  2. How does the Go compiler work?
  3. What is the basic structure of a node tree in Go?

Getting started

When you start learning a new programming language, you can usually find a lot of “hello-world” tutorials, beginner guides, and books with details on the main language concepts, syntax, and even the standard library. However, finding information on such things as the layout of the major data structures that the language runtime allocates, or the assembly code that is generated when you call a built-in function, is not that easy. Obviously, the answers lie inside the source code, but, from my own experience, you can spend hours wandering through it without making much progress.

I will not pretend to be an expert on the topic, nor will I attempt to describe every possible aspect. Instead, the goal is to demonstrate how you can decipher Go sources on your own.

Before we begin, we need our own copy of the Go source files. There is nothing special about getting them. Just execute:

git clone https://github.com/golang/go

Please note that the code in the main branch is being constantly changed, so I use the release-branch.go1.4 branch in this blog post.

Understanding project structure

If you look at the /src folder of the Go repository, you can see a lot of folders. Most of them contain source files of the standard Go library. The standard naming conventions are always applied here, so each package is inside a folder with a name that directly corresponds to the package name. Apart from the standard library, there is a lot of other stuff. In my opinion, the most important and useful folders are:

Folder | Description
/src/cmd/ | Contains different command line tools.
/src/cmd/go/ | Contains source files of the Go tool that downloads and builds Go source files and installs packages. While doing this, it collects all source files and makes calls to the Go linker and Go compiler command line tools.
/src/cmd/dist/ | Contains a tool responsible for building all other command line tools and all the packages from the standard library. You may want to analyze its source code to understand what libraries are used in every particular tool or package.
/src/cmd/gc/ | The architecture-independent part of the Go compiler.
/src/cmd/ld/ | The architecture-independent part of the Go linker. Architecture-dependent parts are located in the folders with the “l” suffix (5l, 6l, 8l, and 9l), which follow the same naming conventions as the compiler.
/src/cmd/5a/, /src/cmd/6a/, /src/cmd/8a/, /src/cmd/9a/ | Here you can find Go assembler compilers for different architectures. The Go assembler is a form of assembly language that does not map precisely to the assembler of the underlying machine. Instead, there is a distinct compiler for each architecture that translates the Go assembler to the machine’s assembler. You can find more details here.
/src/lib9/, /src/libbio/, /src/liblink/ | Different libraries that are used inside the compiler, linker, and runtime package.
/src/runtime/ | The most important Go package, which is indirectly included in all programs. It contains the entire runtime functionality, such as memory management, garbage collection, goroutine creation, etc.

Inside the Go compiler

As I said above, the architecture-independent part of the Go compiler is located in the /src/cmd/gc/ folder. The entry point is located in the lex.c file. Apart from some common stuff, such as parsing command line arguments, the compiler does the following:

  1. Initializes some common data structures.

  2. Iterates through all of the provided Go files and calls the yyparse function for each file. This causes actual parsing to occur. The Go compiler uses Bison as the parser generator. The grammar for the language is fully described in the go.y file (I will provide more details on it later). As a result, this step generates a complete parse tree where each node represents an element of the compiled program.

  3. Recursively iterates through the generated tree several times and applies some modifications. For example, it defines type information for the nodes that should be implicitly typed and rewrites some language elements, such as typecasting, into calls to functions in the runtime package.

  4. Performs the actual compilation after the parse tree is complete. Nodes are translated into assembler code.

  5. Creates the object file, which contains the generated assembly code along with some additional data structures, such as the symbol table, and writes it to disk.
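The parse-then-walk idea behind steps 2 and 3 can be tried without touching the compiler sources. The sketch below is only an analogy: the Go 1.4 compiler is written in C and uses the Bison-generated yyparse, while this example uses the go/parser and go/ast packages from the standard library, which are a separate parser. The sample source and the funcNames helper are made up for this illustration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// sample is a tiny program used only for this illustration.
const sample = `package demo

var total int

func add(x int, y int) int { return x + y }

func main() { total = add(1, 2) }
`

// funcNames parses src into a syntax tree (comparable to step 2) and then
// walks the tree (comparable to one pass of step 3), collecting the names
// of all function declarations in source order.
func funcNames(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		return nil, err
	}

	var names []string
	ast.Inspect(file, func(n ast.Node) bool {
		if fd, ok := n.(*ast.FuncDecl); ok {
			names = append(names, fd.Name.Name)
		}
		return true // keep descending into child nodes
	})
	return names, nil
}

func main() {
	names, err := funcNames(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(names) // prints: [add main]
}
```

The real compiler makes several such passes and modifies the nodes as it goes; this sketch only reads them.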

Diving into Go grammar

Now let's take a closer look at the second step. The go.y file that contains the language grammar is a good starting point for investigating the Go compiler and the key to understanding the language syntax. The main part of this file consists of declarations, similar to the following:

xfndcl:
     LFUNC fndcl fnbody

fndcl:
     sym '(' oarg_type_list_ocomma ')' fnres
| '(' oarg_type_list_ocomma ')' sym '(' oarg_type_list_ocomma ')' fnres

In this declaration, the xfndcl and fndcl nodes are defined. The fndcl node can be in one of two forms. The first form corresponds to the following language construct:

somefunction(x int, y int) int

and the second one to this language construct:

(t *SomeType) somefunction(x int, y int) int

The xfndcl node consists of the keyword func, which is stored in the constant LFUNC, followed by the fndcl and fnbody nodes.
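To see both forms side by side in a complete program, here is a minimal sketch. SomeType and its offset field are hypothetical and exist only to give the method something to do.

```go
package main

import "fmt"

// SomeType is a hypothetical receiver type for this illustration.
type SomeType struct {
	offset int
}

// First form of fndcl: a plain function.
func somefunction(x int, y int) int {
	return x + y
}

// Second form of fndcl: a method, where the receiver list comes first.
// A method may share its name with a package-level function because
// methods live in the namespace of their receiver type.
func (t *SomeType) somefunction(x int, y int) int {
	return t.offset + x + y
}

func main() {
	t := &SomeType{offset: 10}
	fmt.Println(somefunction(1, 2))   // prints: 3
	fmt.Println(t.somefunction(1, 2)) // prints: 13
}
```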

An important feature of Bison (or Yacc) grammar is that it allows for placing arbitrary C code next to each node definition. The code is executed every time a match for this node definition is found in the source code. Here, you can refer to the result node as $$ and to the child nodes as $1, $2, …

It is easier to understand this through an example. Note that the following code is a shortened version of the actual code.

fndcl:
      sym '(' oarg_type_list_ocomma ')' fnres
      {
          t = nod(OTFUNC, N, N);
          t->list = $3;
          t->rlist = $5;

          $$ = nod(ODCLFUNC, N, N);
          $$->nname = newname($1);
          $$->nname->ntype = t;
          declare($$->nname, PFUNC);
      }
| '(' oarg_type_list_ocomma ')' sym '(' oarg_type_list_ocomma ')' fnres

First, a new node is created, which contains type information for the function declaration. The $3 argument list and the $5 result list are referenced from this node. Then, the $$ result node is created. It stores the function name and the type node. As you can see, there may be no direct correspondence between the definitions in the go.y file and the resulting node structure.
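For readers more comfortable with Go than with C, the same tree-building logic can be sketched roughly as follows. This is not the compiler's code: the Node type here is a heavily simplified stand-in, the Sym field is a plain string, the declare call is left out, and declFunc is a hypothetical helper whose parameters play the roles of $1, $3, and $5.

```go
package main

import "fmt"

// Op identifies the kind of a node. The constant names are borrowed from
// the compiler, but the values here are arbitrary.
type Op int

const (
	ONAME Op = iota
	OTFUNC
	ODCLFUNC
)

// Node is a simplified stand-in for the compiler's Node struct.
type Node struct {
	Op    Op
	Sym   string  // identifier, used by name nodes
	List  []*Node // for OTFUNC: the argument list
	Rlist []*Node // for OTFUNC: the result list
	Nname *Node   // for ODCLFUNC: the name node
	Ntype *Node   // for a name node: its type node
}

func nod(op Op) *Node          { return &Node{Op: op} }
func newname(sym string) *Node { return &Node{Op: ONAME, Sym: sym} }

// declFunc mirrors the semantic action: sym is $1, args is $3,
// results is $5, and the returned node is $$.
func declFunc(sym string, args, results []*Node) *Node {
	t := nod(OTFUNC)
	t.List = args
	t.Rlist = results

	fn := nod(ODCLFUNC)
	fn.Nname = newname(sym)
	fn.Nname.Ntype = t
	return fn
}

func main() {
	args := []*Node{newname("x"), newname("y")}
	results := []*Node{newname("int")}
	fn := declFunc("somefunction", args, results)

	t := fn.Nname.Ntype
	fmt.Println(fn.Nname.Sym, len(t.List), len(t.Rlist)) // prints: somefunction 2 1
}
```

Note that the type node hangs off the name node rather than off the declaration node itself, which is one reason the tree does not mirror the grammar one-to-one.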

Understanding nodes

Now it is time to take a look at what a node actually is. First of all, a node is a struct (you can find a definition here). This struct contains a large number of properties, since it needs to support different kinds of nodes and different nodes have different attributes. Below is a description of several fields that I think are important to understand.

Node struct field | Description
op | Node operation. Each node has this field. It distinguishes different kinds of nodes from each other. In our previous example, those were OTFUNC (operation type function) and ODCLFUNC (operation declaration function).
type | This is a reference to another struct with type information for nodes that have type information (there are no types for some nodes, e.g., control flow statements, such as if, switch, or for).
val | This field contains the actual values for nodes that represent literals.
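The way one struct serves many kinds of nodes can be illustrated with a small Go sketch. Everything here is an assumption made for illustration: the real struct is written in C and has many more fields, Val is reduced to a single integer, and describe is a hypothetical helper.

```go
package main

import "fmt"

// Op plays the role of the op field. The names are borrowed from the
// compiler, the values are arbitrary.
type Op int

const (
	OLITERAL Op = iota
	OIF
)

// Type is a simplified stand-in for the struct holding type information.
type Type struct {
	Name string
}

// Val is a simplified stand-in that can only hold an integer literal.
type Val struct {
	Int int64
}

// Node carries all three fields, but which of them are meaningful
// depends on Op.
type Node struct {
	Op   Op
	Type *Type // nil for nodes without type information, such as OIF
	Val  Val   // meaningful only for literal nodes
}

// describe inspects Op first and only then touches the other fields.
func describe(n *Node) string {
	switch n.Op {
	case OLITERAL:
		return fmt.Sprintf("literal %d of type %s", n.Val.Int, n.Type.Name)
	case OIF:
		return "if statement (no type)"
	default:
		return "unknown node"
	}
}

func main() {
	lit := &Node{Op: OLITERAL, Type: &Type{Name: "int"}, Val: Val{Int: 42}}
	cond := &Node{Op: OIF}

	fmt.Println(describe(lit))  // prints: literal 42 of type int
	fmt.Println(describe(cond)) // prints: if statement (no type)
}
```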

Now that you understand the basic structure of the node tree, you can put your knowledge into practice. In the next post, we will investigate what exactly the Go compiler generates, using a simple Go application as an example.

About the author: Sergey Matyukevich is a Cloud Engineer and Go Developer at Altoros. With 6+ years in software engineering, he is an expert in cloud automation and designing architectures for complex cloud-based systems. An active member of the Go community, Sergey is a frequent contributor to open-source projects, such as Ubuntu and Juju Charms.

Published at DZone with permission of Alex Khizhnyak. See the original article here.