65 Resources for Creating Programming Languages
In this guide, we'll show you how to collect and organize all the knowledge you need to create your own programming language from scratch.
Creating a programming language is one of the most fascinating challenges you can dream of as a developer.
The problem is that there are a lot of moving parts, a lot of things to do right and it is difficult to find a well-detailed map to show you the way. Sure, you can find a tutorial on writing half a parser, a half-baked list of advice on language design, an example of a naive interpreter, but to find those things, you will need to spend hours navigating forums and following links.
Here, we've collected relevant resources, evaluated them, and organized them, so you can spend time using good resources, not looking for them.
We organized the resources around the three stages in the creation of a programming language: design, parsing, and execution.
Designing the Language
When creating a programming language, you need to take ideas and transform them into decisions. This is what you do during the design phase.
Before You Start
Let's go over some good resources to beef up your knowledge on language design.
Designing the next programming language? Understand how people learn! This article presents a few considerations on how to design a programming language so that it’s easy to understand.
Five Questions About Language Design. Some good notes on programming language design by Paul Graham.
Design Concepts in Programming Languages. If you want to make deliberate choices in the creation of your programming language, this is the book you need. If you don’t already have the necessary theoretical background, you risk doing things the way everybody else does them. It’s also useful to develop a general framework to understand how the different programming languages behave and why.
Programming Language Pragmatics, 4th Edition. This is the most comprehensive book to read when one is trying to understand contemporary programming languages. It discusses different aspects of everything from C# to OCaml and even the different kinds of programming languages such as functional and logical ones. It also covers the several steps and parts of the implementation, such as an intermediate language, linking, virtual machines, etc.
Structure and Interpretation of Computer Programs, Second Edition. This is an introduction to computer science for people that already have a degree in it. A book widely praised by programmers, including Paul Graham, that helps you develop a new way of thinking about programming languages. It’s quite abstract and examples are proposed in Scheme. It also covers many different aspects of programming languages, including advanced topics like garbage collection.
Long discussions and infinite disputes are fought around type systems. Whatever choices you end up making, it makes sense to know the different positions.
These are two good introductory articles on the subject of type systems. The first discusses the dichotomy between static and dynamic type checking, and the second dives into introspection.
What To Know Before Debating Type Systems. If you already know the basics of type systems, this article will allow you to understand them better by going into definitions and details.
Type Systems (PDF). This paper on the formalization of Type Systems introduces more precise definitions of the different type systems.
Types and Programming Languages. A comprehensive book on understanding type systems. It will definitely have an impact on your ability to design programming languages and compilers. It has a strong theoretical basis, but it also explains the practical importance of individual concepts.
Functional Programming and Type Systems. An interesting university course on type systems for functional programming. It is used in a well-known French university. There are also notes and presentation material available. It is as advanced as you would expect.
Type Systems for Programming Languages. This is a simpler course on Type Systems for (functional) programming languages.
Parsing transforms concrete syntax into a form that is more manageable for computers. This usually means transforming text written by humans into a more useful representation of the source code, an Abstract Syntax Tree.
There are usually two components in parsing: a lexical analyzer and the proper parser. Lexers, which are also known as tokenizers or scanners, transform the individual characters into tokens, the atoms of meaning. Parsers then organize the tokens into the proper Abstract Syntax Tree for the program. But since they are usually meant to work together, you may use a single tool that does both tasks.
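To make the division of labor concrete, here is a minimal sketch of a hand-written lexer and recursive-descent parser for integer arithmetic. The grammar, the `Token` type, and the tuple-based tree shape are assumptions chosen for illustration; they are not taken from any of the tools or resources listed in this guide.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    kind: str  # "NUM", "OP", "LPAREN" or "RPAREN"
    text: str


# One named group per token kind; SKIP and MISMATCH never become tokens.
TOKEN_SPEC = [
    ("NUM", r"\d+"),
    ("OP", r"[-+*/]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Lexer: turn a string of characters into a flat list of tokens."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind = match.lastgroup
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {match.group()!r}")
        tokens.append(Token(kind, match.group()))
    return tokens


class Parser:
    """Parser: organize tokens into a tree of nested tuples.

    Assumed grammar:
        expr   := term (("+" | "-") term)*
        term   := factor (("*" | "/") factor)*
        factor := NUM | "(" expr ")"
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def match_op(self, ops):
        # Consume and return the next operator if it is one of `ops`.
        tok = self.peek()
        if tok is not None and tok.kind == "OP" and tok.text in ops:
            self.pos += 1
            return tok.text
        return None

    def parse(self):
        tree = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek().text!r}")
        return tree

    def expr(self):
        node = self.term()
        while (op := self.match_op("+-")) is not None:
            node = ("binop", op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while (op := self.match_op("*/")) is not None:
            node = ("binop", op, node, self.factor())
        return node

    def factor(self):
        tok = self.advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok.kind == "NUM":
            return ("num", int(tok.text))
        if tok.kind == "LPAREN":
            node = self.expr()
            closing = self.advance()
            if closing is None or closing.kind != "RPAREN":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {tok.text!r}")


print(Parser(tokenize("1 + 2 * 3")).parse())
# ('binop', '+', ('num', 1), ('binop', '*', ('num', 2), ('num', 3)))
```

Splitting `term` from `expr` is what gives multiplication a higher precedence than addition. A parser generator would derive equivalent code from a grammar file instead.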
Using Flex as a lexer generator, and (Berkeley) Yacc or Bison to generate the proper parser, are the venerable choices to generate a complete parser. They are a few decades old and still maintained as open source software. Both are written in and created for C/C++ and still work, but have limitations in features and support for other languages.
Your own lexer and parser. If you need the best performance, you can create your own parser. You just need to have the necessary computer science knowledge.
Flex and Bison tutorial offers a good introduction to the two tools with bonus tips.
Lex and Yacc Tutorial - At 40 pages, this is the ideal starting point to learn how to put together lex and yacc in a few hours.
The best video tutorial on lex/Yacc comes in two parts (Part 1, Part 2). In an hour of video, you can learn the basics of using lex and yacc.
ANTLR Mega Tutorial is a renowned and beloved tutorial that explains everything you need to know about ANTLR, with bonus tips and tricks and extra resources to help you learn even more.
lex & yacc. Despite being a book written in 1992, it’s still the most recommended book on the subject.
flex & bison: Text Processing Tools. The best book on the subject written in this millennium.
The Definitive ANTLR 4 Reference. Written by the main author of the tool, this is really the definitive book on ANTLR 4. It explains all of its secrets and it’s also a good introduction to how the whole parsing thing works.
Parsing Techniques, 2nd edition. A comprehensive, advanced, and costly book to know more than you possibly need about parsing.
To implement your programming language, that is to say, to actually make something happen, you can build one of two things: a compiler or an interpreter. You could also build both of them if you wanted.
Here you can find a good overview if you need it: Compiled and Interpreted Languages. These resources are dedicated to explaining how compilers and/or interpreters are built, but for practical reasons, they also often explain the basics of creating lexers and parsers.
A compiler transforms the original code into something else, usually machine code, but it could also simply be any lower-level language, such as C. In the latter case, some people prefer to use the term transpiler.
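As a rough illustration of the "lower-level language, such as C" case, the sketch below translates a tiny arithmetic syntax tree into C source text. The tuple-based tree shape (`("num", n)` and `("binop", op, left, right)`) and the function names are assumptions made for this example only; a real compiler would also handle types, variables, and errors.

```python
def emit_expr(node):
    """Translate one tree node into a fully parenthesized C expression."""
    kind = node[0]
    if kind == "num":
        return str(node[1])
    if kind == "binop":
        _, op, left, right = node
        return f"({emit_expr(left)} {op} {emit_expr(right)})"
    raise ValueError(f"unknown node kind: {kind!r}")


def compile_to_c(node):
    """Wrap the translated expression in a C program that prints its value."""
    return (
        "#include <stdio.h>\n"
        "\n"
        "int main(void) {\n"
        f'    printf("%d\\n", {emit_expr(node)});\n'
        "    return 0;\n"
        "}\n"
    )


# Hypothetical input: the tree for 1 + 2 * 3.
tree = ("binop", "+", ("num", 1), ("binop", "*", ("num", 2), ("num", 3)))
print(compile_to_c(tree))
```

Nothing is computed here: the output is another program, which a C compiler still has to turn into machine code.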
LLVM - A collection of modular and reusable compiler and toolchain technologies used to create compilers.
CLR - The virtual machine part of the .NET technologies that permits you to execute different languages transformed into a common intermediate language.
JVM - The Java Virtual Machine that powers the Java execution.
Articles and Tutorials
The digital issue of MSDN Magazine for February 2008 (CHM format), contains an article on how to Create a Language Compiler for the .NET Framework. It’s a competent overview of the whole process.
A few series of tutorials from the LLVM Documentation. This is a great series of three tutorials on how to implement a language, called Kaleidoscope, with LLVM. The only problem is that some parts are not always up-to-date.
My First LLVM Compiler, a short and gentle introduction to the topic of building a compiler with LLVM.
Creating an LLVM Backend for the Cpu0 Architecture. A whopping 600-page tutorial to learn how to create an LLVM backend, also available in PDF or ePub. The content is great, but the English is lacking. On the positive side, if you are a student, they feel your pain of transforming theoretical knowledge into practical applications.
A Nanopass Framework for Compiler Education. A paper that presents a framework to teach the creation of a compiler in a simpler way, transforming the traditional monolithic approach into a long series of simple transformations. It’s an interesting read if you already have some theoretical background in computer science.
An Incremental Approach to Compiler Construction (PDF). A paper that is also a tutorial that develops a basic Scheme compiler with an easier-to-learn approach.
Compilers: Principles, Techniques, and Tools, 2nd Edition. This is the widely known Dragon book (because of the cover) in the 2nd edition (purple dragon). There is a paperback edition, which probably costs less but it has no dragon on it, so you cannot buy that. It is a theoretical book, so don’t expect the techniques to actually include a lot of reusable code.
Engineering a Compiler, 2nd edition. This is another compiler book with a theoretical approach, but it takes a more modern approach and it is more readable. It’s also more dedicated to the optimization of the compiler. So if you need a theoretical foundation and an engineering approach this is the best book to get.
An interpreter directly executes the language without transforming it into another form.
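A tree-walking evaluator is one simple way to do this. The sketch below assumes a syntax tree made of nested tuples (`("num", n)` and `("binop", op, left, right)`), a shape invented for this example, and it maps `/` to Python's floor division as a simplifying assumption.

```python
import operator

# Assumed operator table; "/" is integer (floor) division in this sketch.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.floordiv,
}


def evaluate(node):
    """Walk the tree and compute its value directly, producing no other code."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "binop":
        _, op, left, right = node
        return OPS[op](evaluate(left), evaluate(right))
    raise ValueError(f"unknown node kind: {kind!r}")


# Hypothetical input: the tree for 1 + 2 * 3.
tree = ("binop", "+", ("num", 1), ("binop", "*", ("num", 2), ("num", 3)))
print(evaluate(tree))  # 7
```

The result comes straight from walking the tree, with no intermediate program generated along the way.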
Articles and Tutorials
A simple interpreter from scratch in Python. A four-part series of articles on how to create an interpreter in Python; simple yet good.
Let’s Build A Simple Interpreter. A twelve-part series that explains how to create an interpreter for a subset of Pascal. The source code is in Python, but it has the necessary amount of theory to apply to another language. It also has a lot of funny images.
How to write an interpreter. This is a screencast, with source code available, on how to write an interpreter for a simple language with Python.
Writing An Interpreter In Go. Despite the title, it actually shows everything from parsing to creating an interpreter. It’s a contemporary book both in the sense that it's recent and it is a short one with a learn-by-doing attitude full of code and testing, without third-party libraries.
Crafting Interpreters. A work-in-progress and free book that already has good reviews. It is focused on making interpreters that work well, and in fact, you will build two of them during the course of the book. Its plan is to have just the right amount of theory to be able to fit in at a party of programming language creators.
General Programming Language Resources
These are resources that cover a wide range of the process of creating a programming language. They may be comprehensive or just give a general overview.
In this section, we include tools that cover the whole spectrum of building a programming language and that are usually used as standalone tools.
Xtext is a framework that is part of several related technologies for developing programming languages and especially Domain Specific Languages. It allows you to build everything from the parser to the editor to validation rules. You can use it to build great IDE support for your language. It simplifies the whole language-building process by reusing and linking existing technologies under the hood, such as the ANTLR parser generator.
JetBrains MPS is a projectional language workbench. Projectional means that the Abstract Syntax Tree is saved on disk and a projection is presented to the user. The projection could be text-like or be a table or diagram or anything else you can imagine. One side effect of this is that you will not need to do any parsing because it is not necessary. The term Language Workbench indicates that JetBrains MPS is a whole system of technologies created to help you create your own programming language: everything from the language itself to IDE and supporting tools designed for your language. You can use it to build every kind of language, but the possibility and need to create everything make it ideal for creating Domain Specific Languages that are used for specific purposes, by specific audiences.
Racket is described by its authors as “a general-purpose programming language as well as the world’s first ecosystem for developing and deploying new languages.” It’s a pedagogical tool developed with practical ambitions that even has a manifesto. It is a language made to create other languages that has everything: from libraries to develop GUI applications, to an IDE and the tools to develop logic languages. It’s part of the Lisp family of languages, and this tells everything you need to know: it’s all or nothing and always the Lisp-way.
Create a programming language for the JVM: Getting started. An overview of how and why to create a language for the JVM.
An answer to How to write a very basic compiler. A good answer to the question that gives an overview of the steps needed and the options available to perform the task of building a compiler.
Creating Languages in Racket. A great overview and presentation of Racket from the ACM Journal, with code.
A Tractable Scheme Implementation (PDF). A paper discussing a Scheme implementation that focuses on reliability and tractability. It builds an interpreter that will generate a sort of bytecode on the fly. This bytecode will then be immediately executed by a VM. The name derives from the fact that the original version was built in 48 hours. The full source code is available on the website of the project.
Create a useful language and all the supporting tools. A series of articles that start from scratch and teach you everything from parsing to building an editor with autocompletion, while building a compiler targeting the JVM.
There is a great deal of documentation for Racket that can help you to start using it, even if you don’t know any programming languages.
There is a good amount of documentation for Xtext that can help you to start using it, including a couple of 15-minute tutorials.
There is a great deal of documentation for JetBrains MPS, including specialized guides. There is a video channel with videos to help you use the software and an introduction to creating your first language in JetBrains MPS.
Make a language in one hour: stacker. This tutorial provides a tour of Racket and its workflow.
Create Your Own Programming Language. An article that shows a simple and hacky way of creating a programming language using JavaCC to create a parser and the Java reflection capabilities. It’s clearly not the proper way of doing it, but it presents all the steps and it’s easy to follow.
Writing Your Own Toy Compiler Using Flex, Bison, and LLVM. This article does what its title says, using the proper tools (Flex, Bison, LLVM, etc.) but it’s slightly outdated since it’s from 2009. If you want to understand the general picture and how everything fits together, this is still a good place to start.
Designing a Programming Language I. This is more than an article and less than a book. It has a good mix of theory and practice and it implements what it calls Duck Programming Language (inspired by duck typing). A Part II, which would have explained how to create a compiler, was planned but never finished.
Writing a compiler in Ruby, bottom up. A 45-part series of articles on creating a compiler with Ruby. For some reason it starts bottom up, that is to say from the code generation to end up with the parser. This is the reverse of the traditional (and logical) way of doing things. It’s peculiar, but also very down-to-earth.
Implementing Programming Languages Using C# 4.0. The approach is a simple one and the libraries are quite outdated, but it’s a neat article to read to get a good introduction to how to build an interpreter in C#.
How to create your own virtual machine! (PDF). This tutorial explains how to create a virtual machine in C#. It’s surprisingly interesting, although not necessarily practical.
How to create pragmatic, lightweight languages. The focus here is on making a language that works in practice. It explains how to generate bytecode, target the LLVM, and build an editor for your language. Once you read the book you should know everything you need to make a usable, productive language. Incidentally, we have written this book.
How To Create Your Own Freaking Awesome Programming Language. It’s a 100-page PDF and a screencast that teaches you how to create a programming language using Ruby or the JVM. If you like the quick-and-dirty approach this book will get you started in little time.
Writing Compilers and Interpreters: A Software Engineering Approach, 3rd edition. It’s a pragmatic book that still teaches the proper approach to compilers/interpreters, only with an engineering focus instead of an academic one. This means that it’s full of Java code and there is also UML sprinkled here and there. Both the techniques and the code are slightly outdated, but this is still the best book if you are a software engineer and you need to actually do something that works correctly right now, that is to say, in a few months after the proper review process has completed.
Language Implementation Patterns. This is a book from the author of ANTLR, who is also a computer science professor. So it’s a book with a mix of theory and practice, that guides you from start to finish, from parsing to compilers and interpreters. As the name implies, it focuses on explaining the known working patterns that are used in building this kind of software, more than directly explaining all the theory followed by a practical application. It’s the book to get if you need something that really works right now. It’s even recommended by Guido van Rossum, the designer of Python.
Build Your Own Lisp. This is a very peculiar book meant to teach you how to use the C language and how to build your own programming language, using a mini-Lisp as the main example. You can read it for free online or buy it. It’s meant to teach you about C, but you already have to be familiar with programming.
Beautiful Racket: How to make your own programming languages with Racket. It’s a good and continually updated online book on how to use Racket to build a programming language. The book is composed of a series of tutorials and parts of explanation and reference. It’s the kind of book that is technically free, but you should pay for it if you use it.
Programming Languages: Application and Interpretation. An interesting book that explains how to create a programming language from scratch using Racket. The author is a teacher, but of the good and understandable kind. In fact, there is also a series of recordings of the companion lectures, that sometimes have questionable audio.
Implementing Domain-Specific Languages with Xtext and Xtend, 2nd edition is a great book for people that want to learn with examples and using a test-driven approach. It covers all levels of designing a DSL, from the design of the type system to parsing and building a compiler.
Implementing Programming Languages is an introduction to building compilers and interpreters with the JVM as the main target. It has a good balance of theory and practice, but it’s explicitly meant as a textbook. So don’t expect much reusable code. It’s the typical textbook also in the sense that it can be a great and productive read if you already have the necessary background (or are a teacher), otherwise, you risk ending up confused.
Implementing functional languages: a tutorial. A free book that explains how to create a simple functional programming language from the parsing to the interpreter and compiler. On the other hand: “this book gives a practical approach to understanding implementations of non-strict functional languages using lazy graph reduction.” Also, expect a lot of math.
DSL Engineering. A great book that explains the theory and practice of building DSLs using language workbenches, such as MPS and Xtext. This means that other than traditional design aspects, such as parsing and interpreters, it covers things like how to create an IDE or how to test your DSL. It’s especially useful to software engineers because it also discusses software engineering and business-related aspects of DSLs. That is to say, it talks about why a company should build a DSL.
Lisp in Small Pieces. An interesting book that explains in detail how to design and implement a language of the Lisp family. It describes “11 interpreters and 2 compilers” and many advanced implementation details such as the optimization of the compiler. It’s obviously most useful to people interested in creating a Lisp-related language, but it can be an interesting read for everybody.
Published at DZone with permission of Gabriele Tomassetti, DZone MVB. See the original article here.