Here it is: a new guide that collects and organizes all the knowledge you need to create your own programming language from scratch.
Creating a programming language is one of the most fascinating challenges you can dream of as a developer.
The problem is that there are a lot of moving parts and a lot of things to get right, and it is difficult to find a well-detailed map to show you the way. Sure, you can find a tutorial on writing half a parser, a half-baked list of advice on language design, or an example of a naive interpreter, but even to find those you will need to spend hours navigating forums and following links.
We thought it would be interesting to collect relevant resources, evaluate them and organize them, so you can spend time using good resources, not looking for them.
We organized the resources around the three stages in the creation of a programming language: design, parsing, and execution. In this article, we'll cover the resources you need for the first two: designing a language and parsing it.
Designing the Language
When creating a programming language you need to take ideas and transform them into decisions. This is what you do during the design phase.
Before You Start
Let's go over some good resources to beef up your knowledge on language design.
- Designing the next programming language? Understand how people learn! This article presents a few considerations on how to design a programming language so that it’s easy to understand.
- Five Questions About Language Design. Some good (and some random) notes on programming language design by Paul Graham.
- Design Concepts in Programming Languages. If you want to make deliberate choices in the creation of your programming language, this is the book you need. Without the necessary theoretical background, you risk simply doing things the way everybody else does them. It's also useful for developing a general framework to understand how different programming languages behave and why.
- Practical Foundations for Programming Languages. This is, for the most part, a book about studying and classifying programming languages. But by understanding the different options available it can also be used to guide the implementation of your programming language.
- Programming Language Pragmatics, 4th Edition. This is the most comprehensive book to read if you are trying to understand contemporary programming languages. It discusses everything from C# to OCaml, as well as different kinds of programming languages such as functional and logic languages. It also covers the various stages and components of an implementation, such as intermediate languages, linking, and virtual machines.
- Structure and Interpretation of Computer Programs, Second Edition. This is an introduction to computer science for people who already have a degree in it. It is a book widely praised by programmers, including Paul Graham (on the book's Amazon page), and it helps you develop a new way of thinking about programming languages. It's quite abstract, and its examples are written in Scheme. It also covers many different aspects of programming languages, including advanced topics like garbage collection.
Type systems are the subject of long discussions and endless disputes. Whatever choices you end up making, it makes sense to know the different positions.
- These are two good introductory articles on type systems. The first discusses the static/dynamic type-checking dichotomy, and the second dives into introspection.
- What To Know Before Debating Type Systems. If you already know the basics of type systems, then this article is for you. It will allow you to understand them better by going into definitions and details.
- Type Systems (PDF). This is a paper on the formalization of type systems that also introduces more precise definitions of the different kinds of type systems.
- Types and Programming Languages. A comprehensive book on understanding type systems. It will definitely have an impact on your ability to design programming languages and compilers. It has a strong theoretical basis, but it also explains the practical importance of individual concepts.
- Functional Programming and Type Systems. An interesting university course on type systems for functional programming, used at a well-known French university. Notes and presentation material are also available. It is as advanced as you would expect.
- Type Systems for Programming Languages. This is a simpler course on Type Systems for (functional) programming languages.
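To make the static/dynamic distinction concrete, here is a minimal sketch in Python for a hypothetical toy language with integer and string literals, `+`, and `if`. The language and every name in it (`Lit`, `Add`, `If`, `check`, `evaluate`, `LangTypeError`) are illustrative assumptions, not taken from the resources above. `check` acts as a static type checker that examines the whole program before running it; `evaluate` checks types dynamically, only on the values it actually computes.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Lit:
    """An integer or string literal."""
    value: Union[int, str]


@dataclass
class Add:
    """left + right: integer addition or string concatenation."""
    left: "Expr"
    right: "Expr"


@dataclass
class If:
    """If cond != 0, evaluate `then`, otherwise `orelse`."""
    cond: "Expr"
    then: "Expr"
    orelse: "Expr"


Expr = Union[Lit, Add, If]


class LangTypeError(Exception):
    """A type error in the toy language."""


def check(expr: Expr) -> type:
    """Static checking: infer a type for every subexpression without running anything."""
    if isinstance(expr, Lit):
        return type(expr.value)
    if isinstance(expr, Add):
        left, right = check(expr.left), check(expr.right)
        if left is not right:
            raise LangTypeError(f"cannot add {left.__name__} and {right.__name__}")
        return left
    # Otherwise expr is an If: both branches are checked, even one that would never run.
    if check(expr.cond) is not int:
        raise LangTypeError("condition must be an int")
    then_t, else_t = check(expr.then), check(expr.orelse)
    if then_t is not else_t:
        raise LangTypeError("branches must have the same type")
    return then_t


def evaluate(expr: Expr) -> Union[int, str]:
    """Dynamic checking: types are inspected only on values actually computed."""
    if isinstance(expr, Lit):
        return expr.value
    if isinstance(expr, Add):
        left, right = evaluate(expr.left), evaluate(expr.right)
        if type(left) is not type(right):
            raise LangTypeError(
                f"cannot add {type(left).__name__} and {type(right).__name__}"
            )
        return left + right
    cond = evaluate(expr.cond)
    if not isinstance(cond, int):
        raise LangTypeError("condition must be an int")
    # Only the selected branch is evaluated, so only its types are ever checked.
    return evaluate(expr.then) if cond != 0 else evaluate(expr.orelse)


program = If(Lit(1), Lit(2), Add(Lit(1), Lit("a")))
print(evaluate(program))  # 2: the ill-typed branch is never evaluated
try:
    check(program)
except LangTypeError as err:
    print("rejected:", err)  # rejected: cannot add int and str
```

The same program is accepted by the dynamic checker and rejected by the static one. That is the trade-off the resources above debate: static checking catches errors in code paths that were never exercised, at the cost of rejecting some programs that would have run fine.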
Parsing
Parsing transforms concrete syntax into a form that is more manageable for computers. This usually means transforming text written by humans into a more useful representation of the source code: an Abstract Syntax Tree.
There are usually two components in parsing: a lexical analyzer and the parser proper. Lexers, also known as tokenizers or scanners, transform individual characters into tokens, the atoms of meaning. Parsers then organize the tokens into an Abstract Syntax Tree for the program. Since the two are meant to work together, you may use a single tool that does both tasks.
- Using Flex as a lexer generator and (Berkeley) Yacc or Bison to generate the parser is the venerable choice for building a complete parser. These tools are a few decades old and are still maintained as open source software. They are written in and created for C/C++. They still work, but they are limited in features and in support for other languages.
- Your own lexer and parser. If you need the best performance and have the necessary computer science knowledge, you can write your own lexer and parser by hand.
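To make the split between lexer and parser concrete, here is a minimal hand-written sketch in Python for a hypothetical toy arithmetic language (integers, variable names, `+ - * /`, and parentheses). The grammar, token kinds, and node classes are our own illustrative assumptions, not taken from any of the tools listed here; generators such as Flex/Bison or ANTLR build the equivalent machinery from a grammar specification instead. The lexer uses a single regular expression with named groups; the parser is recursive descent, with one method per grammar rule.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple, Union

# --- Lexer: characters -> tokens, each a (kind, text) pair ---
TOKEN_RE = re.compile(
    r"(?P<NUMBER>\d+)|(?P<NAME>[A-Za-z_]\w*)|(?P<OP>[-+*/()])|(?P<SKIP>\s+)|(?P<BAD>.)"
)
Token = Tuple[str, str]


def tokenize(source: str) -> List[Token]:
    tokens: List[Token] = []
    for m in TOKEN_RE.finditer(source):
        if m.lastgroup == "SKIP":
            continue  # whitespace separates tokens but is not one
        if m.lastgroup == "BAD":
            raise SyntaxError(f"unexpected character {m.group()!r} at offset {m.start()}")
        tokens.append((m.lastgroup, m.group()))
    tokens.append(("EOF", ""))  # sentinel marking the end of input
    return tokens


# --- AST node types ---
@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class BinOp:
    op: str
    left: "Node"
    right: "Node"


Node = Union[Num, Var, BinOp]


# --- Parser: tokens -> AST, by recursive descent over this grammar ---
#   expr   := term (("+" | "-") term)*
#   term   := factor (("*" | "/") factor)*
#   factor := NUMBER | NAME | "(" expr ")"
class Parser:
    def __init__(self, tokens: List[Token]):
        self.tokens, self.pos = tokens, 0

    def peek(self) -> Token:
        return self.tokens[self.pos]

    def advance(self) -> Token:
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def parse(self) -> Node:
        node = self.expr()
        if self.peek()[0] != "EOF":
            raise SyntaxError(f"unexpected token {self.peek()[1]!r}")
        return node

    def expr(self) -> Node:
        node = self.term()
        while self.peek()[1] in ("+", "-"):  # looping makes the operators left-associative
            op = self.advance()[1]
            node = BinOp(op, node, self.term())
        return node

    def term(self) -> Node:
        node = self.factor()
        while self.peek()[1] in ("*", "/"):  # handled one level down, so binds tighter than + and -
            op = self.advance()[1]
            node = BinOp(op, node, self.factor())
        return node

    def factor(self) -> Node:
        kind, text = self.advance()
        if kind == "NUMBER":
            return Num(int(text))
        if kind == "NAME":
            return Var(text)
        if text == "(":
            node = self.expr()
            if self.advance()[1] != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {text!r}")


print(tokenize("2 * (x + 3)")[:3])  # [('NUMBER', '2'), ('OP', '*'), ('OP', '(')]
print(Parser(tokenize("2 * (x + 3)")).parse())
# BinOp(op='*', left=Num(value=2), right=BinOp(op='+', left=Var(name='x'), right=Num(value=3)))
```

Note how operator precedence is encoded in the structure of the grammar rather than in a table: `term` sits below `expr`, so multiplication ends up deeper in the tree than addition.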
- Flex and Bison tutorial offers a good introduction to the two tools with bonus tips.
- Lex and Yacc Tutorial. At 40 pages, this is the ideal starting point to learn how to use lex and yacc together in a few hours.
- The best video tutorial that I've found on lex/yacc comes in two parts (Part 1, Part 2). In an hour of video, you can learn the basics of using lex and yacc.
- ANTLR Mega Tutorial is a renowned and beloved tutorial that explains everything you need to know about ANTLR, with bonus tips and tricks and extra resources to help you learn even more.
- lex & yacc. Despite being written in 1992, it's still the most recommended book on the subject. Some people say that's because of the lack of competition, others because it is just that good.
- flex & bison: Text Processing Tools. The best book on the subject written in this millennium.
- The Definitive ANTLR 4 Reference. Written by the main author of the tool, this is really the definitive book on ANTLR 4. It explains all of its secrets, and it's also a good introduction to how parsing works in general.
- Parsing Techniques, 2nd edition. A comprehensive, advanced, and costly book for learning more than you will probably ever need to know about parsing.