Building a Language: Generating Bytecode

Federico Tomassetti's series on building your own language reaches bytecode.

By Federico Tomassetti · Sep. 18, 16 · Tutorial

In this post we are going to see how to generate bytecode for our language. So far we have seen how to build a language to express what we want, how to validate that language, and how to build an editor for it, but we still cannot actually run the code. Time to fix that. By compiling for the JVM, our code will be able to run on all sorts of platforms. That sounds pretty great to me!

Series on building your own language

Previous posts:

  1. Building a lexer
  2. Building a parser
  3. Creating an editor with syntax highlighting
  4. Build an editor with autocompletion
  5. Mapping the parse tree to the abstract syntax tree
  6. Model to model transformations
  7. Validation

Code is available on GitHub under the tag 08_bytecode.

Adding a print statement

Before jumping into the bytecode generation, let’s add a print statement to our language. It is fairly easy: we just need to change a few lines in the lexer and parser definitions and we are good to go.

// changes to lexer
PRINT              : 'print';

// changes to parser
statement : varDeclaration # varDeclarationStatement
          | assignment     # assignmentStatement
          | print          # printStatement ;

print : PRINT LPAREN expression RPAREN ;

The general structure of our compiler

Let’s start from the entry point of our compiler. We take the code either from standard input or from a file (specified as the first parameter). Once we have the code, we try to build an AST and check for lexical and syntactical errors. If there are none, we validate the AST and check for semantic errors. If there are still no errors, we go on with the bytecode generation.

import java.io.File
import java.io.FileInputStream
import java.io.FileOutputStream
import java.io.InputStream

fun main(args: Array<String>) {
    // read the source from standard input, or from the file given as the only argument
    val code : InputStream? = when (args.size) {
        0 -> System.`in`
        1 -> FileInputStream(File(args[0]))
        else -> {
            System.err.println("Pass 0 arguments or 1")
            System.exit(1)
            null
        }
    }
    // lexical and syntactical errors
    val parsingResult = SandyParserFacade.parse(code!!)
    if (!parsingResult.isCorrect()) {
        println("Errors:")
        parsingResult.errors.forEach { println(" * L${it.position.line}: ${it.message}") }
        return
    }
    val root = parsingResult.root!!
    println(root)
    // semantic errors
    val errors = root.validate()
    if (errors.isNotEmpty()) {
        println("Errors:")
        errors.forEach { println(" * L${it.position.line}: ${it.message}") }
        return
    }
    // no errors: generate the bytecode and save it in a class file
    val bytes = JvmCompiler().compile(root, "MyClass")
    val fos = FileOutputStream("MyClass.class")
    fos.write(bytes)
    fos.close()
}

Note that in this example we always produce a class file named MyClass. Later we will probably want a way to specify the name of the class file, but for now this is good enough.
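One possible way to pick the class name, sketched here only as an illustration, is to derive it from the name of the source file. The helper classNameFor is hypothetical and not part of the original compiler, and the .sandy file extension in the usage below is an assumption.

```kotlin
import java.io.File

// Hypothetical helper (not in the original code): derive a class name
// from the path of the source file, falling back to "MyClass".
fun classNameFor(sourcePath: String?): String {
    if (sourcePath == null) return "MyClass"
    val base = File(sourcePath).nameWithoutExtension
    // keep only characters that may appear in a Java identifier
    val cleaned = base.filter { Character.isJavaIdentifierPart(it) }
    if (cleaned.isEmpty()) return "MyClass"
    val candidate = cleaned.replaceFirstChar { it.uppercaseChar() }
    // an identifier cannot start with a digit, so prefix it if needed
    return if (Character.isJavaIdentifierStart(candidate[0])) candidate else "_$candidate"
}
```

With a helper like this, the entry point could pass classNameFor(args.getOrNull(0)) to the compiler and write the bytes to a file with the same name plus the .class extension.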

Using ASM to generate bytecode

Now, let’s dive into the fun part. The compile method of JvmCompiler is where we produce the bytes that we will later save into a class file. How do we produce those bytes? With some help from ASM, which is a library for producing bytecode. We could generate the byte array ourselves, but that would involve some boring tasks like generating the constant pool structures. ASM does that for us. We still need some understanding of how the JVM is structured, but we can survive without being experts on the nitty-gritty details.

import org.objectweb.asm.ClassWriter
import org.objectweb.asm.Label
import org.objectweb.asm.Opcodes.*

class JvmCompiler {

    fun compile(root: SandyFile, name: String) : ByteArray {
        // this is how we tell ASM that we want to start writing a new class. We ask it to calculate some values for us
        val cw = ClassWriter(ClassWriter.COMPUTE_FRAMES or ClassWriter.COMPUTE_MAXS)
        // here we specify that the class is in the format introduced with Java 8 (so it would require a JRE >= 8 to run)
        // we also specify the name of the class, the fact it extends Object and it implements no interfaces
        cw.visit(V1_8, ACC_PUBLIC, name, null, "java/lang/Object", null)
        // our class will have just one method: the main method. We have to specify its signature
        // this string just says that it takes an array of Strings and returns nothing (void)
        val mainMethodWriter = cw.visitMethod(ACC_PUBLIC or ACC_STATIC, "main", "([Ljava/lang/String;)V", null, null)
        mainMethodWriter.visitCode()
        // labels are used by ASM to mark points in the code
        val methodStart = Label()
        val methodEnd = Label()
        // with this call we indicate to what point in the method the label methodStart corresponds
        mainMethodWriter.visitLabel(methodStart)

        // variable declarations:
        // we find all variable declarations in our code and we assign to them an index value
        // our vars map will tell us which variable name corresponds to which index
        var nextVarIndex = 0
        val vars = HashMap<String, Var>()
        root.specificProcess(VarDeclaration::class.java) {
            val type = it.type(vars)
            val index = nextVarIndex
            // a double occupies two slots in the local variable table, an int just one
            nextVarIndex += if (type == DecimalType) 2 else 1
            vars[it.varName] = Var(type, index)
            mainMethodWriter.visitLocalVariable(it.varName, type.jvmDescription, null, methodStart, methodEnd, index)
        }

        // time to generate bytecode for all the statements
        root.statements.forEach { s ->
            when (s) {
                is VarDeclaration -> {
                    // we calculate the type of the variable (more details later)
                    val type = vars[s.varName]!!.type
                    // the JVM is a stack-based machine: it operates on values we have put on the stack
                    // so as first thing when we meet a variable declaration we put its value on the stack
                    s.value.pushAs(mainMethodWriter, vars, type)
                    // now, depending on the type of the variable, we use different operations to store the value
                    // we put on the stack into the variable. Note that we refer to the variable using its index, not its name
                    when (type) {
                        IntType -> mainMethodWriter.visitVarInsn(ISTORE, vars[s.varName]!!.index)
                        DecimalType -> mainMethodWriter.visitVarInsn(DSTORE, vars[s.varName]!!.index)
                        else -> throw UnsupportedOperationException(type.javaClass.canonicalName)
                    }
                }
                is Print -> {
                    // this means that we access the field "out" of "java.lang.System" which is of type "java.io.PrintStream"
                    mainMethodWriter.visitFieldInsn(GETSTATIC, "java/lang/System", "out", "Ljava/io/PrintStream;")
                    // we push the value we want to print on the stack
                    s.value.push(mainMethodWriter, vars)
                    // we call the method println of System.out to print the value. It will take its parameter from the stack
                    // note that we have to tell the JVM which variant of println to call. To do that we describe the signature of the method,
                    // depending on the type of the value we want to print. If we want to print an int we will produce the signature "(I)V",
                    // we will produce "(D)V" for a double
                    mainMethodWriter.visitMethodInsn(INVOKEVIRTUAL, "java/io/PrintStream", "println", "(${s.value.type(vars).jvmDescription})V", false)
                }
                is Assignment -> {
                    val type = vars[s.varName]!!.type
                    // this code is the same we have seen for variable declarations
                    s.value.pushAs(mainMethodWriter, vars, type)
                    when (type) {
                        IntType -> mainMethodWriter.visitVarInsn(ISTORE, vars[s.varName]!!.index)
                        DecimalType -> mainMethodWriter.visitVarInsn(DSTORE, vars[s.varName]!!.index)
                        else -> throw UnsupportedOperationException(type.javaClass.canonicalName)
                    }
                }
                else -> throw UnsupportedOperationException(s.javaClass.canonicalName)
            }
        }

        // we just say that here is the end of the method
        mainMethodWriter.visitLabel(methodEnd)
        // and we add the return instruction
        mainMethodWriter.visitInsn(RETURN)
        // visitMaxs has to be called before visitEnd; its arguments are ignored because we asked ASM to compute them
        mainMethodWriter.visitMaxs(-1, -1)
        mainMethodWriter.visitEnd()
        cw.visitEnd()
        return cw.toByteArray()
    }

}
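When assigning indexes to local variables, it helps to remember that the JVM gives long and double values two consecutive slots in the local variable table, while an int takes one. The sketch below shows the resulting numbering in isolation. SlotType and assignSlots are simplified stand-ins introduced here for illustration, not names from the compiler, and the sketch starts at slot 0 as the code above does.

```kotlin
// Simplified stand-in for the language types: each one knows how many
// local variable slots a value of that type occupies on the JVM.
enum class SlotType(val slots: Int) { INT(1), DOUBLE(2) }

// Assign a starting slot to every declared variable, in declaration order.
fun assignSlots(declarations: List<Pair<String, SlotType>>): Map<String, Int> {
    val indexes = LinkedHashMap<String, Int>()
    var next = 0
    for ((name, type) in declarations) {
        indexes[name] = next
        // skip as many slots as the value needs, so the next variable does not overlap
        next += type.slots
    }
    return indexes
}
```

For the declarations a (int), b (double), c (int), this gives a the slot 0, b the slot 1 and c the slot 3, because b also occupies slot 2.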

About types

We have seen that our code uses types. This is needed because, depending on the type, we need to use different instructions. For example, to put a value in an integer variable we use ISTORE, while to put a value in a double variable we use DSTORE. When we call System.out.println on an integer we need to specify the signature (I)V, while to print a double we specify (D)V.

To be able to do so we need to know the type of each expression. In our super simple language we use just int and double for now. A real language would probably need more types, but this is enough to show the principles.

interface SandyType {
    // given a type we want to get the corresponding string used in the JVM
    // for example: int -> I, double -> D, Object -> Ljava/lang/Object; String[] -> [Ljava/lang/String;
    val jvmDescription: String
}

object IntType : SandyType {
    override val jvmDescription: String
        get() = "I"
}

object DecimalType : SandyType {
    override val jvmDescription: String
        get() = "D"
}

fun Expression.type(vars: Map<String, Var>) : SandyType {
    return when (this) {
        // an int literal has type int. Easy :)
        is IntLit -> IntType
        is DecLit -> DecimalType
        // the result of a binary expression depends on the type of the operands
        is BinaryExpression -> {
            val leftType = left.type(vars)
            val rightType = right.type(vars)
            if (leftType != IntType && leftType != DecimalType) {
                throw UnsupportedOperationException()
            }
            if (rightType != IntType && rightType != DecimalType) {
                throw UnsupportedOperationException()
            }
            // an operation on two integers produces an integer
            if (leftType == IntType && rightType == IntType) {
                return IntType
            // if at least a double is involved the result is a double
            } else {
                return DecimalType
            }
        }
        // when we refer to a variable the type is the type of the variable
        is VarReference -> vars[this.varName]!!.type
        // when we cast a value, the resulting value has the target type :)
        is TypeConversion -> this.targetType.toSandyType()
        else -> throw UnsupportedOperationException(this.javaClass.canonicalName)
    }
}
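The signatures mentioned above, such as (I)V and ([Ljava/lang/String;)V, follow a fixed pattern: the parameter descriptors concatenated inside parentheses, followed by the descriptor of the return type. A minimal sketch of that composition, using a hypothetical helper named methodDescriptor that is not part of the compiler:

```kotlin
// Hypothetical helper: build a JVM method descriptor from the descriptors
// of its parameters and of its return type ("V" stands for void).
fun methodDescriptor(paramDescriptors: List<String>, returnDescriptor: String): String =
    paramDescriptors.joinToString(separator = "", prefix = "(", postfix = ")") + returnDescriptor

fun main() {
    println(methodDescriptor(listOf("I"), "V"))                    // println(int)
    println(methodDescriptor(listOf("D"), "V"))                    // println(double)
    println(methodDescriptor(listOf("[Ljava/lang/String;"), "V"))  // main(String[])
}
```

This is the same string the compiler builds inline when it wraps the jvmDescription of the printed value in parentheses and appends V.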

Expressions

As we have seen, the JVM is a stack-based machine. So every time we want to use a value we push it onto the stack and then perform some operation on it. Let’s see how we can push values onto the stack:

import org.objectweb.asm.MethodVisitor
import org.objectweb.asm.Opcodes.*

// push the value and convert it to the desired type, if needed
fun Expression.pushAs(methodWriter: MethodVisitor, vars: Map<String, Var>, desiredType: SandyType) {
    push(methodWriter, vars)
    val myType = type(vars)
    if (myType != desiredType) {
        if (myType == IntType && desiredType == DecimalType) {
            methodWriter.visitInsn(I2D)
        } else if (myType == DecimalType && desiredType == IntType) {
            methodWriter.visitInsn(D2I)
        } else {
            throw UnsupportedOperationException("Conversion from $myType to $desiredType")
        }
    }
}

fun Expression.push(methodWriter: MethodVisitor, vars: Map<String, Var>) {
    when (this) {
        // we have specific operations to push integers and double values
        is IntLit -> methodWriter.visitLdcInsn(Integer.parseInt(this.value))
        is DecLit -> methodWriter.visitLdcInsn(java.lang.Double.parseDouble(this.value))
        // to push a sum we first push the two operands and then invoke an operation which
        // depends on the type of the operands (do we sum integers or doubles?)
        is SumExpression -> {
            left.pushAs(methodWriter, vars, this.type(vars))
            right.pushAs(methodWriter, vars, this.type(vars))
            when (this.type(vars)) {
                IntType -> methodWriter.visitInsn(IADD)
                DecimalType -> methodWriter.visitInsn(DADD)
                else -> throw UnsupportedOperationException("Summing ${this.type(vars)}")
            }
        }
        is SubtractionExpression -> {
            left.pushAs(methodWriter, vars, this.type(vars))
            right.pushAs(methodWriter, vars, this.type(vars))
            when (this.type(vars)) {
                IntType -> methodWriter.visitInsn(ISUB)
                DecimalType -> methodWriter.visitInsn(DSUB)
                else -> throw UnsupportedOperationException("Subtracting ${this.type(vars)}")
            }
        }
        is DivisionExpression -> {
            left.pushAs(methodWriter, vars, this.type(vars))
            right.pushAs(methodWriter, vars, this.type(vars))
            when (this.type(vars)) {
                IntType -> methodWriter.visitInsn(IDIV)
                DecimalType -> methodWriter.visitInsn(DDIV)
                else -> throw UnsupportedOperationException("Dividing ${this.type(vars)}")
            }
        }
        is MultiplicationExpression -> {
            left.pushAs(methodWriter, vars, this.type(vars))
            right.pushAs(methodWriter, vars, this.type(vars))
            when (this.type(vars)) {
                IntType -> methodWriter.visitInsn(IMUL)
                DecimalType -> methodWriter.visitInsn(DMUL)
                else -> throw UnsupportedOperationException("Multiplying ${this.type(vars)}")
            }
        }
        // to push a variable we just load the value from the symbol table
        is VarReference -> {
            val type = vars[this.varName]!!.type
            when (type) {
                IntType -> methodWriter.visitVarInsn(ILOAD, vars[this.varName]!!.index)
                DecimalType -> methodWriter.visitVarInsn(DLOAD, vars[this.varName]!!.index)
                else -> throw UnsupportedOperationException(type.javaClass.canonicalName)
            }
        }
        // the pushAs operation takes care of conversions, as needed
        is TypeConversion -> {
            this.value.pushAs(methodWriter, vars, this.targetType.toSandyType())
        }
        else -> throw UnsupportedOperationException(this.javaClass.canonicalName)
    }
}
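To make the order of the emitted instructions more concrete, here is a small self-contained sketch that does not use ASM at all. It relies on a hypothetical mini-AST (Num, Add, Mul) that is much simpler than the Expression classes of the compiler, emits instruction names as plain strings in the same order as the push function (left operand, right operand, operator), and then runs them on a toy stack. Only integers are handled; the names emit and execute are assumptions made for this illustration.

```kotlin
// Hypothetical mini-AST, integers only
sealed class Expr
data class Num(val value: Int) : Expr()
data class Add(val left: Expr, val right: Expr) : Expr()
data class Mul(val left: Expr, val right: Expr) : Expr()

// Emit instructions in post-order: operands first, then the operator
fun emit(e: Expr, out: MutableList<String>) {
    when (e) {
        is Num -> out.add("ldc ${e.value}")
        is Add -> { emit(e.left, out); emit(e.right, out); out.add("iadd") }
        is Mul -> { emit(e.left, out); emit(e.right, out); out.add("imul") }
    }
}

// Toy stack machine that understands only the three instructions above
fun execute(program: List<String>): Int {
    val stack = ArrayDeque<Int>()
    for (insn in program) {
        when {
            insn.startsWith("ldc ") -> stack.addLast(insn.removePrefix("ldc ").toInt())
            insn == "iadd" -> { val b = stack.removeLast(); val a = stack.removeLast(); stack.addLast(a + b) }
            insn == "imul" -> { val b = stack.removeLast(); val a = stack.removeLast(); stack.addLast(a * b) }
            else -> error("unknown instruction: $insn")
        }
    }
    return stack.removeLast()
}

fun main() {
    val program = mutableListOf<String>()
    emit(Add(Num(1), Mul(Num(2), Num(3))), program)   // 1 + 2 * 3
    println(program)            // [ldc 1, ldc 2, ldc 3, imul, iadd]
    println(execute(program))   // 7
}
```

The multiplication is emitted before the addition simply because it is nested deeper in the tree: no explicit handling of precedence is needed at this stage, since the parser has already encoded it in the shape of the AST.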

Gradle

We can also create a Gradle task to compile source files:

task compileSandyFile(type:JavaExec) {
    main = "me.tomassetti.sandy.compiling.JvmKt"
    args = "$sourceFile"
    classpath = sourceSets.main.runtimeClasspath
}

Conclusions

We did not go into much detail and we sort of rushed through the code. My goal here is just to give you an overview of the general strategy for generating bytecode. Of course, if you want to build a serious language you will need to do some studying and understand the internals of the JVM; there is no escape from that. I just hope this brief introduction was enough to show you that this is not as scary or complicated as most people think.

Published at DZone with permission of Federico Tomassetti, DZone MVB. See the original article here.
