Byte Code Engineering

DZone 's Guide to

Byte Code Engineering

What's really happening inside those .classes? For most developers, this is still a mystery. But it doesn't have to be.

· Java Zone ·
Free Resource

This blog entry is the first of a multi-part series of articles discussing the merits of byte code engineering and its application. Byte code engineering encompasses the creation of new byte code in the form of classes and the modification of existing byte code. Byte code engineering has many applications. It is used in tools for compilers, class reloading, memory leak detection, and performance monitoring. Also, most application servers use byte code libraries to generate classes at run-time. Byte code engineering is used more often than you think. As a matter of fact, you can find popular byte code engineering libraries bundled in the JRE including BCEL and ASM. Despite its widespread usage, there appears to be very few university or college courses that teach byte code engineering. It is an aspect of programming that developers must learn on their own and for those who don’t, it remains a mysterious black art. The truth is, byte code engineering libraries make learning this field easy and are a gateway to a deeper understanding of JVM internals. The intent of these articles is to provide a starting point and then document some advanced concepts, which will hopefully inspire readers to develop their own skills.


There are a few resources that anyone learning byte code engineering should have handy at all times. The first is the Java Virtual Machine Specification (FYI this page has links to both the language and JVM specifications). Chapter 4, The Class File Format is indispensable. A second resource, which is useful for quick reference is the Wikipedia page entitled Java bytecode instruction listings. In terms of byte code instructions, it is more concise and informative that the JVM specification itself. Another resource to have handy for the beginner is a table of the internal descriptor format for field types. This table is taken directly from the JVM specification.

BaseType Character Type Interpretation
B byte signed byte
C char Unicode character code point in the Basic Multilingual
Plane, encoded with UTF-16
D double double-precision floating-point value
F float single-precision floating-point value
I int integer
J long long integer
L<ClassName>; reference an instance of class <ClassName>
S short signed short
Z boolean true or false
[ reference one array dimension

Most primitive field types simply use the field type's first initial to represent the type internally (i.e. I for int, F for float, etc), however, a long is J and a byte is Z. Object types are not intuitive. An object type begins with the letter L and ends with a semi-colon. Between these characters is the fully qualified class name, with each name separated by forward slashes. For instance, the internal descriptor for the field type java.lang.Integer isLjava/lang/Integer;. Lastly, array dimensions are indicated by the the '[' character. For each dimension, insert a '[' character. For instance a two-dimensional int array would be
[[I, whereas a two-dimensional java.lang.Integer array would be [[Ljava/lang/Integer;

Methods also have an internal descriptor format. The format is (<parameter types>)<return type>. All types use the above field type descriptor format above. A void return type is represented by the letter V. There is no separator for parameter types. Here are some examples:

  • A program entry point method of public static final void main(String args[]) would be ([Ljava/lang/String;)V
  • A constructor of the form public Info(int index, java.lang.Object types[], byte bytes[]) would be (I[Ljava/lang/Object;[Z)V
  • A method with signature int getCount() would be ()I

Speaking of constructors, I should also mention that all constructors have an internal method name of <init>. Also, all static initializers in source code are placed into a single static initializer method with internal method name <clinit>.


Before we discuss byte code engineering libraries, there is an essential learning tool bundled in the JDK bin directory called javap. Javap is a program which will disassemble byte code and provide a textual representation. Let's examine what it can do with the compiled version of the following code:

package ca.discotek.helloworld;

public class HelloWorld {

    static String message =
            "Hello World!";

    public static void main(String[] args) {
        try {
        catch (Exception e) {

Here is the output from the javap -help command:

Usage: javap  ...

where options include:
   -c                        Disassemble the code
   -classpath <pathlist>     Specify where to find user class files
   -extdirs <dirs>           Override location of installed extensions
   -help                     Print this usage message
   -J<flag>                  Pass  directly to the runtime system
   -l                        Print line number and local variable tables
   -public                   Show only public classes and members
   -protected                Show protected/public classes and members
   -package                  Show package/protected/public classes
                             and members (default)
   -private                  Show all classes and members
   -s                        Print internal type signatures
   -bootclasspath <pathlist> Override location of class files loaded
                             by the bootstrap class loader
   -verbose                  Print stack size, number of locals and args for methods
                             If verifying, print reasons for failure

Here is the output when we use javap to disassemble the HelloWorld program:

javap.exe -classpath "C:\projects\sandbox2\bin" -c -private -s -verbose ca.discotek.helloworld.HelloWorld
Compiled from "HelloWorld.java"
public class ca.discotek.helloworld.HelloWorld extends java.lang.Object
  SourceFile: "HelloWorld.java"
  minor version: 0
  major version: 50
  Constant pool:
const #1 = class        #2;     //  ca/discotek/helloworld/HelloWorld
const #2 = Asciz        ca/discotek/helloworld/HelloWorld;
const #3 = class        #4;     //  java/lang/Object
const #4 = Asciz        java/lang/Object;
const #5 = Asciz        message;
const #6 = Asciz        Ljava/lang/String;;
const #7 = Asciz        <clinit>;
const #8 = Asciz        ()V;
const #9 = Asciz        Code;
const #10 = String      #11;    //  Hello World!
const #11 = Asciz       Hello World!;
const #12 = Field       #1.#13; //  ca/discotek/helloworld/HelloWorld.message:Ljava/lang/String;
const #13 = NameAndType #5:#6;//  message:Ljava/lang/String;
const #14 = Asciz       LineNumberTable;
const #15 = Asciz       LocalVariableTable;
const #16 = Asciz       <init>;
const #17 = Method      #3.#18; //  java/lang/Object."<init>":()V
const #18 = NameAndType #16:#8;//  "<init>":()V
const #19 = Asciz       this;
const #20 = Asciz       Lca/discotek/helloworld/HelloWorld;;
const #21 = Asciz       main;
const #22 = Asciz       ([Ljava/lang/String;)V;
const #23 = Field       #24.#26;        //  java/lang/System.out:Ljava/io/PrintStream;
const #24 = class       #25;    //  java/lang/System
const #25 = Asciz       java/lang/System;
const #26 = NameAndType #27:#28;//  out:Ljava/io/PrintStream;
const #27 = Asciz       out;
const #28 = Asciz       Ljava/io/PrintStream;;
const #29 = Method      #30.#32;        //  java/io/PrintStream.println:(Ljava/lang/String;)V
const #30 = class       #31;    //  java/io/PrintStream
const #31 = Asciz       java/io/PrintStream;
const #32 = NameAndType #33:#34;//  println:(Ljava/lang/String;)V
const #33 = Asciz       println;
const #34 = Asciz       (Ljava/lang/String;)V;
const #35 = Method      #36.#38;        //  java/lang/Exception.printStackTrace:()V
const #36 = class       #37;    //  java/lang/Exception
const #37 = Asciz       java/lang/Exception;
const #38 = NameAndType #39:#8;//  printStackTrace:()V
const #39 = Asciz       printStackTrace;
const #40 = Asciz       args;
const #41 = Asciz       [Ljava/lang/String;;
const #42 = Asciz       e;
const #43 = Asciz       Ljava/lang/Exception;;
const #44 = Asciz       StackMapTable;
const #45 = Asciz       SourceFile;
const #46 = Asciz       HelloWorld.java;

static java.lang.String message;
  Signature: Ljava/lang/String;

static {};
  Signature: ()V
   Stack=1, Locals=0, Args_size=0
   0:   ldc     #10; //String Hello World!
   2:   putstatic       #12; //Field message:Ljava/lang/String;
   5:   return
   line 6: 0
   line 5: 2
   line 6: 5

public ca.discotek.helloworld.HelloWorld();
  Signature: ()V
   Stack=1, Locals=1, Args_size=1
   0:   aload_0
   1:   invokespecial   #17; //Method java/lang/Object."<init>":()V
   4:   return
   line 3: 0

   Start  Length  Slot  Name   Signature
   0      5      0    this       Lca/discotek/helloworld/HelloWorld;

public static void main(java.lang.String[]);
  Signature: ([Ljava/lang/String;)V
   Stack=2, Locals=2, Args_size=1
   0:   getstatic       #23; //Field java/lang/System.out:Ljava/io/PrintStream;
   3:   getstatic       #12; //Field message:Ljava/lang/String;
   6:   invokevirtual   #29; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   9:   goto    17
   12:  astore_1
   13:  aload_1
   14:  invokevirtual   #35; //Method java/lang/Exception.printStackTrace:()V
   17:  return
  Exception table:
   from   to  target type
     0     9    12   Class java/lang/Exception

   line 10: 0
   line 11: 9
   line 12: 12
   line 13: 13
   line 15: 17

   Start  Length  Slot  Name   Signature
   0      18      0    args       [Ljava/lang/String;
   13      4      1    e       Ljava/lang/Exception;

  StackMapTable: number_of_entries = 2
   frame_type = 76 /* same_locals_1_stack_item */
     stack = [ class java/lang/Exception ]
   frame_type = 4 /* same */


You should note that the -l flag to output line number information was purposely omitted. The -verbose flag outputs other relevant information including line numbers. If both are used the line number information will be printed twice.

Here is an overview of the output:

Line Numbers Description
2 Command line to invoke javap. See javap -help output above for explanation of parameters.
3 Source code file provided by debug information included in byte code.
4 Class signature
5 Source code file provided by debug information included in byte code.
6-7 Major and Minor versions. 50.0 indicates the class was compiled with Java 6.
8-54 The class constant pool.
57-58 Declaration of the message field.
60 Declaration of the static initializer method.
61 Internal method descriptor for method.
63 Stack=1 indicates 1 slot is required on the operand stack. Locals=0 indicates no local variables are required.
Args_size=0 is the number of arguments to the method.
64-66 The byte code instructions to assign the String value Hello World! to the message field.
67-77 If compiled with debug information, each method will have a LineNumberTable. The format of each entry is
<line number of source code>: <starting instruction offset in byte code>. You'll notice that the LineNumberTable
has duplicate entries and seamingly out of order (i.e. 6, 5, 6). It may not seem intuitive, but the compiler assembles the byte code
instructions will target the stack based JVM, which means it will often have to re-arrange instructions.
72 Default constructor signature
73 Default constructor internal method descriptor
75 Stack=1 indicates 1 slot is required on the operand stack. Locals=1 indicates there is one local variable. Method
parameters are treated as local variables. In this case, its the args parameter.
Args_size=1 is the number of arguments to the method.
76-78 Default constructor code. Simply invokes the default constructor of the super class, java.lang.Object.
79-80 Although the default constructor is not explicitly defined, the LineNumberTableindicates that the
default constructor is associated with line 3, where the class signature resides.
82-84 You might be surprised to see an entry in a LocalVariableTable because the default constructor
defines no local variables and has no parameters. However, all non-static methods will define the "this" local
variable, which is what is seen here. The start and length values indicate the scope of the local variable within the method.
The start value indicates the index in the method's byte code array where the scope begins and the length value
indicates the location in the array where the scope ends (i.e. start + length = end). In the constructor, "this"
starts at index 0. This corresponds to the a_load0 instruction at line 78. The length is 5, which covers the entire method as
the last instruction is at index 4. The slot value indicates the order in which it is defined in the method. The name
attribute is the variable name as defined in the source code. The Signature attribute represents the type of variable.
You should note that local variable table information is added for debugging purposes. Assigning identifiers to chunks of memory
is entirely to help humans understand programs better. This information can be excluded from byte code.
86 Main method declaration
87 Main method internal descriptor.
89 Stack=2 indicates 2 slots are required on the operand stack. Locals=2 indicates two local variables are required
(The args and exception e from the catch block). Args_size=1 is the number of arguments to the method (args).
90-97 Byte code associated with printing the message and catching any exceptions.
98-100 Byte code does not have try/catch constructs, but it does have exception handling, which is implemented in the Exception table.
Each row in the table is an exception handling instruction. The from and to values indicate the range of instructions to
which the exception handling applies. If the given type of instruction occurs between the from and to instructions
(inclusively), execution will skip to the target instruction index. The value 12 represents the start of the catch block.
You'll also notice the goto instruction after the invokevirtual instruction, which cause execution to skip to the end
of the method if no exception occurs.
102-107 Main method's line number table which matches source code with byte code instructions.
109-112 Main methods' LocalVariableTable, which defines the scope of the args parameter and the e exception variable.
114-117 The JVM uses StackMapTable entries to verify type safety for each code block defined within a method. This information
can be ignored for now. It is most likely that your compiler or byte code engineering library will generate this byte code
for you.

Byte Code Engineering Libraries

The most popular byte code engineering libraries are BCELSERPJavassist, and ASM. All of these libraries have their own merits, but overall, ASM is far superior for its speed and versatility. There are plenty of articles and blogs entries discussing these libraries in addition to the documentation on their web sites. Instead of duplicating these efforts, the following will provide links and hopefully other useful information.


The most obvious detractor for BCEL (Byte Code Engineering Library) has been its inconsistent support. If you look at the BCEL News and Status page, there have been releases in 2001, 2003, 2006, and 2011. Four releases spread over 10 years is not confidence inspiring. However, it should be noted that there appears to be a version 6 release candidate, which can be downloaded from GitHub, but not Apache. Additionally, the enhancements and bug fixes discussed in the download's RELEASE-NOTES.txt file are substantial, including support for the language features of Java 6, 7, and 8.

BCEL is a natural starting place for the uninitiated byte code developer because it has the prestige of the Apache Software Foundation. Often, it may serve the developer's purpose. One of BCEL's benefits is that it has an API for both the SAX and DOM approaches to parsing byte code. However, when byte code manipulation is more complex, BCEL will likely end in frustration due to its API documentation and community support. It should be noted that BCEL is bundled with a BCELifier utility which parses byte code and will output the BCEL API Java code to produce the parsed byte code. If you choose BCEL as your byte code engineering library, this utility will be invaluable (but note that ASM has an equivalent ASMifier).


SERP is a lesser known library. My experience with it is limited, but I did find it useful for building a Javadoc-style tool for byte code. SERP was the only API that could give me program counter information so I could hyperlink branching instructions to their targets. Although the SERP release documentation indicates there is support for Java 8's invokedynamic instruction, it is not clear to me that it receives continuous support from the author and there is very little community support. The author also discusses its limitations which include issues with speed, memory consumption, and thread safety.


Javassist is the only library that provides some functionality not supported by ASM... and its pretty awesome. Javassist allows you to insert Java source code into existing byte code. You can insert Java code before a method body or append it after the method body. You
can also wrap a method body in a try-block and add your own catch-block (of Java code). You can also subsitute an entire method body or other smaller constructs with your own Java source code. Lastly, you can add methods to a class which contain your own Java source code. This feature is extremely powerful as it allows a Java developer to manipulate byte code without requiring an in-depth understanding of the underlying byte code. However, this feature does have its limitations. For instance, if you introduce variables in an insertBefore() block of code, they cannot be referenced later in an insertAfter() block of code. Additionally, ASM is generally faster than Javassist, but the benefits in Javassist's simplicity may outweigh gains in ASM's performance. Javassists is continually supported by the authors at JBoss and receives much community support.


ASM has it all. It is well supported, it is fast, and it can do just about anything. ASM has both SAX and DOM style APIs for parsing byte code. ASM also has an ASMifier which can parse byte code and generate the corresponding Java source code, which when run will produce the parsed byte code. This is an invaluable tool. It is expected that the developer has some knowledge of byte code, but ASM can update frame information for you if you add local variables etc. It also has many utility classes for common tasks in its commons package. Further, common byte code transformations are documented in exceptional detail. You can also get help from the ASM mailing list. Lastly, forums like StackOverflow provide additional support. Almost certainly any problem you have has already been discussed in the ASM documentation or in a StackOverflow thread.

Useful Links


Admittedly, this blog entry has not been particularly instructional. The intention is to give the beginner a place to start. In my experience, the best way to learn is to have a project in mind to which you'll apply what you are learning. Documenting a few basic byte code engineering tasks will only duplicate other's efforts. I developed my byte code skills from an interest in reverse engineering. I would prefer not to document those skills as it would be counter-productive to my other efforts (I built a commerical byte code obfuscator called Modifly, which can perform obfuscation transformations at run-time). However, I am willing to share what I have learned by demonstrating how to apply byte code engineering to class reloading and memory leak detection (and perhaps other areas if there is interest).

Next Blog in the Series Teaser

Even if you don't use JRebel, you probably haven't escaped their ads. JRebel's home page claims "Reload Code Changes Instantly. Skip the build and redeploy process. JRebel reloads changes to Java classes, resources, and over 90 frameworks.". Have you ever wondered how they do it? I'll show you exactly how they do it with working code in my next blog in this series.

If you enjoyed this blog, you may wish to follow discotek.ca on twitter.

byte code ,byte code engineering ,java

Published at DZone with permission of Rob Kenworthy , DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}