
Abstraction slows you down

By Wojciech Kudla · Oct. 04, 11

Update

As there have been several comments about my microbenchmark, I feel I should give a collective reply.

Thank you for pointing out the flaws in my test implementation. This time I used a snippet from Maciej Gorączka to illustrate the pitfalls of optimization and, at the same time, to show the real impact of interface calls. Of course, it had to be modified, as it also suffered from the same optimization pitfalls.

There are several things in the area of method call optimization one has to keep in mind when measuring performance. Some of them are described in the performance techniques chapter of the HotSpot internals documentation.
With my original benchmark I did not take the following into account:

Virtual (and interface) invocations with a lopsided type profile are compiled with an optimistic check in favor of the historically common type (or two types).

So there's a lot of truth in saying microbenchmarks can be really tricky.

As an example of how forgetting about the optimizer may mess up your tests, consider the code below:

import java.math.BigDecimal;
import java.util.Random;

public class OptimizationTest
{
    private static final long ITERATIONS = 1000 * 1000 * 1000;
    private static final int RUNS = 1;
    private static final SomeInterface[] IFACES = new SomeInterface[] {new Impl1(), new Impl2()};
    private static final Impl1[] IMPLS = new Impl1[] {new Impl1(), new Impl1()};
    private static final Random RANDOM = new Random();
    private static final int CLASS_COUNT = 2;

    public static void main(String[] args)
    {
        performTestIface(2); //warm up

        final long implTime = performTestImpl(RUNS);
        final BigDecimal runs = new BigDecimal(RUNS);
        final BigDecimal implAvg = new BigDecimal(implTime).divide(runs);
        System.out.println("Invokevirtual: " + implAvg + " ms");

        final long ifaceTime = performTestIface(RUNS);
        final BigDecimal ifaceAvg = new BigDecimal(ifaceTime).divide(runs);
        System.out.println("Invokeinterface: " + ifaceAvg + " ms");
    }

    private static long performTestIface(final long runs)
    {
        final SomeInterface iface = IFACES[RANDOM.nextInt(CLASS_COUNT)];
        long ifaceTime = 0;

        for (int run = 0; run < runs; run++)
        {
            System.gc();
            long start = System.currentTimeMillis();
            for (int i = 0; i < ITERATIONS; i++)
            {
                iface.doSomething(i);
            }
            ifaceTime += System.currentTimeMillis() - start;
        }
        return ifaceTime;
    }

    private static long performTestImpl(final long runs)
    {
        final Impl1 impl = IMPLS[RANDOM.nextInt(CLASS_COUNT)];
        long implTime = 0;

        for (int run = 0; run < runs; run++)
        {
            System.gc();
            long start = System.currentTimeMillis();
            for (int i = 0; i < ITERATIONS; i++)
            {
                impl.doSomething(i);
            }
            implTime += System.currentTimeMillis() - start;
        }
        return implTime;
    }


    static interface SomeInterface
    {
        void doSomething(int i);
    }

    static class Impl1 implements SomeInterface
    {
        private int i;

        public void doSomething(final int i)
        {
            this.i = i;
        }
    }

    static class Impl2 implements SomeInterface
    {
        private int i;

        public void doSomething(final int i)
        {
            this.i = i;
        }
    }


}

Please make sure to run it 10+ times. I hope you'll have fun with the results.

Now, the explanation.
When run with the -XX:+PrintCompilation flag, the test outputs either:

    142   9       test.InvokeInterfaceTest2$Impl1::doSomething (6 bytes)
    142   1%      test.InvokeInterfaceTest2::performTestIface @ 36 (77 bytes)
   1701   2%      test.InvokeInterfaceTest2::performTestImpl @ 36 (75 bytes)
Invokevirtual: 745 ms
   2447  10       test.InvokeInterfaceTest2::performTestIface (77 bytes)
Invokeinterface: 722 ms

or:

      65   3       test.InvokeInterfaceTest2$Impl2::doSomething (6 bytes)
     66   1%      test.InvokeInterfaceTest2::performTestIface @ 36 (77 bytes)
   1523   4       test.InvokeInterfaceTest2$Impl1::doSomething (6 bytes)
   1523   2%      test.InvokeInterfaceTest2::performTestImpl @ 36 (75 bytes)
Invokevirtual: 732 ms
   2255   5       test.InvokeInterfaceTest2::performTestIface (77 bytes)
   2262   1%     made not entrant  test.InvokeInterfaceTest2::performTestIface @ -2 (77 bytes)
   2268   3%      test.InvokeInterfaceTest2::performTestIface @ 36 (77 bytes)
Invokeinterface: 1816 ms

Note that HotSpot reports the method as made not entrant, which means the previously compiled version of the method is no longer valid. This happens because, just as in the faster example, we are using two different implementations of the same interface; but this time the class implementing the interface changed at the call site, so the target method had to be deoptimized.
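To reproduce the effect in isolation, here is a minimal sketch of my own (the class and method names are made up, not taken from the benchmark above): the loop is first compiled while only one receiver type has been seen at the call site; introducing a second type invalidates that optimistic compilation, which shows up as "made not entrant" under -XX:+PrintCompilation.

```java
// Hypothetical sketch of the deoptimization pattern; run with
// -XX:+PrintCompilation to watch for the "made not entrant" events.
public class DeoptDemo {
    interface Work { int run(int x); }

    static final class Impl1 implements Work {
        public int run(int x) { return x + 1; }
    }

    static final class Impl2 implements Work {
        public int run(int x) { return x + 2; }
    }

    // Single call site: its type profile starts out monomorphic and is
    // invalidated as soon as a new receiver type appears.
    static long loop(Work w, int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += w.run(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(loop(new Impl1(), 1000 * 1000)); // profiled as monomorphic
        System.out.println(loop(new Impl2(), 1000 * 1000)); // new type: deopt + recompile
    }
}
```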

It becomes even clearer when we look at how the optimization has been implemented by the compiler. In the fast run we get this native code for performTestIface:

126   B16: #    N1 <- B2  Freq: 1.01326e-06
126       movl    RSI, #-28    # int
12b       movq    RBP, [rsp + #0]    # spill
12f       movl    [rsp + #4], RAX    # spill
133       call,static  wrapper for: uncommon_trap(reason='range_check' action='make_not_entrant')
        # OptimizationTest::performTestIface @ bci:10  L[0]=RBP L[1]=_ L[2]=_ L[3]=_ L[4]=_ L[5]=_ L[6]=_ L[7]=_ L[8]=_ STK[0]=rsp + #8 STK[1]=rsp + #4
        # OopMap{[8]=NarrowOop off=312}
138       int3    # ShouldNotReachHere

The code gets recompiled once the number of uncommon traps taken at the call site reaches PerMethodTrapLimit or PerBytecodeRecompilationCutoff. So for the slower run the code gets recompiled to:

14f   B16: #    N250 <- B9  Freq: 0.25578
14f       movl    RSI, #-58    # int
154       movq    [rsp + #16], RBX    # spill
159       movq    [rsp + #32], R14    # spill
15e       movq    [rsp + #40], R13    # spill
163       call,static  wrapper for: uncommon_trap(reason='bimorphic' action='maybe_recompile')
        # OptimizationTest::performTestIface @ bci:49  L[0]=rsp + #0 L[1]=_ L[2]=rsp + #40 L[3]=rsp + #16 L[4]=_ L[5]=rsp + #24 L[6]=rsp + #32 L[7]=_ L[8]=RBP STK[0]=rsp + #40 STK[1]=RBP
        # OopMap{[40]=Oop off=360}
168       int3    # ShouldNotReachHere
16d   B17: #    B3 B18 <- B2  Freq: 0.16983
16d       decode_heap_oop_not_null RSI,R10
171       movq    rdi, [RSI + (sizeof(oopDesc) + Klass::secondary_supers_offset_in_bytes())]
    movl    rcx, [rdi + arrayOopDesc::length_offset_in_bytes()]    # length to scan
    addq    rdi, arrayOopDesc::base_offset_in_bytes(T_OBJECT)    # Skip to start of data; set NZ in case count is zero
    repne   scasq    # Scan *rdi++ for a match with rax while cx-- != 0
    jne,s   miss        # Missed: flags nz
    movq    [RSI + (sizeof(oopDesc) + Klass::secondary_super_cache_offset_in_bytes())], RAX    # Hit: update cache
    miss:   
2a7       je     B3  P=0.999999 C=-1.000000

B18: #    N250 <- B17  Freq: 1.6983e-07
2ad       movl    RSI, #-83    # int
2b2       movq    [rsp + #8], R13    # spill
2b7       movq    [rsp + #16], RBX    # spill
2bc       movq    [rsp + #32], R14    # spill
2c1       nop     # 2 bytes pad for loops and calls
2c3       call,static  wrapper for: uncommon_trap(reason='unreached' action='reinterpret')
        # OptimizationTest::performTestIface @ bci:36  L[0]=rsp + #0 L[1]=_ L[2]=rsp + #8 L[3]=rsp + #16 L[4]=_ L[5]=rsp + #24 L[6]=rsp + #32 L[7]=_ L[8]=RBP
        # OopMap{[8]=Oop off=712}
2c8       int3    # ShouldNotReachHere

Usually, in high-performance applications, interfaces reflecting patterns like observer or listener are called in a loop, with the implementations changing frequently. That is why I believe this problem has practical implications.
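A sketch of that shape (the listener names here are illustrative, not from any real system): one hot call site dispatching through an interface to many receiver types.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the observer shape discussed above: one hot
// interface call site that sees many different receiver types.
public class ListenerLoop {
    interface PriceListener { void onPrice(double price); }

    static final class LoggingListener implements PriceListener {
        double last;
        public void onPrice(double price) { last = price; }
    }

    static final class AlertingListener implements PriceListener {
        int alerts;
        public void onPrice(double price) { if (price > 100.0) alerts++; }
    }

    // The l.onPrice(price) call site sees every registered implementation;
    // once more than two types show up, HotSpot abandons the optimistic
    // type check and the call becomes a true itable dispatch.
    static int publish(List<PriceListener> listeners, double price) {
        int delivered = 0;
        for (PriceListener l : listeners) {
            l.onPrice(price);
            delivered++;
        }
        return delivered;
    }

    public static void main(String[] args) {
        List<PriceListener> listeners = new ArrayList<PriceListener>();
        listeners.add(new LoggingListener());
        listeners.add(new AlertingListener());
        System.out.println(publish(listeners, 101.5));
    }
}
```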

Thank you, gents, for the remarks. I obviously failed to supply a flawless test proving my arguments the first time.
I hope you'll find this one solid and entertaining.

End of update

I think we all like to write nice code; we very often employ ideas such as composition, separation of concerns, and so on. Above all, we quite correctly tend to introduce various levels of abstraction by using interfaces and abstract classes. This approach is perfectly valid for the vast majority of scenarios, but it may lead to performance deficiencies when we're dealing with low-latency requirements.

Every time we introduce another level of abstraction, we also add complexity to the procedure by which the JVM resolves the target method to be called.
Methods in Java are not invoked directly, due to the polymorphic nature of the language. For instance:

final Set set = getImpl(); // returns different implementations  
set.add(new Object());

will have different implications than:

final HashSet set = getImpl(); // statically known to be a HashSet
set.add(new Object());

The former would get translated into:

    ...
    NEW java/lang/Object
    DUP
    INVOKESPECIAL java/lang/Object.<init> ()V
    INVOKEINTERFACE java/util/Set.add (Ljava/lang/Object;)Z
    ...

While the latter would look more like:

    ...
    NEW java/lang/Object
    DUP
    INVOKESPECIAL java/lang/Object.<init> ()V
    INVOKEVIRTUAL java/util/HashSet.add (Ljava/lang/Object;)Z
    ...

The difference in the JVM's behaviour for these two cases may not be obvious, even after a thorough reading of the instruction specifications. The procedure for looking up a particular method in the virtual method table is almost identical, but the open character of INVOKEINTERFACE semantics renders some optimizations impossible.

Let's consider the below examples. In the first case:

class Parent {
    public void method1() { }
    public void method2() { }
}

class Child extends Parent {
    public void method2() { } // overridden method
    public void method3() { }
}

This setup will result in a virtual method table that looks something like:

    Parent
      1  Parent.method1
      2  Parent.method2

    Child
      1  Parent.method1
      2  Child.method2
      3  Child.method3

Here we can observe how the virtual method table for the Child class preserves the order of methods from its parent class's method table. It only overrides the references to the overridden methods, thus enabling monomorphic calls and optimizations that hard-link to the target methods and completely eliminate the need for a method table lookup on each call. This in turn provides the means for inlining methods, which can lead to a nice performance boost.
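A monomorphic call site can be sketched like this (a toy example of my own, not code from elsewhere in the article): with a concrete, final receiver type, the JIT knows the exact target and can inline it.

```java
// Minimal sketch: the c.increment() call site only ever sees one
// (final) receiver type, so HotSpot can bind the target directly
// and inline it instead of going through the virtual method table.
public class MonomorphicDemo {
    static final class Counter {   // final: the call target cannot change
        private long count;
        void increment() { count++; }
        long value() { return count; }
    }

    static long run(int iterations) {
        Counter c = new Counter();
        for (int i = 0; i < iterations; i++) {
            c.increment();         // monomorphic, inlinable call
        }
        return c.value();
    }

    public static void main(String[] args) {
        System.out.println(run(1000 * 1000));
    }
}
```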

Now, let's look at the case for interfaces:

interface SomeInterface {
    void someMethod();
}

class Parent {
    public void method1() { }
    public void method2() { }
}

class Child extends Parent implements SomeInterface {
    public void method2() { } // overridden method from Parent
    public void someMethod() { }
}

class OtherClass implements SomeInterface {
    public void method3() { }
    public void someMethod() { }
}

The virtual method tables for the above would resemble this:

    Parent
      1  Parent.method1
      2  Parent.method2

    Child
      1  Parent.method1
      2  Child.method2
      3  SomeInterface.someMethod

    OtherClass
      1  OtherClass.method3
      2  SomeInterface.someMethod

As we can see, no order is preserved for the index of the interface method someMethod(): for the Child class it is method #3, but for OtherClass the entry is found at #2.

The consequence of introducing these megamorphic calls is the additional cost of a virtual method table lookup with each method call. To illustrate the impact, we can run a simple microbenchmark:

Edit: the below code is obviously flawed (see the comments). Use the one from the top of the article.

import java.math.BigDecimal;

public class InvokeInterfaceTest
{
    private static final long ITERATIONS = 1000 * 1000 * 1000;
    private static final int RUNS = 10;
    private static long implTime;
    private static long ifaceTime;

    public static void main(String[] args)
    {
        performTest(2); //warm up
        ifaceTime = implTime = 0;
        performTest(RUNS);
        final BigDecimal implAvg = new BigDecimal(implTime).divide(new BigDecimal(RUNS));
        final BigDecimal ifaceAvg = new BigDecimal(ifaceTime).divide(new BigDecimal(RUNS));
        System.out.println("Invokevirtual: " + implAvg + " ms");
        System.out.println("Invokeinterface: " + ifaceAvg + " ms");
    }

    private static void performTest(final long runs)
    {
        final FooImpl impl = new FooImpl();
        final Foo iface = new FooImpl();

        for (int run = 0; run < runs; run++)
        {
            System.gc();
            long start = System.currentTimeMillis();
            for (int i = 0; i < ITERATIONS; i++)
            {
                impl.doSomething(i);
            }
            implTime += System.currentTimeMillis() - start;

            System.gc();
            start = System.currentTimeMillis();
            for (int i = 0; i < ITERATIONS; i++)
            {
                iface.doSomething(i);
            }
            ifaceTime += System.currentTimeMillis() - start;
        }
    }

    private static interface Foo
    {
        void doSomething(int i);
    }

    private static class FooImpl implements Foo
    {
        private int i;

        public void doSomething(final int i)
        {
            this.i = i; //avoid optimization
        }
    }
}

On my laptop, with an Intel(R) Core(TM) i5-2410M CPU @ 2.30GHz, using Java 1.6.0_26-b03 for 64-bit Linux, this gives the results below:

Invokevirtual: 735.4 ms
Invokeinterface: 1412.4 ms

In order to minimize the influence of other processes running on my system, I pinned the test thread to a single core, rendering that core unavailable to other processes and achieving more stable results (as described in my previous post about Java threads).

So, if you have to deal with a huge number of invocations and latency is a crucial factor, you may have to consider eliminating megamorphic calls at the cost of a less abstract, arguably uglier design.
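One such trade-off is to peel the frequent implementations out of the interface call with explicit type checks. The sketch below uses made-up types of my own, not code from the article, and only shows the general shape of the technique:

```java
// Hypothetical sketch: peeling the frequent implementations into explicit
// instanceof checks turns one megamorphic interface call site into a few
// monomorphic, inlinable ones, at the cost of hard-coding the common types.
public class PeeledDispatch {
    interface Handler { int handle(int x); }

    static final class Doubler implements Handler {
        public int handle(int x) { return x * 2; }
    }

    static final class Negator implements Handler {
        public int handle(int x) { return -x; }
    }

    static int dispatch(Handler h, int x) {
        // Each branch calls a concrete final class, so the JIT can
        // inline it; only unknown types pay for the itable lookup.
        if (h instanceof Doubler) return ((Doubler) h).handle(x);
        if (h instanceof Negator) return ((Negator) h).handle(x);
        return h.handle(x); // rare, megamorphic fallback
    }

    public static void main(String[] args) {
        System.out.println(dispatch(new Doubler(), 21));
        System.out.println(dispatch(new Negator(), 5));
    }
}
```

This pattern obviously fights the abstraction rather than removing it, which is exactly the kind of "dodgy design" trade-off described above.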


