Mistakes in Micro Benchmarks

Even benchmarks make mistakes. Check out these tips on potential changes, and the results of casting.

by Oren Eini · Oct. 20, 15 · Analysis

So in my last post I showed a bunch of small micro benchmarks, and aside from the actual results, I wasn’t really sure what was going on there. Luckily, I know a few perf experts, so I was able to lean on them.

In particular, the changes that were recommended were:

  • Don’t measure just a single tiny operation. If the op is too cheap, jitter in the call setup can easily swamp the measurement.
  • Pay attention to potential data issues. The compiler/JIT can decide to keep a value in a register, in which case you are benchmarking the CPU directly, which won’t be the case in the real world.
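The first recommendation can be sketched by hand, without any benchmarking library. This is only an illustration under stated assumptions: `BatchedTiming` and `NanosecondsPerOp` are hypothetical names, not part of the benchmark harness used in this post, and a real harness does far more (multiple runs, statistics, process isolation). The idea is to perform several operations per invocation, time many invocations, and divide by the total operation count.

```csharp
using System;
using System.Diagnostics;

public static class BatchedTiming
{
    // Hypothetical helper: times `invocations` calls of `body`, where each call
    // performs `opsPerInvoke` logical operations, and returns nanoseconds per operation.
    public static double NanosecondsPerOp(Action body, int invocations, int opsPerInvoke)
    {
        body(); // warm-up call so JIT compilation is not part of the measurement

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < invocations; i++)
        {
            body();
        }
        sw.Stop();

        // Convert elapsed milliseconds to nanoseconds, then spread over all operations.
        double totalNs = sw.Elapsed.TotalMilliseconds * 1_000_000.0;
        return totalNs / ((double)invocations * opsPerInvoke);
    }

    public static void Main()
    {
        // Four separate array slots on the heap, so the increments go through memory
        // instead of a single local that the JIT could keep in a register.
        long[] slots = new long[4];

        double ns = NanosecondsPerOp(() =>
        {
            slots[0]++;
            slots[1]++;
            slots[2]++;
            slots[3]++;
        }, invocations: 10_000_000, opsPerInvoke: 4);

        Console.WriteLine($"{ns:F4} ns/op");
    }
}
```

Note that the delegate call itself adds overhead to every invocation here, so the absolute numbers from a sketch like this are not comparable to the results in this post.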

I also learned how to get the actual assembly being run, which is great. All in all, we get the following benchmark code:

using System.Runtime.InteropServices; // Marshal

// FooHeader is not defined in this post; it is assumed to be a struct
// with a numeric PageNumber field.

// Run every benchmark on each platform/JIT combination.
[BenchmarkTask(platform: BenchmarkPlatform.X86,
            jitVersion: BenchmarkJitVersion.RyuJit)]
[BenchmarkTask(platform: BenchmarkPlatform.X86,
            jitVersion: BenchmarkJitVersion.LegacyJit)]
[BenchmarkTask(platform: BenchmarkPlatform.X64,
                jitVersion: BenchmarkJitVersion.LegacyJit)]
[BenchmarkTask(platform: BenchmarkPlatform.X64,
                jitVersion: BenchmarkJitVersion.RyuJit)]
public unsafe class ToCastOrNotToCast
{
    byte* p1, p2, p3, p4;
    FooHeader* h1, h2, h3, h4;

    public ToCastOrNotToCast()
    {
        // Four separate unmanaged buffers, so each increment touches different memory.
        p1 = (byte*)Marshal.AllocHGlobal(1024);
        p2 = (byte*)Marshal.AllocHGlobal(1024);
        p3 = (byte*)Marshal.AllocHGlobal(1024);
        p4 = (byte*)Marshal.AllocHGlobal(1024);
        // Typed pointers to the same buffers, stored up front.
        h1 = (FooHeader*)p1;
        h2 = (FooHeader*)p2;
        h3 = (FooHeader*)p3;
        h4 = (FooHeader*)p4;
    }

    // Uses the stored FooHeader* fields directly.
    [Benchmark]
    [OperationsPerInvoke(4)]
    public void NoCast()
    {
        h1->PageNumber++;
        h2->PageNumber++;
        h3->PageNumber++;
        h4->PageNumber++;
    }

    // Casts the raw byte* to FooHeader* on every access.
    [Benchmark]
    [OperationsPerInvoke(4)]
    public void Cast()
    {
        ((FooHeader*)p1)->PageNumber++;
        ((FooHeader*)p2)->PageNumber++;
        ((FooHeader*)p3)->PageNumber++;
        ((FooHeader*)p4)->PageNumber++;
    }
}

And the following results:

          Method | Platform |       Jit |   AvrTime |    StdDev |             op/s |
---------------- |--------- |---------- |---------- |---------- |----------------- |
            Cast |      X64 | LegacyJit | 0.2135 ns | 0.0113 ns | 4,683,511,436.74 |
          NoCast |      X64 | LegacyJit | 0.2116 ns | 0.0017 ns | 4,725,696,633.67 |
            Cast |      X64 |    RyuJit | 0.2177 ns | 0.0038 ns | 4,593,221,104.97 |
          NoCast |      X64 |    RyuJit | 0.2097 ns | 0.0006 ns | 4,769,090,600.54 |
---------------- |--------- |---------- |---------- |---------- |----------------- |
            Cast |      X86 | LegacyJit | 0.7465 ns | 0.1743 ns | 1,339,630,922.79 |
          NoCast |      X86 | LegacyJit | 0.7474 ns | 0.1320 ns | 1,337,986,425.19 |
            Cast |      X86 |    RyuJit | 0.7481 ns | 0.3014 ns | 1,336,808,932.91 |
          NoCast |      X86 |    RyuJit | 0.7426 ns | 0.0039 ns | 1,346,537,728.81 |
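To read the table, it helps to see that the op/s column appears to be just the reciprocal of the average time per operation. The sketch below assumes that relationship; `Throughput` and `OpsPerSecond` are hypothetical names used only for illustration. Because the times in the table are rounded to four decimals, the computed figures will be close to, but not exactly, the op/s values shown.

```csharp
using System;

public static class Throughput
{
    // Converts an average time per operation (in nanoseconds) to operations per second.
    public static double OpsPerSecond(double averageNanoseconds)
    {
        return 1_000_000_000.0 / averageNanoseconds;
    }

    public static void Main()
    {
        // Average times copied from the x64 RyuJit rows of the table above.
        double cast = OpsPerSecond(0.2177);
        double noCast = OpsPerSecond(0.2097);

        Console.WriteLine($"Cast:   {cast:N0} op/s");
        Console.WriteLine($"NoCast: {noCast:N0} op/s");
        Console.WriteLine($"NoCast advantage: {(noCast / cast - 1) * 100:F1}%");
    }
}
```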

Interestingly enough, the NoCast approach is faster in pretty much all setups; the one exception is x86 with LegacyJit, where Cast is ahead by less than 0.001 ns.

Here is the assembly code for LegacyJit in x64:

[Image: assembly listing, not reproduced here]

For RyuJit, the code is identical for the Cast code, and the only difference in the NoCast code is that the mov edx, ecx is mov rdx, rcx in RyuJit.

As an aside, x64 assembly code is much easier to read than x86 assembly code.

In short, casting or not casting has a very minor performance difference, but not casting allows us to save a pointer reference in the object, which means it will be somewhat smaller, and if we are going to have a lot of them, then that can be a pretty nice space saving.
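To put a rough number on that space saving, here is a minimal sketch. It uses two hypothetical structs, `RawOnly` and `RawAndHeader`, as stand-ins for an object that stores one pointer versus two; these are not types from the benchmark above, and real object sizes also include header and alignment overhead that this ignores. It needs to be compiled with unsafe code enabled.

```csharp
using System;

public static unsafe class PointerFootprint
{
    // Hypothetical layout that stores a single raw pointer.
    public struct RawOnly { public byte* Raw; }

    // Hypothetical layout that stores a second pointer to the same memory.
    public struct RawAndHeader { public byte* Raw; public byte* Header; }

    // Bytes saved by dropping the extra pointer field across many instances.
    public static long BytesSaved(long instanceCount)
    {
        return (sizeof(RawAndHeader) - sizeof(RawOnly)) * instanceCount;
    }

    public static void Main()
    {
        Console.WriteLine($"Pointer size: {IntPtr.Size} bytes");
        Console.WriteLine($"Saved across 1,000,000 instances: {BytesSaved(1_000_000):N0} bytes");
    }
}
```

The saving per instance is one pointer, so it depends on whether the process runs as 32-bit or 64-bit.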

Published at DZone with permission of Oren Eini, DZone MVB. See the original article here.
