
Two Million Particles at 25 Frames Per Second on an iPad


Following on from my last post, where I managed to calculate and render over 1,000,000 particles in real time, I've done some pretty effective tweaking of the code to create an app that calculates and renders (with blur and trails) over 2,000,000 particles at around 25 frames per second on my iPad Air 2.

The main change is to reuse the compute shader not only to do the calculation and first render, but also to do the post-processing.

In Swift, I set the thread groups and thread group count based on particleCount, which is 2²¹, or 2,097,152:

    particle_threadGroupCount = MTLSize(width:32,height:1,depth:1)
    particle_threadGroups = MTLSize(width:(particleCount + 31) / 32, height:1, depth:1)

Because my image is 1,024 x 1,024, which is 1,048,576 pixels, I can reuse the kernel function to execute code on each pixel by converting the one-dimensional thread_position_in_grid to a two-dimensional coordinate named textureCoordinate:

    const float imageWidth = 1024;
    uint2 textureCoordinate(fast::floor(id / imageWidth),id % int(imageWidth));

    if (textureCoordinate.x < imageWidth && textureCoordinate.y < imageWidth)
    {
        float4 outColor = inTexture.read(textureCoordinate);
        // do some work...
        outTexture.write(outColor, textureCoordinate);
    }

Having a single shader gave a significant speed improvement. Furthermore, because I'm now passing in a read-access texture, I can composite the particles over each other, which makes for a better-looking render:

    const Particle inParticle = inParticles[id];
    const uint2 particlePosition(inParticle.positionX, inParticle.positionY);
    const int type = id % 3;
    const float3 thisColor = inTexture.read(particlePosition).rgb;

    const float4 outColor(thisColor.r + (type == 0 ? 0.15 : 0.0),
                          thisColor.g + (type == 1 ? 0.15 : 0.0),
                          thisColor.b + (type == 2 ? 0.15 : 0.0),
                          1.0);

One downside was that I was getting some artefacts when reading from and writing to the same texture. I've overcome this by using a ping-pong technique: two textures in the Swift code toggle between being the input and output textures with each frame.

I use a Boolean flag to decide which texture to use:

        if flag
        {
            commandEncoder.setTexture(particlesTexture_1, atIndex: 0)
            commandEncoder.setTexture(particlesTexture_2, atIndex: 1)
        }
        else
        {
            commandEncoder.setTexture(particlesTexture_2, atIndex: 0)
            commandEncoder.setTexture(particlesTexture_1, atIndex: 1)
        }



        if flag
        {
            particlesTexture_1.getBytes(&imageBytes, bytesPerRow: bytesPerRowInt, fromRegion: region, mipmapLevel: 0)
        }
        else
        {
            particlesTexture_2.getBytes(&imageBytes, bytesPerRow: bytesPerRowInt, fromRegion: region, mipmapLevel: 0)
        }

        flag = !flag

My last version of the code didn't write the image from Metal directly to the UIImageView component; rather, it used an intermediate UIImage instance. I found that removing this variable could squeeze out a few extra frames per second.

I've set the Metal optimisations to the maximum in the compiler settings and also prefixed my call to distance() with the fast namespace:

        const float dist = fast::distance(float2(inParticle.positionX, inParticle.positionY), float2(inGravityWell.positionX, inGravityWell.positionY));

For this demonstration, I've removed the touch handlers. There's one gravity well which orbits around the centre of the screen. It gives some nice effects while I plan how to productize my particle system.

All the source code for this project is available in my GitHub repository here.

