
Two Million Particles at 25 Frames Per Second on an iPad


Following on from my last post, where I managed to calculate and render over 1,000,000 particles in real time, I've done some pretty effective tweaking of the code to create an app that calculates and renders (with blur and trails) over 2,000,000 particles at around 25 frames per second on my iPad Air 2.

The main change is to reuse the compute shader not only to do the calculation and first render, but also to do the post-processing.

In Swift, I set the thread groups and thread group count based on particleCount, which is 2²¹, or 2,097,152:

    particle_threadGroupCount = MTLSize(width:32,height:1,depth:1)
    particle_threadGroups = MTLSize(width:(particleCount + 31) / 32, height:1, depth:1)

Because my image is 1,024 x 1,024, which is 1,048,576 pixels, I can reuse the kernel function to execute code on each pixel by converting the one-dimensional thread_position_in_grid to a two-dimensional coordinate named textureCoordinate:

    const float imageWidth = 1024;
    uint2 textureCoordinate(fast::floor(id / imageWidth),id % int(imageWidth));

    if (textureCoordinate.x < imageWidth && textureCoordinate.y < imageWidth)
    {
        float4 outColor = inTexture.read(textureCoordinate);
        // do some work...
        outTexture.write(outColor, textureCoordinate);
    }

Having a single shader gave a significant speed improvement. Furthermore, because I'm now passing in a read-access texture, I can composite the particles over each other, which makes for a better-looking render:

    const Particle inParticle = inParticles[id];
    const uint2 particlePosition(inParticle.positionX, inParticle.positionY);
    const int type = id % 3;
    const float3 thisColor = inTexture.read(particlePosition).rgb;

    const float4 outColor(thisColor.r + (type == 0 ? 0.15 : 0.0),
                          thisColor.g + (type == 1 ? 0.15 : 0.0),
                          thisColor.b + (type == 2 ? 0.15 : 0.0),
                          1.0);

One downside was that I was getting some artefacts when reading from and writing to the same texture. I've overcome this by using a ping-pong technique: two textures in the Swift code toggle between being the input and output textures with each frame.

I use a Boolean flag to decide which texture to use:

        if flag
        {
            commandEncoder.setTexture(particlesTexture_1, atIndex: 0)
            commandEncoder.setTexture(particlesTexture_2, atIndex: 1)
        }
        else
        {
            commandEncoder.setTexture(particlesTexture_2, atIndex: 0)
            commandEncoder.setTexture(particlesTexture_1, atIndex: 1)
        }



        if flag
        {
            particlesTexture_1.getBytes(&imageBytes, bytesPerRow: bytesPerRowInt, fromRegion: region, mipmapLevel: 0)
        }
        else
        {
            particlesTexture_2.getBytes(&imageBytes, bytesPerRow: bytesPerRowInt, fromRegion: region, mipmapLevel: 0)
        }

        flag = !flag

My last version of the code didn't write the image from Metal directly to the UIImageView component; rather, it used an intermediate UIImage instance. I found that removing this variable could squeeze out a few extra frames per second.

I've set the Metal optimisations to the maximum in the compiler settings and also prefixed my call to distance() with the fast namespace:

        const float dist = fast::distance(float2(inParticle.positionX, inParticle.positionY), float2(inGravityWell.positionX, inGravityWell.positionY));

For this demonstration, I've removed the touch handlers. There's one gravity well which orbits around the centre of the screen. It gives some nice effects while I plan how to productize my particle system.

All the source code for this project is available in my GitHub repository here.

