
1,000,000 Particles on an iPad!

After my recent experiment with a Metal framework GPU-based particle system, I've taken the code a step further and managed to get a million-particle system running at 20fps (or over 30fps if I disable the glow composite shader) on an iPad Air 2. Strictly speaking, because the new technique requires datasets whose length is a power of two, I've got a 1,048,576-particle system, but what's forty-eight thousand between friends?
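
The post doesn't show how the particle count was chosen. As a minimal sketch, assuming you start from a target such as one million, the Swift below rounds up to the next power of two, which is where the extra 48,576 particles come from. `nextPowerOfTwo` and `isPowerOfTwo` are hypothetical helpers, not part of the project.

```swift
// Hypothetical helper: smallest power of two that is >= n.
func nextPowerOfTwo(_ n: Int) -> Int {
    precondition(n > 0, "count must be positive")
    var value = 1
    // Keep doubling until we reach or pass n.
    while value < n {
        value <<= 1
    }
    return value
}

// Hypothetical helper: true when exactly one bit is set (1, 2, 4, 8, ...).
func isPowerOfTwo(_ n: Int) -> Bool {
    return n > 0 && (n & (n - 1)) == 0
}

let requested = 1_000_000
let particleCount = nextPowerOfTwo(requested)
print(particleCount)                // 1048576
print(particleCount - requested)    // 48576
print(isPowerOfTwo(particleCount))  // true
```

Because 2^19 is 524,288 and 2^20 is 1,048,576, a one-million target lands on 2^20.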

Here's what a million red, green and blue particles look like. This is a real-time, unadulterated screen recording from my iPad Air 2:

The technique I've used comes from this amazingly understated blog post from memkite.com. In it, Amund Tveit discusses a way to share memory between the CPU and GPU. With this technique, I no longer write the particle data back from Metal to Swift, which gives a significant speed improvement.

In a nutshell, I define some constants and declare a handful of pointers, including a mutable buffer pointer:

    // 2^20 particles: the technique needs a power-of-two count
    let particleCount: Int = 1048576
    // Raw storage that the CPU and GPU will share
    var particlesMemory:UnsafeMutablePointer<Void> = nil
    // Alignment for posix_memalign: 0x4000 = 16,384 bytes
    let alignment:UInt = 0x4000
    let particlesMemoryByteSize:UInt = UInt(particleCount) * UInt(sizeof(Particle))
    var particlesVoidPtr: COpaquePointer!
    var particlesParticlePtr: UnsafeMutablePointer<Particle>!

    var particlesParticleBufferPtr: UnsafeMutableBufferPointer<Particle>!
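
To make the size arithmetic concrete, here is a sketch in current Swift. It assumes `Particle` holds four 32-bit floats, matching the initializer used later in the post; the real struct isn't shown. `MemoryLayout<Particle>.stride` is the modern replacement for `sizeof(Particle)`.

```swift
// Assumption: Particle is four Floats, as suggested by
// Particle(positionX:positionY:velocityX:velocityY:) in the post.
struct Particle {
    var positionX: Float
    var positionY: Float
    var velocityX: Float
    var velocityY: Float
}

let particleCount = 1 << 20   // 1,048,576
let alignment = 0x4000        // 16,384 bytes
// stride is the distance between consecutive elements in memory.
let particlesMemoryByteSize = particleCount * MemoryLayout<Particle>.stride

print(MemoryLayout<Particle>.stride)             // 16
print(particlesMemoryByteSize)                   // 16777216 (16 MiB)
// The total is a whole number of 0x4000-byte blocks, so no padding is needed.
print(particlesMemoryByteSize % alignment == 0)  // true
```

With a 16-byte particle, a million-ish particles need 16 MiB, and that size happens to divide evenly by the 0x4000 alignment.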

When I set up the particles, I use posix_memalign() to allocate aligned memory and then populate the pointers from it:

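        // Allocate particlesMemoryByteSize bytes aligned to `alignment`; the address is
        // written into particlesMemory (a non-zero return value indicates failure)
        posix_memalign(&particlesMemory, alignment, particlesMemoryByteSize)
        // View the raw memory as a pointer to Particle values...
        particlesVoidPtr = COpaquePointer(particlesMemory)
        particlesParticlePtr = UnsafeMutablePointer<Particle>(particlesVoidPtr)

        // ...and as a fixed-length collection that can be indexed and looped over
        particlesParticleBufferPtr = UnsafeMutableBufferPointer(start: particlesParticlePtr, count: particleCount)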
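
For readers on a current Swift toolchain, the same allocation might look like the sketch below. It keeps the post's `posix_memalign` approach but replaces `COpaquePointer` and `UnsafeMutablePointer<Void>` with `UnsafeMutableRawPointer` and `bindMemory(to:capacity:)`. The four-float `Particle` is an assumption, and in an app the raw pointer would then go to Metal's `makeBuffer(bytesNoCopy:length:options:deallocator:)`, which this sketch does not call.

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

// Assumption: four Floats, as in the post's Particle initializer.
struct Particle {
    var positionX: Float
    var positionY: Float
    var velocityX: Float
    var velocityY: Float
}

let particleCount = 1 << 20
let alignment = 0x4000
let byteSize = particleCount * MemoryLayout<Particle>.stride

// posix_memalign stores the aligned address in `raw` and returns 0 on success.
var raw: UnsafeMutableRawPointer? = nil
let status = posix_memalign(&raw, alignment, byteSize)
precondition(status == 0, "posix_memalign failed with error \(status)")
let particlesMemory = raw!

// Bind the raw bytes to Particle and wrap them as a fixed-size collection.
let particlesPtr = particlesMemory.bindMemory(to: Particle.self, capacity: particleCount)
let particlesBuffer = UnsafeMutableBufferPointer(start: particlesPtr, count: particleCount)

print(Int(bitPattern: particlesMemory) % alignment == 0)  // true
print(particlesBuffer.count)                              // 1048576

// ARC never owns this memory, so it must be freed explicitly when the
// particle system is torn down; the sketch is finished with it here.
free(particlesMemory)
```

Checking the return value of `posix_memalign` is worth the extra line: on failure the out-pointer is not set.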

The loop that populates the particles is slightly different; it now iterates over the buffer pointer:

        for index in particlesParticleBufferPtr.startIndex ..< particlesParticleBufferPtr.endIndex {
            // positionX, positionY, velocityX and velocityY are generated per particle (not shown)
            let particle = Particle(positionX: positionX, positionY: positionY, velocityX: velocityX, velocityY: velocityY)
            particlesParticleBufferPtr[index] = particle
        }
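
The post elides how the initial values are generated. As a sketch, here is the same loop in current Swift using `indices` on the buffer pointer. The value ranges are hypothetical (positions across a 1024 x 1024 area, small random velocities), and a plain `allocate(capacity:)` stands in for the `posix_memalign` memory, which is filled the same way.

```swift
// Assumption: four Floats, as in the post's Particle initializer.
struct Particle {
    var positionX: Float
    var positionY: Float
    var velocityX: Float
    var velocityY: Float
}

// Smaller than the post's 1 << 20 to keep the sketch quick.
let particleCount = 1 << 16
// Stand-in for the posix_memalign-backed buffer pointer from the post.
let particles = UnsafeMutableBufferPointer<Particle>.allocate(capacity: particleCount)

// Hypothetical ranges for the initial state.
for index in particles.indices {
    particles[index] = Particle(
        positionX: Float.random(in: 0..<1024),
        positionY: Float.random(in: 0..<1024),
        velocityX: Float.random(in: -1...1),
        velocityY: Float.random(in: -1...1))
}

// Call particles.deallocate() once the particles are no longer needed.
```

Assigning through the subscript into freshly allocated memory is acceptable here because `Particle` contains only plain floats; a type with references would need `initialize` instead.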

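Inside the applyShader() function, I wrap the shared memory in a Metal buffer without copying it, and bind that one buffer as both the input and the output buffer:

        // Wrap the existing aligned memory in an MTLBuffer; no bytes are copied, and
        // passing nil as the deallocator means Metal will not free the memory
        let particlesBufferNoCopy = device.newBufferWithBytesNoCopy(particlesMemory, length: Int(particlesMemoryByteSize),
            options: nil, deallocator: nil)
        // The same buffer is bound as the kernel's input (index 0)...
        commandEncoder.setBuffer(particlesBufferNoCopy, offset: 0, atIndex: 0)

        // ...and as its output (index 1)
        commandEncoder.setBuffer(particlesBufferNoCopy, offset: 0, atIndex: 1)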

...and after the shader has run, I re-create the buffer pointer from the shared memory (particlesMemory):

        particlesVoidPtr = COpaquePointer(particlesMemory)
        particlesParticlePtr = UnsafeMutablePointer(particlesVoidPtr)

        particlesParticleBufferPtr = UnsafeMutableBufferPointer(start: particlesParticlePtr, count: particleCount)
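
Binding one buffer as both input and output, then re-wrapping the same memory, can be illustrated without a GPU. The sketch below is a CPU stand-in, not Metal code: the hypothetical `step` function plays the role of the compute kernel (whose source the post doesn't show) and assumes each "thread" reads and writes only its own element, which is what makes the aliasing safe. It then builds a fresh buffer pointer over the same raw memory, as the post does, and the updated values are already there with no copy.

```swift
// Assumption: four Floats, as in the post's Particle initializer.
struct Particle {
    var positionX: Float
    var positionY: Float
    var velocityX: Float
    var velocityY: Float
}

// Hypothetical CPU stand-in for the kernel: iteration i reads only input[i]
// and writes only output[i], so input and output may be the same memory.
func step(input: UnsafeMutableBufferPointer<Particle>,
          output: UnsafeMutableBufferPointer<Particle>) {
    for i in input.indices {
        var p = input[i]
        p.positionX += p.velocityX
        p.positionY += p.velocityY
        output[i] = p
    }
}

let count = 4
let raw = UnsafeMutableRawPointer.allocate(
    byteCount: count * MemoryLayout<Particle>.stride,
    alignment: MemoryLayout<Particle>.alignment)
let particles = UnsafeMutableBufferPointer(
    start: raw.bindMemory(to: Particle.self, capacity: count), count: count)
particles.initialize(repeating: Particle(positionX: 0, positionY: 0, velocityX: 1, velocityY: 2))

// Same memory passed as both input and output (like buffer indices 0 and 1).
step(input: particles, output: particles)

// Re-wrapping the raw memory only creates a new view; nothing is copied.
let view = UnsafeMutableBufferPointer(
    start: raw.assumingMemoryBound(to: Particle.self), count: count)
print(view[0].positionX, view[0].positionY)  // 1.0 2.0

// Call raw.deallocate() when finished with the particles.
```

If a kernel read neighbouring particles (for example, for collisions), sharing one buffer for input and output would no longer be safe, and separate buffers would be needed.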

For a fuller explanation, I'd suggest a look at the original memkite.com blog post.

I've made a new branch that uses this technique, which you can access here. The original branch, which uses a simple array, is still available to compare and contrast.

Incredibly, this simulation runs at almost 17fps on my iPhone 6, which shows the potential of the Metal framework combined with Swift, not just for games but for some pretty serious simulation work.

Published at DZone with permission of