
1,000,000 Particles on an iPad!


After my recent experiment with a GPU-based particle system built on the Metal framework, I've taken the code a step further and managed to get a million-particle system running at 20fps (or over 30fps if I disable the glow composite shader) on an iPad Air 2. Strictly speaking, because the new technique requires power-of-two length datasets, I've got a 1,048,576 particle system, but what's forty eight thousand between friends?
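To make the arithmetic concrete: 1,048,576 is 2^20, which leaves 48,576 particles over a round million. Here is a quick check in current Swift syntax; the names are illustrative and not taken from the project:

```swift
// Illustrative names; not taken from the project's source.
let particleCount = 1 << 20                 // 2^20 = 1,048,576
let surplus = particleCount - 1_000_000     // particles beyond a round million

// A positive integer is a power of two when exactly one bit is set.
let isPowerOfTwo = particleCount > 0 && (particleCount & (particleCount - 1)) == 0

print(particleCount, surplus, isPowerOfTwo) // 1048576 48576 true
```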

Here's what a million red, green and blue particles look like. This is a realtime, unadulterated screen recording from my iPad Air 2:

The technique I've used comes from this amazingly understated blog post from memkite.com. In it, Amund Tveit discusses a way to share data between the CPU and GPU. Using this technique, I no longer write back the particle data from Metal to Swift, which gives a significant speed improvement.

In a nutshell, I define some constants and declare a handful of mutable pointers and a mutable buffer pointer:

    let particleCount: Int = 1048576   // 2^20
    var particlesMemory: UnsafeMutablePointer<Void> = nil   // raw memory shared with the GPU
    let alignment: UInt = 0x4000   // 16,384 bytes
    let particlesMemoryByteSize: UInt = UInt(1048576) * UInt(sizeof(Particle))
    var particlesVoidPtr: COpaquePointer!
    var particlesParticlePtr: UnsafeMutablePointer<Particle>!
    var particlesParticleBufferPtr: UnsafeMutableBufferPointer<Particle>!

When I set up the particles, I populate the pointers and use posix_memalign() to allocate the memory:

        posix_memalign(&particlesMemory, alignment, particlesMemoryByteSize)
        particlesVoidPtr = COpaquePointer(particlesMemory)
        particlesParticlePtr = UnsafeMutablePointer<Particle>(particlesVoidPtr)

        particlesParticleBufferPtr = UnsafeMutableBufferPointer(start: particlesParticlePtr, count: particleCount)
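The listings in this post use the Swift syntax that was current when it was written. As a rough sketch of the same allocation in more recent Swift, something like the following should work. The `Particle` struct here is a hypothetical stand-in (four `Float` fields is an assumption based on the initializer used later in the post), and the other names are illustrative:

```swift
import Foundation

// Hypothetical stand-in for the article's Particle type (assumed: four Floats).
struct Particle {
    var positionX: Float
    var positionY: Float
    var velocityX: Float
    var velocityY: Float
}

let particleCount = 1 << 20
let alignment = 0x4000   // 16 KB, the value used in the article
let byteSize = particleCount * MemoryLayout<Particle>.stride

// posix_memalign returns 0 on success and stores the address in `memory`.
var memory: UnsafeMutableRawPointer? = nil
let status = posix_memalign(&memory, alignment, byteSize)
guard status == 0, let rawPointer = memory else {
    fatalError("posix_memalign failed with status \(status)")
}

// Bind the raw bytes to Particle and wrap them in a buffer pointer,
// which adds a count and collection-style indexing.
let particlePointer = rawPointer.bindMemory(to: Particle.self, capacity: particleCount)
let particles = UnsafeMutableBufferPointer(start: particlePointer, count: particleCount)

// Call free(rawPointer) once neither the CPU nor the GPU needs the memory.
```

The typed pointer and the buffer pointer are only views onto the aligned block; no second allocation takes place.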

The loop to populate the particles is slightly different - I now loop over the buffer pointer:

        for index in particlesParticleBufferPtr.startIndex ..< particlesParticleBufferPtr.endIndex
        {
            // positionX, positionY, velocityX and velocityY are set for each particle (not shown)
            let particle = Particle(positionX: positionX, positionY: positionY, velocityX: velocityX, velocityY: velocityY)
            particlesParticleBufferPtr[index] = particle
        }

Inside the applyShader() function, I wrap the memory in a Metal buffer without copying it, and that one buffer is used as both the input and the output:

        let particlesBufferNoCopy = device.newBufferWithBytesNoCopy(particlesMemory, length: Int(particlesMemoryByteSize),
            options: nil, deallocator: nil)
        commandEncoder.setBuffer(particlesBufferNoCopy, offset: 0, atIndex: 0)

        commandEncoder.setBuffer(particlesBufferNoCopy, offset: 0, atIndex: 1)
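In current Swift the equivalent Metal call is `makeBuffer(bytesNoCopy:length:options:deallocator:)`, which expects a page-aligned pointer and a length that is a whole number of pages. The sketch below is not the project's code: `roundedUpToPage` is a hypothetical helper, 16 bytes per particle is an assumption, and the Metal part is compiled only where the framework is available:

```swift
import Foundation
#if canImport(Metal)
import Metal
#endif

// Hypothetical helper: rounds a byte count up to a whole number of pages.
func roundedUpToPage(_ byteCount: Int, pageSize: Int) -> Int {
    return (byteCount + pageSize - 1) / pageSize * pageSize
}

let pageSize = Int(getpagesize())
// Assumption: 2^20 particles at 16 bytes each.
let byteSize = roundedUpToPage((1 << 20) * 16, pageSize: pageSize)

// Allocate page-aligned memory that the CPU and GPU can both address.
var memory: UnsafeMutableRawPointer? = nil
guard posix_memalign(&memory, pageSize, byteSize) == 0, let sharedMemory = memory else {
    fatalError("aligned allocation failed")
}

#if canImport(Metal)
if let device = MTLCreateSystemDefaultDevice(),
   let buffer = device.makeBuffer(bytesNoCopy: sharedMemory,
                                  length: byteSize,
                                  options: .storageModeShared,
                                  deallocator: nil) {
    // With a no-copy buffer, contents() should point at the memory we allocated.
    print(buffer.contents() == sharedMemory)
}
#endif

// With a nil deallocator the caller still owns the memory and must free it
// after the buffer is no longer in use.
```

Binding the same buffer at index 0 and index 1, as the article does, lets the kernel read and write the particles in place.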

...and after the shader has run, I rebuild the typed pointers and the buffer pointer from the shared memory (particlesMemory):

        particlesVoidPtr = COpaquePointer(particlesMemory)
        particlesParticlePtr = UnsafeMutablePointer(particlesVoidPtr)

        particlesParticleBufferPtr = UnsafeMutableBufferPointer(start: particlesParticlePtr, count: particleCount)
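To illustrate why no copy back is needed, here is a small CPU-only sketch in current Swift. The compute shader is simulated by an ordinary function (`simulateShaderPass`, a hypothetical name) that touches the memory only through the untyped pointer, and `Particle` is again an assumed four-`Float` stand-in:

```swift
// Hypothetical stand-in for the article's Particle type (assumed: four Floats).
struct Particle {
    var positionX: Float
    var positionY: Float
    var velocityX: Float
    var velocityY: Float
}

// Stand-in for the compute shader: advances every particle by its velocity,
// accessing the memory only through the untyped pointer.
func simulateShaderPass(on raw: UnsafeMutableRawPointer, count: Int) {
    for index in 0 ..< count {
        let offset = index * MemoryLayout<Particle>.stride
        var particle = raw.load(fromByteOffset: offset, as: Particle.self)
        particle.positionX += particle.velocityX
        particle.positionY += particle.velocityY
        raw.storeBytes(of: particle, toByteOffset: offset, as: Particle.self)
    }
}

let count = 4
let memory = UnsafeMutableRawPointer.allocate(
    byteCount: count * MemoryLayout<Particle>.stride,
    alignment: MemoryLayout<Particle>.alignment)

// Typed view over the same bytes, created before the "shader" runs.
let typed = memory.bindMemory(to: Particle.self, capacity: count)
let particles = UnsafeMutableBufferPointer(start: typed, count: count)
particles.initialize(repeating: Particle(positionX: 1, positionY: 2,
                                         velocityX: 0.5, velocityY: -0.5))

simulateShaderPass(on: memory, count: count)

// The typed view sees the update because both views address the same memory.
let first = particles[0]
print(first.positionX, first.positionY)   // 1.5 1.5

memory.deallocate()
```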

For a better explanation, I'd suggest a look at the original memkite.com blog post.

I've made a new branch that uses this technique which you can access here. The original branch that uses a simple array is still available to compare and contrast.

Incredibly, this simulation runs at almost 17fps on my iPhone 6 and shows the potential of the Metal Framework combined with Swift not just for games but for some pretty serious simulation work.


Published at DZone with permission of Simon Gladman, DZone MVB. See the original article here.

