Engineering High-Performance Real-Time Leaderboard

Replaced slow slice-based leaderboard with indexed skip list + hash IDs, cutting updates to <1ms and stabilizing CPU/memory under loads.

Ivan Balashov

Mar. 27, 26 · Analysis

Likes (1)

Comment

Save

3.1K Views

Leaderboard performance problems rarely announce themselves as “data structure issues.” They surface instead as CPU spikes, tail-latency explosions, and on-call alerts that refuse to quiet down. That’s exactly what we encountered: a slice-based leaderboard implementation that initially appeared perfectly reasonable, but began to collapse once the system surpassed the 10,000-user mark and update workloads started behaving like O(N²).

This article walks through what broke, how profiling made the root cause undeniable, and how the issue was fixed by replacing the original approach with an indexed skip list augmented with span counters and a hash-based identity layer. The redesign reduced critical operations to O(log N), stabilized memory usage, and pushed update latency below one millisecond.

Under production load, the failure mode was blunt. CPU usage was pinned at 100%, the OOM killer repeatedly killed pods, and user-facing outages were triggered by traffic increases as small as ten percent. Profiling confirmed that the core update path was effectively quadratic, dominated by linear scans and repeated full re-sorts of the leaderboard.

Within two working days, the naive implementation was replaced with a probabilistically balanced data structure and a proper identity deduplication layer. The result was sub-millisecond updates, stable CPU and memory behavior, and reliable operation with more than 100,000 concurrent users.

Key Results (Production)

metric	before	after	improvement
Update latency (10K users)	50–100ms	<1ms	~100×
Position query	O(N)	O(log N)	~1000×
CPU under +10% load	100% / unstable	30–40% / stable	Major
OOM incidents	Multiple/day	Zero	Resolved
Memory per user	~200 bytes	~104 bytes	-48%
Delivery time	—	2 working days	vs 2+ weeks (Redis path)

Why This Was Harder Than “Just Sort a List”

Modern leaderboards are deceptively complex systems. They must handle high write throughput while serving constant rank queries, deduplicate users who may appear under multiple identifiers, support multi-tenant isolation, and allow reads to proceed without being blocked by ongoing updates.

This wasn’t a theoretical performance concern or a nice-to-have optimization. It was a production incident. A modest increase in load reliably triggered a failure cascade: CPU saturation, memory exhaustion, pod restarts, and eventually, users being affected. During peak traffic windows, this cycle repeated continuously, degrading not only the leaderboard itself but also dependent services.

Profiling made the root cause impossible to ignore. The overwhelming majority of CPU time was spent inside the leaderboard merge logic, with sorting and string comparisons compounding the problem. The hot path wasn’t sorting alone; it was the combined cost of linear identity scans, repeated slice growth and reallocations, and resorting the entire dataset after each batch of updates.

Original Implementation: Why It Fell Apart

At its core, the original leaderboard was implemented as a mutex-protected slice. Updates were handled by scanning the entire list to find matching users, appending new entries when necessary, and then sorting the full slice by score.

    Go
   
 

   type LeaderBoard struct {
    mu      sync.RWMutex
    Entries []*LeaderBoardEntry
}

The update logic relied on nested loops and a full sort after every merge:
func mergeEntries(curr, updates []*LeaderBoardEntry) []*LeaderBoardEntry {
    // ...
}
  

This approach works acceptably at small scales, but its performance characteristics degrade rapidly. For a leaderboard with N users and M updates, identity lookup alone costs O(M×N), sorting costs O(N log N), and rank queries often devolve into linear scans. When M approaches N, the system effectively behaves as O(N²).

At around 10,000 entries, that difference becomes existential. What once felt “fine” suddenly turns into a meltdown.

Additional Hidden Problems

Beyond raw algorithmic complexity, the original design had several subtle but damaging flaws. Deduplication logic that matched only on user ID or email failed when the same user appeared under multiple identifiers, resulting in duplicate leaderboard entries. Repeated slice resizing and string comparisons created sustained garbage collection pressure, amplifying CPU load and instability. Finally, because updates held a global lock during sorting, read operations were effectively starved under load.

Each of these issues reinforced the others, creating a feedback loop that made the system increasingly fragile as traffic grew.

The Design: O(1) Identity and O(log N) Ranking

Fixing the system required more than micro-optimizations. The solution needed constant-time identity resolution, logarithmic insert and update performance, efficient rank queries, and safe concurrency that allowed reads to proceed independently of writes.

Redis sorted sets were evaluated and rejected, not because they are poor tools, but because they didn’t fit the constraints of this incident. Infrastructure provisioning alone would have taken weeks, and network round-trip times would have introduced millisecond-scale latency. The system needed a fix measured in days, with sub-millisecond performance. That made an in-process solution the only viable option.

Solution: Indexed Skip List With Hash-Based Identity

Skip lists provide probabilistic balancing with O(log N) inserts and searches, while remaining simpler to implement and reason about than balanced trees. To support efficient rank queries, span counters were added at each level. These counters transform the skip list from a fast ordering structure into one capable of answering rank queries without walking the entire list.

Identity resolution was handled separately through a hash-based layer that maps all known identifiers for a user to a single internal numeric ID. This decouples identity complexity from ranking logic and ensures deduplication happens in constant time.

Architecture Overview

The final system consists of three tightly integrated components: a hash-based identity store that maps multiple identifiers to a single internal user ID, a skip list that stores leaderboard entries ordered by score and timestamp, and a direct mapping from user ID to skip list node to support constant-time lookups.

Together, these components allow identity resolution in O(1), ranking operations in O(log N), and position queries without full traversal.

Component 1: User Hash Storage (Deduplication Layer)

The identity layer normalizes messy, real-world identifiers into a single internal representation. A user may appear as a group-scoped user ID, an email address, or one or more account IDs. All of these are converted into hashes and mapped to a single uint32 user ID.

This approach dramatically reduces memory overhead compared to string-based identity handling and ensures that deduplication remains fast and deterministic even under heavy load.

Component 2: Skip List With Span Counters (Ranking Layer)

Each skip list node stores forward pointers and span counters. The spans represent how many nodes are skipped at each level, allowing rank calculation to be performed by summing spans during traversal rather than walking the base level.

This is the key insight that enables rank queries to remain logarithmic as the leaderboard grows.

Insert, Update, and Rank Queries

When a user already exists, and their score hasn’t changed, the system updates the payload in place without repositioning the node. If the score has changed, the node is removed and reinserted, preserving ordering and span correctness.

Rank queries work by traversing the skip list from the top level down, accumulating span counts to determine how many nodes precede the target entry. This avoids the need for full scans and keeps rank queries efficient even at large scales.

Testing, Validation, and Production Impact

This optimization succeeded because it was verified rigorously. Unit tests validate skip list invariants and span correctness. Integration tests ensure identity deduplication behaves correctly across multiple identifiers. Benchmarks confirm logarithmic scaling under realistic workloads.

After deployment, the production impact was immediate and sustained. OOM kills disappeared entirely, CPU usage stabilized at 30–40% under peak load, leaderboard updates consistently completed in under one millisecond, and user complaints about lag or ranking inconsistencies stopped completely. The system now has ample headroom for additional leaderboards and increased traffic.

Lessons Learned

The incident reinforced several hard-earned lessons. Profiling must come before optimization, as intuition alone is unreliable. Data structures should be chosen based on access patterns rather than familiarity. Cache behavior matters just as much as Big-O notation. And when implementing custom data structures, comprehensive testing is not optional.

Conclusion

This leaderboard failed in production for a simple reason: it outgrew its data structure. A sorted slice with linear deduplication and global resorting can work at small scales, but under real throughput, it becomes a system-wide liability.

By introducing a constant-time identity layer and an indexed skip list with span counters, critical operations were reduced to O(log N), instability was eliminated, and the system became both faster and safer under load. This wasn’t about cleverness. It was about choosing the right data structure before the system chose failure for us.

References

Pugh, W. (1990). Skip Lists: A Probabilistic Alternative to Balanced Trees. Communications of the ACM.
Redis Documentation — Sorted Sets. https://redis.io/docs/data-types/sorted-sets/
LevelDB — MemTable and Skip Lists. https://github.com/google/leveldb
Herlihy & Shavit — The Art of Multiprocessor Programming, Skip Lists chapter.

CPU time Data structure Performance

Opinions expressed by DZone contributors are their own.

Related

Trending