Here's Why You Should Never Implement Your Own Caching

Caching is an important part of a lot of applications, so should you build your own? Maybe not. Find out why in this post.

by Swizec Teller · Apr. 17, 17 · Tutorial

Here’s an interesting problem for you: build some simple caching.

Let’s say you have a server and an app. Your server has to do something every time you release a new app. Asking for reviews is a good example. If you automate this, everyone will be happier.

So you keep track of the latest app version and run checks when users ping your API. The list of versions is going to grow fast-ish, and your SELECT statement has to run in code because Postgres doesn’t know how to compare version strings.

In Ruby, finding the latest app version looks like this:

AppVersion.where(platform: platform)
          .sort_by{ |v| Gem::Version.new(v.version) }
          .last


This looks innocent, but it builds an array of all AppVersion models, then sorts it in Ruby, then takes the last one and discards the rest. It’s kind of okay when the table is small, but it’s terrible when the table gets big.
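To see why the sort goes through Gem::Version rather than comparing the raw strings, here is a minimal sketch in plain Ruby. The version numbers are made up for illustration; only Gem::Version comes from the code above.

```ruby
# Gem::Version ships with RubyGems, which Ruby loads by default.
versions = ["1.2.0", "1.10.0", "1.9.3"]

# Plain string comparison is character by character,
# so "1.9.3" is treated as larger than "1.10.0".
latest_by_string = versions.max

# Gem::Version compares each dotted segment as a number.
latest_by_version = versions.max_by { |v| Gem::Version.new(v) }

puts "string max:  #{latest_by_string}"
puts "version max: #{latest_by_version}"
```

Comparing as strings picks 1.9.3, while Gem::Version picks 1.10.0, which is the answer you actually want.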

Ruby on Rails’s and your database’s default caching strategies can’t cache this call. You have to run it every time.

The result only changes every few days. Sometimes, it goes unchanged for weeks. But you still have to check every time a user logs in because maybe they’re the first person with a new app version, and you need to know when it showed up.

A Naive Caching Strategy

You should use caching, obviously. The value is somewhat expensive to calculate, it is checked often, and it changes rarely. Cache!

So you implement the simplest approach: memoization.

In computing, memoization or memoisation is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls and returning the cached result when the same inputs occur again.
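Stripped of the Rails details, the idea can be sketched like this. SlowSquares and its counter are hypothetical names used only for illustration, and squaring a number stands in for the expensive call.

```ruby
# Hypothetical example: memoize a computation in a Hash.
class SlowSquares
  attr_reader :computations

  def initialize
    @results = {}
    @computations = 0
  end

  def square(n)
    # Hash#fetch with a block only runs the block when the key is missing.
    @results.fetch(n) do
      @computations += 1
      @results[n] = n * n
    end
  end
end

squares = SlowSquares.new
first = squares.square(12)
second = squares.square(12)

puts "results: #{first}, #{second}"
puts "computed #{squares.computations} time(s)"
```

The second call returns the stored result without running the computation again.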

Your code looks something like this:

class AppVersion < ActiveRecord::Base

  # Class instance variable used as the memo, keyed by platform.
  @current = {}

  # Returns the newest AppVersion for a platform, computing it on first use.
  def self.latest(platform)
    if @current[platform].nil?
      @current[platform] = where(platform: platform)
         .sort_by{ |v| Gem::Version.new(v.version) }
         .last
    end

    @current[platform]
  end

  # Called when a user logs in. new_latest is not shown in this post.
  def self.check_app_version(device)
    latest = self.latest(device.platform)

    if latest
      if Gem::Version.new(device.app_version) > Gem::Version.new(latest.version)
        # The device runs a version newer than anything seen so far.
        latest = new_latest(device)
        @current[device.platform] = latest
      else
        AppVersion.find_or_create_by(version: device.app_version,
                                     platform: device.platform)
          .update!(last_seen_at: DateTime.now)
      end
    else
      # First version ever recorded for this platform.
      latest = new_latest(device)
      @current[device.platform] = latest
    end

    latest
  end
end


Overall, this is the AppVersion model. It stores information about each new app version the server encounters. When a user logs in, we call AppVersion.check_app_version(user.device).

This function:

  1. Fetches the latest app version.
     1.1. `latest` returns the saved value if it exists.
     1.2. If not, `latest` saves its result in a class instance variable.
  2. If we got a version, we check whether the user’s device is running a newer one.
     2.1. If the user’s device is newer, we create a new AppVersion entry.
     2.2. Then we update the class instance variable.
     2.3. If the device is not newer, we update the last_seen_at timestamp.
  3. If this is the first time ever that we’re checking – there’s no latest – then we make a new latest and update the class instance variable.

Seems reasonable, right? Calculate value, save value, update value when needed. We rely on class instance variables persisting across requests.

You do some testing locally, you write some tests. All good. Feature works. Ship it.

The feature goes to production. You release a new app, and it creates 18 entries in the database.

What went wrong?

The Correct Caching Strategy

The clue is in that 18 number. Our code thought exactly 18 times that it had encountered a new latest app version.

At the time, we had 9 Heroku dynos with 4 threads each running in production. 9*4 = 36, which is not 18. But 36 is exactly twice 18, and class instance variables are shared between threads in the same process.

Perhaps the way Puma shares memory between threads, or potential race conditions in our code, means that it took 2 tries before every thread on a machine knew about the new version. It’s hard to say why it works out that way, but it does.

Memoization does not work. Caching is hard.

In retrospect, it is obvious that this was never going to work. Our “server” is distributed among multiple virtual machines. They don’t share memory. How would they ever have seen each other’s class instance variables?
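The same effect shows up on a single machine as soon as there is more than one process. The sketch below is an illustration, not the production setup: VersionMemo is a hypothetical stand-in for the memoizing model, and fork (available on Unix-like systems only) plays the role of a second server process.

```ruby
# Hypothetical stand-in for the memoizing model; not the article's AppVersion.
class VersionMemo
  @current = {}

  class << self
    attr_reader :current
  end
end

# A pipe lets the child process report what it sees.
reader, writer = IO.pipe

pid = fork do
  reader.close
  # The "first" process notices a new version and updates its own copy.
  VersionMemo.current["ios"] = "2.0.0"
  writer.puts VersionMemo.current["ios"]
  writer.close
end

writer.close
child_value = reader.gets.chomp
reader.close
Process.wait(pid)

# The parent process never sees the child's update.
parent_value = VersionMemo.current["ios"]

puts "child process sees:  #{child_value.inspect}"
puts "parent process sees: #{parent_value.inspect}"
```

Each process gets its own copy of the class instance variable, so an update made in one is invisible to the others. Separate virtual machines are the same situation with even less in common.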

The answer is to stop trying to be clever. Rails has built-in caching that’s been battle tested and developed by smart people.

The fixed code looks like this:

  def self.latest(platform)
    Rails.cache.fetch("latest_app_version/#{platform}") do
      where(platform: platform).sort_by{ |v| Gem::Version.new(v.version) }.last
    end
  end

  def self.check_app_version(device)
    latest = self.latest(device.platform)

    if latest
      if Gem::Version.new(device.app_version) > Gem::Version.new(latest.version)
        latest = new_latest(device)
        # Drop the stale entry so the next call to latest recomputes it.
        Rails.cache.delete("latest_app_version/#{device.platform}")
      else
        AppVersion.find_or_create_by(version: device.app_version,
                                     platform: device.platform)
          .update!(last_seen_at: DateTime.now)
      end
    else
      latest = new_latest(device)
      # fetch stores a nil block result too, so clear it here as well.
      Rails.cache.delete("latest_app_version/#{device.platform}")
    end

    latest
  end

Much better! The logic is the same as before, except that we use Rails.cache as our caching mechanism.

.fetch reads from the cache and, if there’s a miss, runs the provided block and stores its result. .delete removes the cached value, so the next call to latest misses the cache, recomputes the value from the database, and stores the fresh result.
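The fetch-then-delete cycle can be traced with a toy stand-in. ToyCache and load_latest below are hypothetical and exist only to show the order of hits and misses; the sketch is not Rails.cache and is not meant to replace it.

```ruby
# ToyCache is a hypothetical, in-memory stand-in used only to show the
# fetch/delete behavior. It is not Rails.cache.
class ToyCache
  def initialize
    @store = {}
  end

  # Return the stored value on a hit; run the block and store it on a miss.
  def fetch(key)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end

  def delete(key)
    @store.delete(key)
  end
end

cache = ToyCache.new
db_reads = 0

# Stands in for the expensive query; counts how often it runs.
load_latest = lambda do
  db_reads += 1
  "version-from-db-#{db_reads}"
end

a = cache.fetch("latest_app_version/ios") { load_latest.call } # miss
b = cache.fetch("latest_app_version/ios") { load_latest.call } # hit
cache.delete("latest_app_version/ios")
c = cache.fetch("latest_app_version/ios") { load_latest.call } # miss again

puts [a, b, c].inspect
puts "database reads: #{db_reads}"
```

The block runs on the first call and again after the delete, but not on the call in between.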

The fixed code works because Rails.cache can be configured to use an external caching server – Memcached or Redis, for instance. This creates a memory space shared between all server machines and server threads.
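As a rough sketch of what that configuration might look like with Memcached: the hostname below is a hypothetical placeholder, and the example assumes the dalli gem is in your Gemfile. Your own setup will differ.

```ruby
# config/environments/production.rb
Rails.application.configure do
  # Assumption: a Memcached server reachable at this hypothetical hostname,
  # with the dalli gem available to the app.
  config.cache_store = :mem_cache_store, "cache-1.example.com"
end
```

With a shared store like this, every dyno reads and writes the same cache entries instead of keeping a private copy.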

Problem solved, crisis averted, lesson learned. Don’t be clever. Use the tools your frameworks give you. 


Published at DZone with permission of Swizec Teller, DZone MVB. See the original article here.
