How to Push Assets to S3 with Rake: Versioning and Cache Expiration

A while ago I wrote about how we package and push Rails assets to Amazon S3. We version assets with the Git hash: because the asset URL changes with every release, we can set a far-future cache expiration, and the scheme works well with a CDN. That post included a Rake task that deleted old assets and replaced them with newer ones. It’s time for a revision with some new features.

The first problem we solved is how long it takes to sync contents between a local folder and S3. The old task fetched the file list for the entire bucket, which grew quite a bit over time. The S3 API supports a prefix option, so the task now lists only the keys under assets/.


    s3i.incrementally_list_bucket(to, prefix: "assets/") do |response|
      response[:contents].each do |existing_object|
        # only keys that start with assets/ are returned
      end
    end


The second issue is asset rollback. We deploy assets to S3 and then code to Heroku. The asset deployment deletes the old assets, so there’s a small window in which old code runs against new assets, which is obviously not okay. In practice we’re saved by CloudFront, which keeps a cache for extended periods of time. A solution is to keep two copies of the assets online: current and previous. The code preserves the most recent existing copy by looking at the :last_modified field of the S3 object.
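
The selection of the copy to keep can be tried out without touching S3. The snippet below is a minimal sketch, not the task itself: `latest_version` is a hypothetical helper, and the listing is made-up in-memory data shaped like the `:key` and `:last_modified` fields mentioned above.

```ruby
require 'date'

# Hypothetical helper: given entries shaped like an S3 listing, return the
# version segment (assets/<version>/...) that was modified most recently.
def latest_version(objects)
  newest = {}
  objects.each do |object|
    version = object[:key].split('/')[1]
    modified = DateTime.parse(object[:last_modified])
    # remember the newest timestamp seen for each version
    newest[version] = modified if newest[version].nil? || modified > newest[version]
  end
  newest.max_by { |_version, modified| modified }&.first
end

# made-up sample data, for illustration only
listing = [
  { key: 'assets/abc1234/application.css', last_modified: '2012-03-01T10:00:00.000Z' },
  { key: 'assets/def5678/application.css', last_modified: '2012-03-05T09:30:00.000Z' },
  { key: 'assets/abc1234/application.js',  last_modified: '2012-03-01T10:00:05.000Z' }
]

puts latest_version(listing)
```

With this sample data the version under `assets/def5678/` is the one that would be preserved; an empty listing yields `nil`, meaning there is nothing to keep.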

Here’s the task, abbreviated in places; the complete task is available as a gist.



    # uploads assets to s3 under assets/githash, deletes stale assets
    # (s3i and logger are assumed to be defined elsewhere; omitted here)
    task :uploadToS3, [ :to ] => :environment do |t, args|
      from = File.join(Rails.root, 'public/assets')
      to = args[:to]
      hash = `git rev-parse --short HEAD`.chomp
      logger.info("[#{Time.now}] fetching keys from #{to}")
      existing_objects_hash = {}
      existing_assets_hash = {}
      # list only the keys under assets/ and note one timestamp per git hash
      s3i.incrementally_list_bucket(to, prefix: "assets/") do |response|
        response[:contents].each do |existing_object|
          existing_objects_hash[existing_object[:key]] = existing_object
          previous_asset_hash = existing_object[:key].split('/')[1]
          existing_assets_hash[previous_asset_hash] ||= DateTime.parse(existing_object[:last_modified])
        end
      end
      logger.info("[#{Time.now}] #{existing_assets_hash.count} existing asset(s)")
      # pick the most recently modified set of assets as the one to keep
      previous_hash = nil
      existing_assets_hash.each_pair do |asset_hash, last_modified|
        logger.info(" #{asset_hash} => #{last_modified}")
        previous_hash = asset_hash unless (previous_hash and existing_assets_hash[previous_hash] > last_modified)
      end
      logger.info("[#{Time.now}] keeping #{previous_hash}") if previous_hash
      logger.info("[#{Time.now}] copying from #{from} to s3:#{to} @ #{hash}")
      Dir.glob(from + "/**/*").each do |entry|
        next if File.directory?(entry)
        File.open(entry) do |entry_file|
          content_options = {}
          content_options['x-amz-acl'] = 'public-read'
          content_options['content-type'] = MIME::Types.type_for(entry)[0]
          key = 'assets/'
          key += (hash + '/') unless hash.empty?
          key += entry[(from.length + 1)..-1]
          logger.info("[#{Time.now}]  uploading #{key}")
          s3i.put(to, key, entry_file, content_options)
        end
      end
      existing_objects_hash.keys.each do |key|
        # keep the previous copy and anything just uploaded under the current hash
        next if previous_hash and key.start_with?("assets/#{previous_hash}/")
        next if key.start_with?("assets/#{hash}/")
        logger.info("[#{Time.now}]  deleting #{key}")
        s3i.delete(to, key)
      end
    end
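
Two pieces of the task above, building the versioned key and deciding which keys to delete, can be checked without an S3 connection. This is a minimal sketch under assumptions: `asset_key` and `stale_keys` are hypothetical helpers that do not appear in the original task, and the paths and hashes are made up.

```ruby
# Hypothetical helper: map a local file path to its versioned S3 key.
def asset_key(from, entry, version)
  relative = entry[(from.length + 1)..-1]
  # skip the version segment when no git hash is available
  ['assets', version, relative].reject { |part| part.nil? || part.empty? }.join('/')
end

# Hypothetical helper: keys that are safe to delete, i.e. everything
# outside the versions that should stay online.
def stale_keys(existing_keys, keep_versions)
  existing_keys.reject do |key|
    keep_versions.any? { |version| key.start_with?("assets/#{version}/") }
  end
end

# made-up sample data, for illustration only
existing = [
  'assets/abc1234/application.css',
  'assets/def5678/application.css',
  'assets/0ld0ld0/application.css'
]

puts asset_key('/app/public/assets', '/app/public/assets/images/logo.png', 'def5678')
puts stale_keys(existing, ['abc1234', 'def5678'])
```

Passing both the previous and the current hash to `stale_keys` keeps two copies online, which is the rollback behavior described earlier.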


Since we’re versioning assets with a Git hash in the URL, a given URL never serves different content, so another improvement is to set a long cache expiration, here one year.



    content_options['cache-control'] = "public, max-age=#{365*24*60*60}"
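
For reference, the arithmetic works out to 31,536,000 seconds. The snippet below is a small sketch that gathers the upload headers in one place; `long_lived_headers` and `ONE_YEAR` are hypothetical names, not part of the original task.

```ruby
ONE_YEAR = 365 * 24 * 60 * 60 # seconds

# Hypothetical helper: headers for an asset whose URL changes with every release.
def long_lived_headers(content_type)
  {
    'x-amz-acl'     => 'public-read',
    'content-type'  => content_type,
    'cache-control' => "public, max-age=#{ONE_YEAR}"
  }
end

puts long_lived_headers('text/css')['cache-control']
```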




Published at DZone with permission of