
Pitfalls in the Validation for Uniqueness Using Rails ActiveRecord


ActiveRecord validation provides an easy way to validate records for uniqueness, but relying on it alone can be a problem. Here's how to clean up the database with no downtime.


ActiveRecord validation in Rails provides an easy way to validate records for uniqueness. With the help of a real-world example, we show why relying solely on Rails validation can become a problem and demonstrate a way to clean up your database without downtime.

The Problem

The tagging of translation keys was added as a product feature in the very early stages of PhraseApp's development. Although the feature was constantly improved over time and performed reliably, one day a customer contacted us about a validation error. The customer had been using the tagging feature rather extensively to automatically tag all uploaded keys with a GitHub pull request, and tag creation now failed with a validation error for an existing tag. We investigated the issue and discovered that the error was caused by a tag name that was not unique within its project. So how could that happen? We had used ActiveRecord uniqueness validation, created tags only through Rails, and never skipped validations.

State of the Codebase

The Tag model has a name field, belongs to a project, and carries an ActiveRecord validation so that a tag is unique per project_id and name:

class Tag < ActiveRecord::Base
  has_and_belongs_to_many :translation_keys
  ...
  belongs_to :project
  belongs_to :account
  ...
  validates_uniqueness_of :name, scope: :project_id
  ...
end

Our code to create new tags looked something like this:

 
class Keys::TagService
  def self.create_or_find_tag_by_name_for_project(project, tag_name, user, ...)
    ...
    tag = TagRepository.tags_for_project(project).find_or_initialize_by(name: tag_name)
    if tag.new_record? && tag.save
      ...
    end
 
    tag
  end
end


State of the Database

Checking our database revealed that there were 40k records that were not unique.

SELECT COUNT(*) FROM
  (SELECT COUNT(*) 
   FROM tags
   GROUP BY project_id, name
   HAVING COUNT(*) > 1) AS g

Furthermore, we saw that all of the duplicated records were system-generated tags created by our upload feature. Uploads are processed asynchronously by concurrent workers, and our client's push command triggers a separate upload for each locale. Since the user can assign a tag to any upload, which is then applied to all of its keys, this combination dramatically increases the probability of two processes trying to create the same tag at the same time.

What Happened?

If uniqueness validation is enabled, Rails looks for an existing record before performing Tag.create, Tag.save, Tag.update, and similar operations. If a record is found, the validation fails and the transaction is rolled back; if not, the record is saved.

Example of a failing validation:

pry(main)> Tag.create(name: "test", project_id: 1)
   (0.2ms)  BEGIN
  ...
  Tag Exists (0.2ms)  SELECT  1 AS one FROM `tags` WHERE (`tags`.`name` = BINARY 'test' AND `tags`.`project_id` = 1) LIMIT 1
   (0.1ms)  ROLLBACK

Example of a successful creation:

pry(main)> Tag.create(name: "test", project_id: 1)
   (0.2ms)  BEGIN
  ### Some other validations ###
  ..
  Tag Exists (0.3ms)  SELECT  1 AS one FROM `tags` WHERE (`tags`.`name` = BINARY 'test' AND `tags`.`project_id` = 1) LIMIT 1
  ..
  ### some before_save and before_create hooks ###
  ..
  SQL (0.2ms)  INSERT INTO `tags` (`name`, `project_id`, `slug`, `created_at`, `updated_at`) VALUES ('test', 1, 'test', '2017-02-13 12:56:07', '2017-02-13 12:56:07')
   (14.7ms)  COMMIT

Through this example, you can see that there is no lock on the table between the SELECT and the INSERT statements. In this gap, another concurrent process can create a record with the same name and the same project_id without failing validation. As you can also see, hooks are triggered between the validation and the INSERT statement; the more numerous and complex the hooks, the larger the gap and the higher the probability of creating non-unique records.
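The race can be made concrete with a small sketch. The following plain-Ruby simulation (no database; the tag name and all helper names are illustrative) replays Rails' check-then-insert sequence for two concurrent workers with a fixed interleaving: both SELECTs run before either INSERT, so both validations pass and a duplicate row lands in the "table."

```ruby
tags = []  # stands in for the tags table

# The "Tag Exists" SELECT issued by the uniqueness validation.
exists = ->(project_id, name) {
  tags.any? { |t| t[:project_id] == project_id && t[:name] == name }
}

# The INSERT that follows a passing validation.
insert = ->(project_id, name) {
  tags << { project_id: project_id, name: name }
}

# Interleaving: both workers run their validation SELECT first...
worker_a_ok = !exists.call(1, "upload-tag")
worker_b_ok = !exists.call(1, "upload-tag")

# ...then both insert, because neither SELECT saw the other's row.
insert.call(1, "upload-tag") if worker_a_ok
insert.call(1, "upload-tag") if worker_b_ok

duplicates = tags.count { |t| t[:project_id] == 1 && t[:name] == "upload-tag" }
```

With a real database the interleaving is not guaranteed, but concurrent upload workers make it likely enough to produce the 40k duplicates we observed.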

Solution and Lessons Learned

The solution to preventing the creation of non-unique records is therefore a unique database index. There was already an index on project_id and name, but it was not unique.

To make this index unique without downtime, these steps are needed:

  1. Make sure no new duplicated records can be created.
  2. Clean up the database.
  3. Remove the old index.
  4. Create a new unique index.

Preventing the Creation of Non-Unique Records Between Cleanup and Index Migration

Adding a unique index to a table that contains non-unique entries raises an exception, so we need to clean up the database before adding the index. But if new duplicated entries are introduced between the cleanup and the index creation, the migration will still fail. So we first have to make sure that no new duplicated records can be created. You could solve this with temporary tables, but we chose another approach: since MySQL ignores NULL values in unique indexes, we added a new column whose default value is NULL and which, for new records only, is set to the concatenation of project_id and name. On this column we can add a unique index.

class AddTmpFieldToTags < ActiveRecord::Migration
  def change
    add_column :tags, :tmp_field, :string, null: true, default: nil

    add_index :tags, [:tmp_field], unique: true
  end
end
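The trick relies on MySQL treating NULLs as never colliding under a unique index: existing rows keep tmp_field at NULL and are unaffected, while new rows get a non-NULL value that must be unique. A minimal plain-Ruby model of that behavior (illustrative only; no database involved):

```ruby
# Simulates a MySQL unique index on a nullable column: any number of NULL
# (nil) values are allowed, but a non-NULL value may appear only once.
class FakeUniqueIndex
  def initialize
    @seen = {}
  end

  # Returns true if the insert is accepted, false if the index rejects it.
  def insert(value)
    return true if value.nil?        # NULLs never collide
    return false if @seen.key?(value)
    @seen[value] = true
    true
  end
end

index = FakeUniqueIndex.new
index.insert(nil)        # existing row: tmp_field stays NULL, accepted
index.insert(nil)        # another existing row: also accepted
index.insert("1-test")   # new row: "project_id-name", accepted once
```

A second insert of "1-test" is rejected, which is exactly how the real index blocks duplicate new tags while leaving old rows alone.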

As long as we never create tag records with hooks skipped, we can add a before_save hook that sets tmp_field to the concatenation of project_id and name. The unique index on this field then prevents the creation of non-unique tag entries.
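The hook itself is not shown in the original code, so here is a hypothetical sketch of the logic, stripped of ActiveRecord so it can run standalone. In the real model the wiring would be `before_save :set_tmp_field` on the Tag class; here a plain `save` method plays that role.

```ruby
# Hypothetical stand-in for the Tag model's before_save hook.
class TagSketch
  attr_reader :project_id, :name, :tmp_field

  def initialize(project_id:, name:)
    @project_id = project_id
    @name = name
  end

  def save
    set_tmp_field   # runs as the before_save callback in the real model
    true            # pretend the INSERT succeeded
  end

  private

  def set_tmp_field
    # Concatenate project_id and name. A separator avoids ambiguous values
    # (project 1 + "1tag" vs. project 11 + "tag" would otherwise collide).
    @tmp_field = "#{project_id}-#{name}"
  end
end

tag = TagSketch.new(project_id: 1, name: "test")
tag.save
```

The separator is a deliberate design choice; a bare concatenation could map two distinct (project_id, name) pairs to the same string and cause false positives on the unique index.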

Now the application raises an exception every time it tries to create a duplicated entry. To handle this gracefully, we change our tag-creation code so that it catches the ActiveRecord::RecordNotUnique exception Rails raises in this case, then selects and returns the existing tag instead. Make sure to reload here so that stale query-cache results do not hide the concurrently created record.

...
  def self.create_or_find_tag_by_name_for_project(project, tag_name, user)
    tag = TagRepository.tags_for_project(project).find_or_initialize_by(name: tag_name)
    begin
      if tag.new_record? && tag.save
        ...
      end 
    rescue ActiveRecord::RecordNotUnique
      tag = TagRepository.tags_for_project(project).reload.find_by(name: tag_name)
    end 

    tag
  end
...

Cleanup and Index Change

After the changes are deployed, we can clean up the database, either with a migration or a script; we decided on a migration. During the cleanup, it is important to fix up associations correctly, because we do not want to lose the links between tags and translation keys. After the cleanup is complete, the index can be changed to be unique.

class AddUniqueIndexOnProjectIdAndNameToTags < ActiveRecord::Migration
  def change
    remove_index :tags, [:project_id, :name]
    cleanup
    add_index :tags, [:project_id, :name], unique: true
  end

  private

  def cleanup
    dub_tags = ActiveRecord::Base.connection.exec_query("SELECT name, project_id, COUNT(*) FROM tags GROUP BY name, project_id HAVING COUNT(*) > 1").to_hash

    dub_tags.each do |tag|
      unify_tag(tag["name"], tag["project_id"])
    end
  end

  def unify_tag(name, project_id)
    tags = Tag.where(project_id: project_id, name: name).order(:id)
    return if tags.count < 2

    Tag.transaction do
      keys_to_tag = []
      tag_to_keep = tags.to_a.first
      tags.each do |tag|
        keys_to_tag = keys_to_tag.concat(tag.translation_keys.to_a)
        tag.destroy if tag.id != tag_to_keep.id
      end

      tag_to_keep.translation_keys = keys_to_tag.uniq { |key| key.id }
      tag_to_keep.save
    end
  end
end

After this migration has run, the temporary field and the hook that fills it can be removed.
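That final removal could look something like the following (hypothetical migration; the class name is assumed, while the column and index names follow the earlier AddTmpFieldToTags migration):

```ruby
class RemoveTmpFieldFromTags < ActiveRecord::Migration
  def change
    # The unique index on [:project_id, :name] now guards uniqueness,
    # so the temporary column and its index are no longer needed.
    remove_index :tags, [:tmp_field]
    remove_column :tags, :tmp_field, :string
  end
end
```

The before_save hook on the Tag model should be deleted in the same deploy, since it references the dropped column.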

Lessons Learned

  • Rails uniqueness validation alone cannot guarantee uniqueness under concurrency; back it with a unique database index.
  • A database cleanup plus a unique index can be rolled out with zero downtime on MySQL.


Topics:
rails, activerecord, develop, pitfalls, localization, internationalization

Published at DZone with permission of Stephan Schoening. See the original article here.

