
Experimenting With Unique Constraints in CockroachDB, MongoDB, and FerretDB

This is an unexpected third part of an interesting experiment with FerretDB and CockroachDB. This time, we're going to expand on the previous article.

By Artem Ervits · Sep. 14, 2022 · Tutorial


Motivation

Once I completed my first two articles, I realized there are a lot of possibilities exposed by proxying MongoDB collections through FerretDB backed by CockroachDB. CockroachDB offers unique data-domiciling capabilities through its multi-region abstractions, along with inverted and partial indexes, computed columns, and, of course, strong consistency. Today, we're going to discuss unique constraints in MongoDB and CockroachDB.

High-Level Steps

  • Start a 9-node multi-region cluster (CockroachDB Dedicated)
  • Start FerretDB (Docker)
  • Unique Indexes
  • Considerations
  • Conclusion

Step-by-Step Instructions

Start a 9-Node Multi-Region Cluster (CockroachDB Dedicated)

I am going to use the same CockroachDB Dedicated cluster from the previous article; refer to that article for the detailed setup steps. You can get a 30-day trial of CockroachDB Dedicated by following this link.

Start FerretDB (Docker)

I'll be using the same compose file from the previous article; however, since we will be discussing multi-region access, I will make changes to the compose file and discuss the steps below.

Unique Indexes

FerretDB is under active development, and many features are still unavailable. In the previous article, I migrated the MongoDB sample_mflix dataset, which relies on unique indexes for some of its collections. As of this writing, I have not been able to validate that unique constraints work in FerretDB, and I've concluded that they are not implemented yet. This inhibits some of the capabilities we'd like to discuss, so I will instead demonstrate how we can leverage CockroachDB for globally consistent unique indexes.

Restore the sample_mflix.users collection if you've not done so previously:

Shell
 
mongorestore --archive=sampledata.archive --nsInclude=sample_mflix.users --numInsertionWorkersPerCollection=100
Plain Text
 
2022-09-12T16:29:24.266-0400    sample_mflix.users  28.9KB
2022-09-12T16:29:26.156-0400    sample_mflix.users  28.9KB
2022-09-12T16:29:26.156-0400    finished restoring sample_mflix.users (185 documents, 0 failures)
2022-09-12T16:29:26.200-0400    restoring indexes for collection sample_mflix.users from metadata
2022-09-12T16:29:26.200-0400    index: &idx.IndexDocument{Options:primitive.M{"name":"email_1", "unique":true, "v":2}, Key:primitive.D{primitive.E{Key:"email", Value:1}}, PartialFilterExpression:primitive.D(nil)}
2022-09-12T16:29:26.203-0400    185 document(s) restored successfully. 0 document(s) failed to restore.


Notice that a unique index was restored along with the data; looking at the dataset description, there is indeed a unique constraint on the email field. The schema in CockroachDB, however, is not aware of the constraint:

SQL
 
 CREATE TABLE sample_mflix.users_5e7cc513 (                     
         _jsonb JSONB NULL,                                     
         rowid INT8 NOT VISIBLE NOT NULL DEFAULT unique_rowid(),
         CONSTRAINT users_5e7cc513_pkey PRIMARY KEY (rowid ASC) 
 );
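
If you want to reproduce this output yourself, SHOW CREATE TABLE in a SQL shell connected to the ferretdb database returns the same definition:

SQL
 
SHOW CREATE TABLE sample_mflix.users_5e7cc513;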


If we attempted to restore this archive again, the constraint would not fire; in fact, the restore would append to the existing rows.

Originally we restored 185 records; if we rerun the restore, we should see 370 records.
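
The rerun uses the exact same mongorestore command as before:

Shell
 
mongorestore --archive=sampledata.archive --nsInclude=sample_mflix.users --numInsertionWorkersPerCollection=100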

Shell
 
2022-09-12T16:38:01.207-0400    sample_mflix.users  28.9KB
2022-09-12T16:38:01.207-0400    finished restoring sample_mflix.users (185 documents, 0 failures)
2022-09-12T16:38:01.247-0400    restoring indexes for collection sample_mflix.users from metadata
2022-09-12T16:38:01.247-0400    index: &idx.IndexDocument{Options:primitive.M{"name":"email_1", "unique":true, "v":2}, Key:primitive.D{primitive.E{Key:"email", Value:1}}, PartialFilterExpression:primitive.D(nil)}
2022-09-12T16:38:01.251-0400    185 document(s) restored successfully. 0 document(s) failed to restore.
SQL
 
ferretdb=> select count(*) from sample_mflix.users_5e7cc513 ;
 count 
-------
   370


It's obvious the unique constraint is not enforced. What does FerretDB show?

Shell
 
sample_mflix> db.getCollectionInfos()
[ { name: 'users', type: 'collection' } ]
Shell
 
sample_mflix> db.users.getIndexes()
MongoServerError: no such command: 'listIndexes'
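
Since FerretDB can't list the indexes here, we can check the backing table on the CockroachDB side instead; at this point it should show only the primary index, confirming that no unique index exists on email:

SQL
 
SHOW INDEXES FROM sample_mflix.users_5e7cc513;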


Let's try to add the index explicitly:

Shell
 
sample_mflix> db.users.createIndex( { "email": 1 }, { unique: true } )
email_1


Before attempting an insert through FerretDB, let's look at an existing document so we can reuse its email address:

Shell
 
sample_mflix> db.users.findOne({ })
{
  _id: ObjectId("59b99db4cfa9a34dcd7885b6"),
  name: 'Ned Stark',
  email: 'sean_bean@gameofthron.es',
  password: '$2b$12$UREFwsRUoyF0CRqGNK0LzO0HM/jLhgUCNNIJ9RJAqMUQ74crlJ1Vu'
}


Now we're going to add another record with the same email address:

Shell
 
sample_mflix> db.users.insertOne({name: 'Stark Sr.', email: 'sean_bean@gameofthron.es' })
{
  acknowledged: true,
  insertedId: ObjectId("631f99c3cc8ad2b643a9f459")
}


The insert was accepted, despite the unique index we just created.

I will give MongoDB the benefit of the doubt that its implementation of unique constraints does indeed work; I think the issue here lies with FerretDB.
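
For comparison, a stock MongoDB deployment with the same unique index would reject the duplicate. The output below is hypothetical (not captured from this environment), but it illustrates the E11000 duplicate key error you'd expect:

Shell
 
sample_mflix> db.users.insertOne({name: 'Stark Sr.', email: 'sean_bean@gameofthron.es' })
MongoServerError: E11000 duplicate key error collection: sample_mflix.users index: email_1 dup key: { email: "sean_bean@gameofthron.es" }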

Let's enforce the constraint on the CockroachDB side:

We need to create a computed column on the email field inside the JSONB document, specifying a UNIQUE constraint as part of the same command.

SQL
 
ALTER TABLE sample_mflix.users_5e7cc513 ADD COLUMN email STRING NOT NULL UNIQUE AS ((_jsonb->>'email')::STRING) VIRTUAL;


If you've added Ned Stark's email address or restored the users collection twice as suggested earlier, CockroachDB will detect the unique constraint violation while backfilling the index:

Shell
 
ERROR:  failed to ingest index entries during backfill: duplicate key value violates unique constraint "users_5e7cc513_email_key"
DETAIL:  Key (email)=('sean_bean@gameofthron.es') already exists.


You now have a choice: truncate the table, or delete the violating rows manually (a sketch of the manual route follows the truncate option below).

SQL
 
TRUNCATE TABLE sample_mflix.users_5e7cc513 ;
ALTER TABLE sample_mflix.users_5e7cc513 ADD COLUMN email STRING NOT NULL UNIQUE AS ((_jsonb->>'email')::STRING) VIRTUAL;
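
If you'd rather delete the violations manually, here is one possible sketch (my own, not from the original walkthrough) that keeps the first row per email, using the hidden rowid column as a tiebreaker; afterwards, rerun the same ALTER TABLE statement as above:

SQL
 
-- Keep the lowest rowid for each email; delete every other duplicate.
DELETE FROM sample_mflix.users_5e7cc513
WHERE rowid IN (
    SELECT rowid
    FROM (
        SELECT rowid,
               row_number() OVER (PARTITION BY _jsonb->>'email' ORDER BY rowid) AS rn
        FROM sample_mflix.users_5e7cc513
    ) AS dups
    WHERE rn > 1
);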


The schema now looks like so:

SQL
 
CREATE TABLE sample_mflix.users_5e7cc513 (                           
         _jsonb JSONB NULL,
         rowid INT8 NOT VISIBLE NOT NULL DEFAULT unique_rowid(),
         email STRING NOT NULL AS (_jsonb->>'email':::STRING) VIRTUAL,
         CONSTRAINT users_5e7cc513_pkey PRIMARY KEY (rowid ASC),
         UNIQUE INDEX users_5e7cc513_email_key (email ASC)
 ) LOCALITY REGIONAL BY TABLE IN PRIMARY REGION;


At this point, we can restore the collection again. Since the table is now empty, the restore succeeds, and from here on CockroachDB enforces uniqueness on the email field:

Plain Text
 
2022-09-14T12:43:59.689-0400    finished restoring sample_mflix.users (185 documents, 0 failures)
2022-09-14T12:43:59.715-0400    restoring indexes for collection sample_mflix.users from metadata
2022-09-14T12:43:59.715-0400    index: &idx.IndexDocument{Options:primitive.M{"name":"email_1", "unique":true, "v":2}, Key:primitive.D{primitive.E{Key:"email", Value:1}}, PartialFilterExpression:primitive.D(nil)}
2022-09-14T12:43:59.718-0400    185 document(s) restored successfully. 0 document(s) failed to restore.


Let's attempt to insert a record using mongosh:

Shell
 
db.users.insertOne({name: 'Stark Sr.', email: 'sean_bean@gameofthron.es' })
Shell
 
MongoServerError: [pool.go:283 pgdb.(*Pool).InTransaction] [msg_insert.go:108 pg.(*Handler).insert.func1] [insert.go:57 pgdb.InsertDocument] ERROR: duplicate key value violates unique constraint "users_5e7cc513_email_key" (SQLSTATE 23505)


We have a violation! But just to be 100% sure, let's attempt to restore the collection again:

Shell
 
2022-09-12T16:56:55.884-0400    Failed: sample_mflix.users: error restoring from archive 'sampledata.archive': (InternalError) [pool.go:283 pgdb.(*Pool).InTransaction] [msg_insert.go:108 pg.(*Handler).insert.func1] [insert.go:57 pgdb.InsertDocument] ERROR: duplicate key value violates unique constraint "users_5e7cc513_email_key" (SQLSTATE 23505)
2022-09-12T16:56:55.884-0400    0 document(s) restored successfully. 0 document(s) failed to restore.


Considerations

We could stop here, but I would like to make this concept stick, as there are significant benefits to having a multi-region database with strong global consistency. The unique constraint is enforced immediately upon creation across all regions, availability zones, and nodes. If we were to access the database from, say, aws-us-east-2, we'd get the same violation. Let's tear down the FerretDB compose environment and create three new mongosh clients, each accessing a regional (aws-us-east-1, aws-us-east-2, aws-us-west-2) cluster endpoint.
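
Tearing down the existing environment is a standard compose teardown, after which we bring it back up with the multi-region compose file below:

Shell
 
docker compose down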

YAML
 
version: "3"
services:

  ferretdb-us-east-1:
    image: ghcr.io/ferretdb/ferretdb:latest
    hostname: 'ferretdb-us-east-1'
    container_name: 'ferretdb-us-east-1'
    restart: 'on-failure'
    command:
      [
        '-listen-addr=:27017',
        ## Dedicated multiregion cluster
        '-postgresql-url=postgresql://artem:password@artem-mr-7xw.aws-us-east-1.cockroachlabs.cloud:26257/ferretdb?sslmode=verify-full&sslrootcert=/certs/artem-mr-ca.crt'
      ]
    ports:
      - 27017:27017
    volumes:
       - /Users/artem/.postgresql/root.crt:/certs/root.crt
       - /Users/artem/Library/CockroachCloud/certs/artem-mr-ca.crt:/certs/artem-mr-ca.crt

  ferretdb-us-east-2:
    image: ghcr.io/ferretdb/ferretdb:latest
    hostname: 'ferretdb-us-east-2'
    container_name: 'ferretdb-us-east-2'
    restart: 'on-failure'
    command:
      [
        '-listen-addr=:27017',
        ## Dedicated multiregion cluster
        '-postgresql-url=postgresql://artem:password@artem-mr-7xw.aws-us-east-2.cockroachlabs.cloud:26257/ferretdb?sslmode=verify-full&sslrootcert=/certs/artem-mr-ca.crt'
      ]
    ports:
      - 27019:27017
    volumes:
       - /Users/artem/.postgresql/root.crt:/certs/root.crt
       - /Users/artem/Library/CockroachCloud/certs/artem-mr-ca.crt:/certs/artem-mr-ca.crt

  ferretdb-us-west-2:
    image: ghcr.io/ferretdb/ferretdb:latest
    hostname: 'ferretdb-us-west-2'
    container_name: 'ferretdb-us-west-2'
    restart: 'on-failure'
    command:
      [
        '-listen-addr=:27017',
        ## Dedicated multiregion cluster
        '-postgresql-url=postgresql://artem:password@artem-mr-7xw.aws-us-west-2.cockroachlabs.cloud:26257/ferretdb?sslmode=verify-full&sslrootcert=/certs/artem-mr-ca.crt'
      ]
    ports:
      - 27021:27017
    volumes:
       - /Users/artem/.postgresql/root.crt:/certs/root.crt
       - /Users/artem/Library/CockroachCloud/certs/artem-mr-ca.crt:/certs/artem-mr-ca.crt 


I've created services ferretdb-us-east-1, ferretdb-us-east-2, and ferretdb-us-west-2. Each connection string points at its regional endpoint, and the services expose host ports 27017, 27019, and 27021, respectively. Feel free to validate the -postgresql-url for each of the services.

Shell
 
docker compose up -d
docker compose ps
Shell
 
NAME                 COMMAND                  SERVICE              STATUS              PORTS
ferretdb-us-east-1   "/ferretdb -listen-a…"   ferretdb-us-east-1   running             0.0.0.0:27017->27017/tcp
ferretdb-us-east-2   "/ferretdb -listen-a…"   ferretdb-us-east-2   running             0.0.0.0:27019->27017/tcp
ferretdb-us-west-2   "/ferretdb -listen-a…"   ferretdb-us-west-2   running             0.0.0.0:27021->27017/tcp


I'm not familiar enough with the Mongo CLI to demonstrate that each endpoint reaches its regional instance, so I will show that in SQL shortly. In the meantime, let's attempt to insert a record.

Open mongosh against the aws-us-west-2 endpoint:

Shell
 
mongosh mongodb://localhost:27021


Insert a record:

Shell
 
sample_mflix> db.users.insertOne({name: 'Stark Sr.', email: 'sean_bean@gameofthron.es' })
MongoServerError: [pool.go:283 pgdb.(*Pool).InTransaction] [msg_insert.go:108 pg.(*Handler).insert.func1] [insert.go:57 pgdb.InsertDocument] ERROR: duplicate key value violates unique constraint "users_5e7cc513_email_key" (SQLSTATE 23505)


It failed, which proves that CockroachDB indexes are consistent across geographies. You may be asking why I'm working so hard to make this point. I was reading the MongoDB documentation and found the following note:

For replica sets and sharded clusters, using a rolling procedure to create a unique index requires that you stop all writes to the collection during the procedure. If you cannot stop all writes to the collection during the procedure, do not use the rolling procedure. Instead, build your unique index on the collection by:

issuing db.collection.createIndex() on the primary for a replica set, or

issuing db.collection.createIndex() on the mongos for a sharded cluster.

Please correct me if I'm misinterpreting the note, but it says that with the rolling procedure you have to stop all writes to the collection while creating a unique index in MongoDB. If stopping writes is not an option, you instead have to create the index on the primary for a replica set, or through the mongos (the query router) for a sharded cluster. That is absolutely not the case with CockroachDB; such restrictions would go against its consistency guarantees. The unique index is global and consistent across all nodes, availability zones, and regions: you create it once, and you never stop writes to the table. With that said, let me show you this example in SQL.

Let me connect to the aws-us-west-2 region and intentionally attempt to violate the constraint:

Shell
 
cockroach sql --url "postgresql://artem:password@artem-mr-7xw.aws-us-west-2.cockroachlabs.cloud:26257/ferretdb2?sslmode=verify-full&sslrootcert=$HOME/Library/CockroachCloud/certs/artem-mr-ca.crt"


Let's confirm the region:

SQL
 
select gateway_region();
Shell
 
 gateway_region 
----------------
 aws-us-west-2


Let's check the network round-trip time from my host:

SQL
 
select 1;
Plain Text
 
Time: 69ms total (execution 1ms / network 68ms)


That seems accurate; it's roughly a 70 ms round trip from where I am in New Jersey.

Let's attempt to insert a record:

SQL
 
INSERT INTO sample_mflix.users_5e7cc513 ("_jsonb") VALUES (
    '{"email": "sean_bean@gameofthron.es"}'
  );
Plain Text
 
ERROR: duplicate key value violates unique constraint "users_5e7cc513_email_key"
SQLSTATE: 23505
DETAIL: Key (email)=('sean_bean@gameofthron.es') already exists.
CONSTRAINT: users_5e7cc513_email_key


Conclusion

This wraps up our unique constraint experiments with MongoDB collections. I admit some of the capabilities have not been tested on the MongoDB side, because we're proxying via FerretDB. However, I hope I've given you enough evidence that a proper relational database with strict schema enforcement protects users from data quality issues. I'm looking forward to new experiments with FerretDB!

Previous Articles

  • Using CockroachDB for FerretDB Backend
  • Migrating MongoDB Collections to CockroachDB With FerretDB


Published at DZone with permission of Artem Ervits.
