Experimenting With Unique Constraints in CockroachDB, MongoDB, and FerretDB
This is an unexpected third part of an interesting experiment with FerretDB and CockroachDB. This time, we're going to expand on the previous article.
Join the DZone community and get the full member experience.
Join For FreeMotivation
Once I completed my first two articles, I realized there are a lot of possibilities exposed by proxying MongoDB collections through FerretDB backed by CockroachDB. CockroachDB has unique data domiciling capabilities available through multi-region abstractions, inverted and partial indexes, computed columns and of course strong consistency. Today, we're going to discuss unique constraints in MongoDB and CockroachDB.
High-Level Steps
- Start a 9-node multi-region cluster (CockroachDB Dedicated)
- Start FerretDB (Docker)
- Unique Indexes
- Considerations
- Conclusion
Step-by-Step Instructions
Start a 9-Node Multi-Region Cluster (Cockroachdb Dedicated)
I am going to use the same CockroachDB Dedicated cluster from the previous article. Please refer to the previous article for the detailed steps. You can get a 30-day trial of CockroachDB Dedicated following this link.
Start FerretDB (Docker)
I'll be using the same compose file from the previous article, however, as we will be discussing multi-region, I will make changes to the compose file and discuss the steps below.
Unique Indexes
FerretDB is under development and many features may still be unavailable. In the previous article, I migrated a MongoDB sample_mflix
dataset which relies on unique indexes for some of the collections. As of this writing, I've not been able to validate unique constraints work in FerretDB. I've concluded that it wasn't implemented yet. It inhibits some of the capabilities we'd like to discuss and I will demonstrate how instead we can leverage CockroachDB for globally strong consistent unique indexes.
Restore the sample_mflix.users
collection if you've not done so previously
mongorestore --archive=sampledata.archive --nsInclude=sample_mflix.users --numInsertionWorkersPerCollection=100
2022-09-12T16:29:24.266-0400 sample_mflix.users 28.9KB
2022-09-12T16:29:26.156-0400 sample_mflix.users 28.9KB
2022-09-12T16:29:26.156-0400 finished restoring sample_mflix.users (185 documents, 0 failures)
2022-09-12T16:29:26.200-0400 restoring indexes for collection sample_mflix.users from metadata
2022-09-12T16:29:26.200-0400 index: &idx.IndexDocument{Options:primitive.M{"name":"email_1", "unique":true, "v":2}, Key:primitive.D{primitive.E{Key:"email", Value:1}}, PartialFilterExpression:primitive.D(nil)}
2022-09-12T16:29:26.203-0400 185 document(s) restored successfully. 0 document(s) failed to restore.
Notice a unique constraint was restored along with the data, looking at the dataset description, indeed there's a unique constraint on the email
field. The schema in CockroachDB is not aware of the constraint:
CREATE TABLE sample_mflix.users_5e7cc513 (
_jsonb JSONB NULL,
rowid INT8 NOT VISIBLE NOT NULL DEFAULT unique_rowid(),
CONSTRAINT users_5e7cc513_pkey PRIMARY KEY (rowid ASC)
);
If we attempted to restore this file again, the constraint would not fire, in fact, it would append to the existing rows.
Originally we restored 185 records, if we rerun the restore, we should see 370 records
2022-09-12T16:38:01.207-0400 sample_mflix.users 28.9KB
2022-09-12T16:38:01.207-0400 finished restoring sample_mflix.users (185 documents, 0 failures)
2022-09-12T16:38:01.247-0400 restoring indexes for collection sample_mflix.users from metadata
2022-09-12T16:38:01.247-0400 index: &idx.IndexDocument{Options:primitive.M{"name":"email_1", "unique":true, "v":2}, Key:primitive.D{primitive.E{Key:"email", Value:1}}, PartialFilterExpression:primitive.D(nil)}
2022-09-12T16:38:01.251-0400 185 document(s) restored successfully. 0 document(s) failed to restore.
ferretdb=> select count(*) from sample_mflix.users_5e7cc513 ;
count
-------
370
It's obvious the unique constraint is not enforced. What does FerretDB show:
sample_mflix> db.getCollectionInfos( )
[ { name: 'users', type: 'collection' } ]
sample_mflix> db.users.getIndexes()
MongoServerError: no such command: 'listIndexes'
Let's try to add it explicitly
sample_mflix> db.users.createIndex( { "email": 1 }, { unique: true } )
email_1
Let's attempt to insert a row using FerretDB.
sample_mflix> db.users.findOne({ })
{
_id: ObjectId("59b99db4cfa9a34dcd7885b6"),
name: 'Ned Stark',
email: 'sean_bean@gameofthron.es',
password: '$2b$12$UREFwsRUoyF0CRqGNK0LzO0HM/jLhgUCNNIJ9RJAqMUQ74crlJ1Vu'
}
We're going to add another record with the same email address
sample_mflix> db.users.insertOne({name: 'Stark Sr.', email: 'sean_bean@gameofthron.es' })
{
acknowledged: true,
insertedId: ObjectId("631f99c3cc8ad2b643a9f459")
}
Insert was accepted
I will give MongoDB the benefit of the doubt that their implementation of unique constraints does indeed work. I think the issue here lies with FerretDB.
Let's enforce the constraint on the CockroachDB side:
We will need to create a computed column on the email field in the JSONB field, we're also specifying UNIQUE
constraint as part of the command.
ALTER TABLE sample_mflix.users_5e7cc513 ADD COLUMN email STRING NOT NULL UNIQUE AS ((_jsonb->>'email')::STRING) VIRTUAL;
If you've added Ned Stark's email address or restored the users' collection twice as I've suggested, CockroachDB will detect the unique constraint violation.
ERROR: failed to ingest index entries during backfill: duplicate key value violates unique constraint "users_5e7cc513_email_key"
DETAIL: Key (email)=('sean_bean@gameofthron.es') already exists.
You have a choice to truncate the table or delete all violations manually.
TRUNCATE TABLE sample_mflix.users_5e7cc513 ;
ALTER TABLE sample_mflix.users_5e7cc513 ADD COLUMN email STRING NOT NULL UNIQUE AS ((_jsonb->>'email')::STRING) VIRTUAL;
The schema now looks like so:
CREATE TABLE sample_mflix.users_5e7cc513 (
_jsonb JSONB NULL,
rowid INT8 NOT VISIBLE NOT NULL DEFAULT unique_rowid(),
email STRING NOT NULL AS (_jsonb->>'email':::STRING) VIRTUAL,
CONSTRAINT users_5e7cc513_pkey PRIMARY KEY (rowid ASC),
UNIQUE INDEX users_5e7cc513_email_key (email ASC)
) LOCALITY REGIONAL BY TABLE IN PRIMARY REGION;
At this point, we can attempt to restore the collection again, this time around, CockroachDB will enforce the uniqueness on the email
field.
2022-09-14T12:43:59.689-0400 finished restoring sample_mflix.users (185 documents, 0 failures)
2022-09-14T12:43:59.715-0400 restoring indexes for collection sample_mflix.users from metadata
2022-09-14T12:43:59.715-0400 index: &idx.IndexDocument{Options:primitive.M{"name":"email_1", "unique":true, "v":2}, Key:primitive.D{primitive.E{Key:"email", Value:1}}, PartialFilterExpression:primitive.D(nil)}
2022-09-14T12:43:59.718-0400 185 document(s) restored successfully. 0 document(s) failed to restore.
Let's attempt to insert a record using mongosh
db.users.insertOne({name: 'Stark Sr.', email: 'sean_bean@gameofthron.es' })
MongoServerError: [pool.go:283 pgdb.(*Pool).InTransaction] [msg_insert.go:108 pg.(*Handler).insert.func1] [insert.go:57 pgdb.InsertDocument] ERROR: duplicate key value violates unique constraint "users_5e7cc513_email_key" (SQLSTATE 23505)
We have a violation! But just to be 100% sure, let's attempt to restore the table again
2022-09-12T16:56:55.884-0400 Failed: sample_mflix.users: error restoring from archive 'sampledata.archive': (InternalError) [pool.go:283 pgdb.(*Pool).InTransaction] [msg_insert.go:108 pg.(*Handler).insert.func1] [insert.go:57 pgdb.InsertDocument] ERROR: duplicate key value violates unique constraint "users_5e7cc513_email_key" (SQLSTATE 23505)
2022-09-12T16:56:55.884-0400 0 document(s) restored successfully. 0 document(s) failed to restore.
Considerations
We can stop here but I would like to make this concept stick as there are significant benefits to having a multi-region database with strong global consistency. The unique constraint is enforced immediately upon creation across all of the regions, availability zones and nodes. If we were to access the database from say aws-us-east-2
, we'd get the same violation. Let's tear down the FerretDB compose environment and create three new mongosh
clients, each accessing the regional (aws-us-east-1
, aws-us-east-2
, aws-us-west-2
) cluster endpoints.
version: "3"
services:
ferretdb-us-east-1:
image: ghcr.io/ferretdb/ferretdb:latest
hostname: 'ferretdb-us-east-1'
container_name: 'ferretdb-us-east-1'
restart: 'on-failure'
command:
[
'-listen-addr=:27017',
## Dedicated multiregion cluster
'-postgresql-url=postgresql://artem:password@artem-mr-7xw.aws-us-east-1.cockroachlabs.cloud:26257/ferretdb?sslmode=verify-full&sslrootcert=/certs/artem-mr-ca.crt'
]
ports:
- 27017:27017
volumes:
- /Users/artem/.postgresql/root.crt:/certs/root.crt
- /Users/artem/Library/CockroachCloud/certs/artem-mr-ca.crt:/certs/artem-mr-ca.crt
ferretdb-us-east-2:
image: ghcr.io/ferretdb/ferretdb:latest
hostname: 'ferretdb-us-east-2'
container_name: 'ferretdb-us-east-2'
restart: 'on-failure'
command:
[
'-listen-addr=:27017',
## Dedicated multiregion cluster
'-postgresql-url=postgresql://artem:password@artem-mr-7xw.aws-us-east-2.cockroachlabs.cloud:26257/ferretdb?sslmode=verify-full&sslrootcert=/certs/artem-mr-ca.crt'
]
ports:
- 27019:27017
volumes:
- /Users/artem/.postgresql/root.crt:/certs/root.crt
- /Users/artem/Library/CockroachCloud/certs/artem-mr-ca.crt:/certs/artem-mr-ca.crt
ferretdb-us-west-2:
image: ghcr.io/ferretdb/ferretdb:latest
hostname: 'ferretdb-us-west-2'
container_name: 'ferretdb-us-west-2'
restart: 'on-failure'
command:
[
'-listen-addr=:27017',
## Dedicated multiregion cluster
'-postgresql-url=postgresql://artem:password@artem-mr-7xw.aws-us-west-2.cockroachlabs.cloud:26257/ferretdb?sslmode=verify-full&sslrootcert=/certs/artem-mr-ca.crt'
]
ports:
- 27021:27017
volumes:
- /Users/artem/.postgresql/root.crt:/certs/root.crt
- /Users/artem/Library/CockroachCloud/certs/artem-mr-ca.crt:/certs/artem-mr-ca.crt
I've created services ferretdb-us-east-1
, ferretdb-us-east-2
and ferretdb-us-west-2
. The connection strings coincide with the regional endpoints, exposing the host ports on 27017
, 27019
and 27021
respectively. Feel free to validate the -postgresql-url
for each of the services.
docker compose up -d
docker compose ps
NAME COMMAND SERVICE STATUS PORTS
ferretdb-us-east-1 "/ferretdb -listen-a…" ferretdb-us-east-1 running 0.0.0.0:27017->27017/tcp
ferretdb-us-east-2 "/ferretdb -listen-a…" ferretdb-us-east-2 running 0.0.0.0:27019->27017/tcp
ferretdb-us-west-2 "/ferretdb -listen-a…" ferretdb-us-west-2 running 0.0.0.0:27021->27017/tcp
I'm not familiar with Mongo CLI enough to demonstrate that each endpoint is accessing the regional instance, I will show it in SQL shortly. In the meantime, let's attempt to insert a record
Access the mongosh
pointing to the aws-us-west-2
endpoint
mongosh mongodb://localhost:27021
Insert a record
sample_mflix> db.users.insertOne({name: 'Stark Sr.', email: 'sean_bean@gameofthron.es' })
MongoServerError: [pool.go:283 pgdb.(*Pool).InTransaction] [msg_insert.go:108 pg.(*Handler).insert.func1] [insert.go:57 pgdb.InsertDocument] ERROR: duplicate key value violates unique constraint "users_5e7cc513_email_key" (SQLSTATE 23505)
It failed and proves that CockroachDB indexes are consistent across geographies. You may be asking why am I working so hard to make this point. I was reading MongoDB documentation and found the following note:
For replica sets and sharded clusters, using a rolling procedure to create a unique index requires that you stop all writes to the collection during the procedure. If you cannot stop all writes to the collection during the procedure, do not use the rolling procedure. Instead, build your unique index on the collection by:
issuing db.collection.createIndex()
on the primary for a replica set, or
issuing db.collection.createIndex()
on the mongos for a sharded cluster.
Please correct me if I'm misinterpreting the note, but it says you have to stop writing to the collection when you create an index in MongoDB. Additionally, if stopping writes is not an option, you have to create an index on the mongos
which as I understand are follower replicas in Mongo speak. That is absolutely not the case with CockroachDB and goes against the CockroachDB consistency guarantees. The unique index is global and consistent across all of the nodes, availability zones and regions. You create it once and you don't stop the writes to the table. That said, let me show you this example in SQL.
Let me connect to the aws-us-west-2
region and intentionally attempt to violate the constraint
cockroach sql --url "postgresql://artem:password@artem-mr-7xw.aws-us-west-2.cockroachlabs.cloud:26257/ferretdb2?sslmode=verify-full&sslrootcert=$HOME/Library/CockroachCloud/certs/artem-mr-ca.crt"
Let's confirm the region
select gateway_region();
gateway_region
----------------
aws-us-west-2
Let's see the network round trip time from my host
select 1;
Time: 69ms total (execution 1ms / network 68ms)
That seems accurate, there are about 70ms RTT from where I am in New Jersey.
Let's attempt to insert a record
INSERT INTO sample_mflix.users_5e7cc513 ("_jsonb") VALUES (
'{"email": "sean_bean@gameofthron.es"}'
);
ERROR: duplicate key value violates unique constraint "users_5e7cc513_email_key"
SQLSTATE: 23505
DETAIL: Key (email)=('sean_bean@gameofthron.es') already exists.
CONSTRAINT: users_5e7cc513_email_key
Conclusion
This wraps up our unique constraint experiments with the MongoDB collections. I admit some of the capabilities have not been tested on the MongoDB side because we're proxying via FerretDB. However, I hope I've given you enough evidence having a proper relational database with strict schema validation protects users from data quality issues. I'm looking forward to new experiments with FerretDB!
Previous articles
Using CockroachDB for FerretDB Backend
Migrating MongoDB Collections to CockroachDB With FerretDB
Published at DZone with permission of Artem Ervits. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments