PyMongo and Key Order in Subdocuments

Or, "Why does my query work in the shell but not PyMongo?"

by A. Jesse Jiryu Davis · Mar. 30, 15 · Interview

Variations on this question account for a large portion of the Stack Overflow questions I see about PyMongo, so let me explain once and for all.

MongoDB stores documents in a binary format called BSON. Key-value pairs in a BSON document can have any order (except that _id is always first). The mongo shell preserves key order when reading and writing data. Observe that "b" comes before "a" when we create the document and when it is displayed:

> // mongo shell.
> db.collection.insert( {
...     "_id" : 1,
...     "subdocument" : { "b" : 1, "a" : 1 }
... } )
WriteResult({ "nInserted" : 1 })
> db.collection.find()
{ "_id" : 1, "subdocument" : { "b" : 1, "a" : 1 } }

PyMongo represents BSON documents as Python dicts by default, and the order of keys in dicts is not defined. That is, a dict declared with the "a" key first is the same, to Python, as one with "b" first:

>>> print {'a': 1.0, 'b': 1.0}
{'a': 1.0, 'b': 1.0}
>>> print {'b': 1.0, 'a': 1.0}
{'a': 1.0, 'b': 1.0}
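The same point can be checked with plain equality. The sketch below uses only the standard library; `collections.OrderedDict` is brought in purely for contrast, as an order-sensitive mapping, and is not part of the original example. Dict equality ignores key order in every Python version, including newer interpreters (3.7 and later) where dicts iterate in insertion order.

```python
from collections import OrderedDict

a_first = {'a': 1.0, 'b': 1.0}
b_first = {'b': 1.0, 'a': 1.0}

# Plain dicts compare equal no matter how the keys were written.
dicts_equal = (a_first == b_first)

# Two OrderedDicts compare equal only if their keys are in the same order.
ordered_equal = (OrderedDict([('a', 1.0), ('b', 1.0)]) ==
                 OrderedDict([('b', 1.0), ('a', 1.0)]))

print(dicts_equal)    # True
print(ordered_equal)  # False
```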

Therefore, Python dicts are not guaranteed to show keys in the order they are stored in BSON. Here, "a" is shown before "b":

>>> print collection.find_one()
{u'_id': 1.0, u'subdocument': {u'a': 1.0, u'b': 1.0}}

To preserve order when reading BSON, use the SON class, which is a dict that remembers its key order. First, get a handle to the collection, configured to use SON instead of dict. In PyMongo 3.0, do it like this:

>>> from bson import CodecOptions, SON
>>> opts = CodecOptions(document_class=SON)
>>> opts
CodecOptions(document_class=<class 'bson.son.SON'>,
             tz_aware=False,
             uuid_representation=PYTHON_LEGACY)
>>> collection_son = collection.with_options(codec_options=opts)

Now, documents and subdocuments in query results are represented with SON objects:

>>> print collection_son.find_one()
SON([(u'_id', 1.0), (u'subdocument', SON([(u'b', 1.0), (u'a', 1.0)]))])

The subdocument's actual storage layout is now visible: "b" is before "a".

Because a dict's key order is not defined, you cannot predict how it will be serialized to BSON. But MongoDB considers subdocuments equal only if their keys have the same order. So if you use a dict to query on a subdocument, it may not match:

>>> collection.find_one({'subdocument': {'a': 1.0, 'b': 1.0}}) is None
True

Swapping the key order in your query makes no difference:

>>> collection.find_one({'subdocument': {'b': 1.0, 'a': 1.0}}) is None
True

... because, as we saw above, Python considers the two dicts the same.
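To see why key order matters once a document leaves Python, it can help to look at the bytes. The following is a minimal sketch of the BSON layout for a flat document of double values, written with the standard `struct` module. `encode_flat_doc` is a hypothetical helper for illustration only; it is not PyMongo's encoder, and it does not reproduce MongoDB's comparison logic. It only shows that the two key orders serialize differently.

```python
import struct

def encode_flat_doc(pairs):
    """Encode (key, float) pairs as a BSON document, in the order given.

    Hypothetical helper for illustration; it handles only double values.
    """
    body = b''
    for key, value in pairs:
        # Element: type byte 0x01 (double), key as a NUL-terminated string,
        # then the value as an 8-byte little-endian double.
        body += (b'\x01' + key.encode('utf-8') + b'\x00' +
                 struct.pack('<d', value))
    # Document: int32 total length (counting itself and the trailing NUL),
    # the elements, then a terminating NUL byte.
    return struct.pack('<i', 4 + len(body) + 1) + body + b'\x00'

b_first = encode_flat_doc([('b', 1.0), ('a', 1.0)])
a_first = encode_flat_doc([('a', 1.0), ('b', 1.0)])

print(len(a_first) == len(b_first))  # True
print(a_first == b_first)            # False
```

The two encodings have the same length and contain the same elements, but the elements appear in a different sequence, so the serialized subdocuments are not identical.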

There are two solutions. First, you can match the subdocument field-by-field:

>>> collection.find_one({'subdocument.a': 1.0,
...                      'subdocument.b': 1.0})
{u'_id': 1.0, u'subdocument': {u'a': 1.0, u'b': 1.0}}

The query matches any subdocument with an "a" of 1.0 and a "b" of 1.0, regardless of the order in which you specify them in Python or the order in which they are stored in BSON. Note that this query also matches subdocuments that have extra keys besides "a" and "b", whereas the previous query required an exact match.
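If you already have the subdocument as a dict, the field-by-field query can be built mechanically. This is a small sketch under that assumption; `dotted_query` is a hypothetical helper, not part of PyMongo, and it flattens only one level of keys (nested subdocuments are left as values).

```python
def dotted_query(field, subdocument):
    """Build a field-by-field query for one level of subdocument keys.

    Hypothetical helper for illustration; not part of PyMongo.
    """
    return dict(('%s.%s' % (field, key), value)
                for key, value in subdocument.items())

query = dotted_query('subdocument', {'b': 1.0, 'a': 1.0})
print(query == {'subdocument.a': 1.0, 'subdocument.b': 1.0})  # True
```

The resulting dict can be passed to `collection.find_one(query)` in the same way as the hand-written query above. Because each key is matched independently, the order of keys in the result does not affect the match.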

The second solution is to use a SON to specify the key order:

>>> query = {'subdocument': SON([('b', 1.0), ('a', 1.0)])}
>>> collection.find_one(query)
{u'_id': 1.0, u'subdocument': {u'a': 1.0, u'b': 1.0}}

The key order you use when you create a SON is preserved when it is serialized to BSON and used as a query. Thus you can create a subdocument that exactly matches the subdocument in the collection.

For more info, see the MongoDB Manual entry on subdocument matching.

Published at DZone with permission of A. Jesse Jiryu Davis, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
