Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

PyMongo and Key Order in Subdocuments

DZone's Guide to

PyMongo and Key Order in Subdocuments

Or, "Why does my query work in the shell but not PyMongo?" For one, PyMongo represents BSON documents as Python dicts by default.

Free Resource

Whether you work in SQL Server Management Studio or Visual Studio, Redgate tools integrate with your existing infrastructure, enabling you to align DevOps for your applications with DevOps for your SQL Server databases. Discover true Database DevOps, brought to you in partnership with Redgate.

Or, "Why does my query work in the shell but not PyMongo?"

Variations on this question account for a large portion of the Stack Overflow questions I see about PyMongo, so let me explain once for all.

MongoDB stores documents in a binary format called BSON. Key-value pairs in a BSON document can have any order (except that _id is always first). The mongo shell preserves key order when reading and writing data. Observe that "b" comes before "a" when we create the document and when it is displayed:

> // mongo shell.
> db.collection.insert( {
...     "_id" : 1,
...     "subdocument" : { "b" : 1, "a" : 1 }
... } )
WriteResult({ "nInserted" : 1 })
> db.collection.find()
{ "_id" : 1, "subdocument" : { "b" : 1, "a" : 1 } }

PyMongo represents BSON documents as Python dicts by default, and the order of keys in dicts is not defined. That is, a dict declared with the "a" key first is the same, to Python, as one with "b" first:

>>> print {'a': 1.0, 'b': 1.0}
{'a': 1.0, 'b': 1.0}
>>> print {'b': 1.0, 'a': 1.0}
{'a': 1.0, 'b': 1.0}

Therefore, Python dicts are not guaranteed to show keys in the order they are stored in BSON. Here, "a" is shown before "b":

>>> print collection.find_one()
{u'_id': 1.0, u'subdocument': {u'a': 1.0, u'b': 1.0}}

To preserve order when reading BSON, use the SON class, which is a dict that remembers its key order. First, get a handle to the collection, configured to use SON instead of dict. In PyMongo 3.0 do this like:

>>> from bson import CodecOptions, SON
>>> opts = CodecOptions(as_class=SON)
>>> opts
CodecOptions(as_class=<class 'bson.son.SON'>,
             tz_aware=False,
             uuid_representation=PYTHON_LEGACY)
>>> collection_son = collection.with_options(codec_options=opts)

Now, documents and subdocuments in query results are represented with SON objects:

>>> print collection_son.find_one()
SON([(u'_id', 1.0), (u'subdocument', SON([(u'b', 1.0), (u'a', 1.0)]))])

The subdocument's actual storage layout is now visible: "b" is before "a".

Because a dict's key order is not defined, you cannot predict how it will be serialized to BSON. But MongoDB considers subdocuments equal only if their keys have the same order. So if you use a dict to query on a subdocument it may not match:

>>> collection.find_one({'subdocument': {'a': 1.0, 'b': 1.0}}) is None
True

Swapping the key order in your query makes no difference:

>>> collection.find_one({'subdocument': {'b': 1.0, 'a': 1.0}}) is None
True

... because, as we saw above, Python considers the two dicts the same.

There are two solutions. First, you can match the subdocument field-by-field:

>>> collection.find_one({'subdocument.a': 1.0,
...                      'subdocument.b': 1.0})
{u'_id': 1.0, u'subdocument': {u'a': 1.0, u'b': 1.0}}

The query matches any subdocument with an "a" of 1.0 and a "b" of 1.0, regardless of the order you specify them in Python or the order they are stored in BSON. Additionally, this query now matches subdocuments with additional keys besides "a" and "b", whereas the previous query required an exact match.

The second solution is to use a SON to specify the key order:

>>> query = {'subdocument': SON([('b', 1.0), ('a', 1.0)])}
>>> collection.find_one(query)
{u'_id': 1.0, u'subdocument': {u'a': 1.0, u'b': 1.0}}

The key order you use when you create a SON is preserved when it is serialized to BSON and used as a query. Thus you can create a subdocument that exactly matches the subdocument in the collection.

For more info, see the MongoDB Manual entry on subdocument matching.

It’s easier than you think to extend DevOps practices to SQL Server with Redgate tools. Discover how to introduce true Database DevOps, brought to you in partnership with Redgate

Topics:
python ,nosql ,mongodb ,pymongo ,subdocuments ,database

Published at DZone with permission of A. Jesse Jiryu Davis, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}