Using (and abusing) MongoDB ObjectIds as created-on Timestamps
One of my favorite MongoDB tricks is the ability to use an ObjectId (the default type for MongoDB’s _id primary key) as a timestamp for when a document was created. Here’s how it works:
    >>> import pymongo
    >>> db = pymongo.Connection().test
    >>> db.test.insert({'hello': 'world'})
    ObjectId('4f202e64e6fb1b56ff000000')
    >>> doc = db.test.find_one()
    >>> doc['_id'].generation_time
    datetime.datetime(2012, 1, 25, 16, 31, 32, tzinfo=<...>)
We’re inserting a single document and then immediately querying for it. The generation_time property of the automatically generated _id gives us a datetime representing when that ObjectId was generated (precise to the second). This is great for those times when you would’ve otherwise added an extra “created_on” field with just a timestamp.
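The timestamp lives in the first four bytes of the ObjectId, stored as a big-endian count of seconds since the Unix epoch. Here is a minimal Python 3 sketch using only the standard library. The helper name timestamp_from_oid_hex is made up for illustration and is not part of PyMongo. It decodes the same value that generation_time reports for the _id above:

```python
import struct
from datetime import datetime, timezone


def timestamp_from_oid_hex(oid_hex):
    """Decode the creation time embedded in a 24-character ObjectId hex string.

    Hypothetical helper for illustration; PyMongo exposes the same value
    as ObjectId.generation_time.
    """
    raw = bytes.fromhex(oid_hex)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes")
    # First 4 bytes: big-endian seconds since the Unix epoch.
    (seconds,) = struct.unpack(">I", raw[:4])
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


created = timestamp_from_oid_hex("4f202e64e6fb1b56ff000000")
print(created.isoformat())  # 2012-01-25T16:31:32+00:00
```

Since only whole seconds are stored, two documents created within the same second share a timestamp and are told apart by the remaining eight bytes.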
Going in Reverse
PyMongo’s ObjectId class also has a method that lets us generate an ObjectId from a datetime, for use in querying (the other drivers have this too). Let’s insert another document and give it a try:
    >>> import pprint
    >>> import datetime
    >>> from bson.objectid import ObjectId
    >>> db.test.insert({'hello': 'a little later'})
    ObjectId('4f2030d9e6fb1b56ff000001')
    >>> pprint.pprint(list(db.test.find()))
    [{u'_id': ObjectId('4f202e64e6fb1b56ff000000'), u'hello': u'world'},
     {u'_id': ObjectId('4f2030d9e6fb1b56ff000001'), u'hello': u'a little later'}]
    >>> timestamp = datetime.datetime(2012, 1, 25, 16, 35)
    >>> pprint.pprint(list(db.test.find({'_id': {'$gt': ObjectId.from_datetime(timestamp)}})))
    [{u'_id': ObjectId('4f2030d9e6fb1b56ff000001'), u'hello': u'a little later'}]
The call to ObjectId.from_datetime() is what lets us create a special ObjectId just for querying. If you look at the API docs you’ll see a note that I wrote a long time ago about when this method is safe to use. That leads us into our next section:
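To see why such an ObjectId is only good for comparisons, here is a rough standard-library sketch (Python 3) of the value from_datetime() produces, going by the note further down in this post that it fills everything after the timestamp with \x00 bytes. The function name boundary_oid_hex is hypothetical, and the sketch assumes a naive datetime that is already in UTC:

```python
import calendar
import struct
from datetime import datetime


def boundary_oid_hex(dt):
    """Hex form of a query-only ObjectId for a naive UTC datetime.

    Hypothetical stand-in for ObjectId.from_datetime(): 4 timestamp bytes
    followed by 8 zero bytes.
    """
    seconds = calendar.timegm(dt.timetuple())
    return (struct.pack(">I", seconds) + b"\x00" * 8).hex()


boundary = boundary_oid_hex(datetime(2012, 1, 25, 16, 35))
print(boundary)  # 4f202f340000000000000000

# ObjectIds compare byte by byte, so equal-length hex strings sort the same way.
first_doc = "4f202e64e6fb1b56ff000000"
second_doc = "4f2030d9e6fb1b56ff000001"
print(first_doc < boundary < second_doc)  # True

# Same datetime in, same id out: fine as a range bound, useless as a unique key.
print(boundary == boundary_oid_hex(datetime(2012, 1, 25, 16, 35)))  # True
```

The boundary sorts between the two documents inserted above, which is why the $gt query returns only the second one.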
Abusing ObjectIds
At Fiesta we use ObjectIds to get the timestamps to display in our new archiving UI. Recently we had to import some existing archives for a group that was migrating to Fiesta. This presents a problem: when we import the archives we are creating new documents with new ObjectIds, but we want them to have timestamps that make them look much older. There are a couple of ways we could’ve approached this problem. I’ll start with what we did and then discuss why it’s wrong and what we probably should’ve done instead :).
We wrote some code to generate ObjectIds with timestamps that
occurred in the past, and manually generated _id values to match the
messages we were importing. Here’s the code:
    import calendar
    import struct

    from bson.objectid import ObjectId

    # Current ObjectId increment
    INC = 0


    def generate_objectid(generation_time):
        '''
        This is unsafe. We generate fake ObjectIds.

        Set the five (machine id/PID) bytes to '\xFA' so we can at least
        recognize OIDs we generated. We don't lock around the INC, so this
        method isn't re-entrant.
        '''
        global INC

        # Timestamp: 4 bytes, big-endian seconds since the epoch
        oid = struct.pack(">i", int(calendar.timegm(generation_time.timetuple())))

        # Machine ID / PID: 5 canary bytes (a bytes literal, so this also runs on Python 3)
        oid += b"\xFA" * 5

        # Increment: low 3 bytes of the counter
        oid += struct.pack(">i", INC)[1:4]
        INC = (INC + 1) % 0xFFFFFF

        return ObjectId(oid)
We couldn’t use ObjectId.from_datetime() because, as noted in the docs, it’s unsafe for use in anything but queries. The generate_objectid() function above is marginally safer by virtue of using an actual increment and a canary for the Machine ID & PID (from_datetime() uses all \x00s). But it’s still unsafe: INC starts again at 0 in each new process, so if we need to do another import we need to be sure not to use the same canary, or two messages with the same timestamp could end up with the same _id. We also need to be sure that the canary never matches any of our actual Machine ID / PID bytes.
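The 12 bytes split into the three fields that generate_objectid() fills in: 4 bytes of timestamp, 5 bytes of machine ID and PID, and 3 bytes of counter. Here is a sketch of how the canary could be checked later, using only the standard library (Python 3). The names split_oid and is_imported are hypothetical, and the layout is the one assumed by the code above:

```python
import struct

CANARY = b"\xFA" * 5  # marker bytes used by generate_objectid() in this post


def split_oid(raw):
    """Split 12 raw ObjectId bytes into (timestamp, middle bytes, counter)."""
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes")
    (seconds,) = struct.unpack(">I", raw[:4])
    middle = raw[4:9]
    # Pad the 3-byte counter to 4 bytes so struct can unpack it.
    (counter,) = struct.unpack(">I", b"\x00" + raw[9:12])
    return seconds, middle, counter


def is_imported(raw):
    """True if the machine ID / PID bytes carry the import canary."""
    return split_oid(raw)[1] == CANARY


# A hand-built fake id in the same shape generate_objectid() produces.
fake = struct.pack(">I", 1327509300) + CANARY + struct.pack(">I", 7)[1:4]
# The first real _id from the session earlier in this post.
real = bytes.fromhex("4f202e64e6fb1b56ff000000")

print(split_oid(fake))
print(is_imported(fake), is_imported(real))  # True False
```

With PyMongo, the raw bytes of an existing _id are available as its binary attribute, so is_imported(doc['_id'].binary) would be the call to make against real documents.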
What we should’ve done
What we probably should do is add a “created_on” field with a regular datetime timestamp for messages that are being imported. When we go to display a message, we use created_on if it exists and fall back to the _id otherwise. That way we’re never resorting to improperly generated ObjectIds, but we still get the benefit of built-in timestamps when we can. I figured I’d do this post in case anybody comes across the same problem, and as a neat way of exposing some of the internals of ObjectIds.
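The fallback itself is only a few lines. This is a sketch rather than Fiesta’s actual code: message_timestamp is a hypothetical name, StubId is a stand-in for bson’s ObjectId so the example runs without PyMongo installed, and the dates are made up:

```python
from datetime import datetime, timezone


class StubId(object):
    """Stand-in for bson.ObjectId so the sketch runs without PyMongo."""

    def __init__(self, generation_time):
        self.generation_time = generation_time


def message_timestamp(doc):
    """Prefer an explicit created_on; otherwise use the time embedded in _id."""
    created_on = doc.get("created_on")
    if created_on is not None:
        return created_on
    return doc["_id"].generation_time


# A message created normally: only the _id carries a timestamp.
native = {"_id": StubId(datetime(2012, 1, 25, 16, 31, 32, tzinfo=timezone.utc))}
# An imported message: a fresh _id, plus the original date in created_on.
imported = {
    "_id": StubId(datetime(2012, 1, 25, 16, 40, tzinfo=timezone.utc)),
    "created_on": datetime(2009, 6, 1, 9, 0, tzinfo=timezone.utc),
}

print(message_timestamp(native))
print(message_timestamp(imported))
```

One thing to watch when mixing the two sources: generation_time is timezone-aware (UTC), while datetimes read back through PyMongo are naive unless the client is created with tz_aware=True, so it is worth normalizing them before comparing or sorting.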
Source: http://blog.fiesta.cc/post/16470048697/using-and-abusing-mongodb-objectids-as-created-on