
How Python MongoDB Toolkit Ming Can Ease Schema Maintenance

Rick Copeland
·
Jun. 11, 12 · Interview

Schema Maintenance with Ming and MongoDB

Continuing my series on MongoDB and Python, this article introduces the Python MongoDB toolkit Ming and shows how it can simplify your MongoDB code and ease maintenance. If you're just getting started with MongoDB, you might want to read the previous articles in the series first:

  • Getting Started with MongoDB and Python
  • Moving Along With PyMongo
  • GridFS: The MongoDB Filesystem
  • Aggregation in MongoDB (Part 1)
  • MongoDB's New Aggregation Framework

And now that you're all caught up, let's jump right in with Ming....

Why Ming?

If you've come to MongoDB from the world of relational databases, you have probably been struck by just how easy everything is: no big object/relational mapper needed, no new query language to learn (well, maybe a little, but we'll gloss over that for now), everything is just Python dictionaries, and it's so, so fast! While this is all true to some extent, one of the big things you give up with MongoDB is structure.

MongoDB is sometimes referred to as a schema-free database. (This is not technically true; I find it more useful to think of MongoDB as having dynamically typed documents. The collection doesn't tell you anything about the type of documents it contains, but each individual document can be inspected.) While this can be nice, as it's easy to evolve your schema quickly in development, it's easy to get yourself in trouble the first time your application tries to query by a field that only exists in some of your documents.

The fact of the matter is that even if the database cares nothing about your schema, your application does, and if you play too fast and loose with document structure, it will come back to haunt you in the end. The main reason Ming was created at SourceForge was to deal with just this problem. We wanted a (thin) layer on top of pymongo that would do a couple of things for us:

  • Make sure that we don't put malformed data into the database
  • Try to 'fix' malformed data coming back from the database

So, without belaboring the point of its existence, let's jump into Ming.

Defining your schema

When using Ming, the first thing you need to do is to tell it what your documents look like. For this, Ming provides the collection function.

from datetime import datetime

from ming import collection, Field, Session
from ming import schema as S

session = Session()
MyDoc = collection(
    'user', session,
    Field('_id', S.ObjectId),
    Field('username', str),
    Field('created', datetime, if_missing=datetime.utcnow),
    ...)

There are a few things to note above:

  • The MongoDB collection name is passed as the first argument to collection
  • The Session object is used to abstract away the pymongo connection. We will see how to configure it below.
  • Each field in our schema gets its own Field definition. Fields contain a name, a schema item (S.ObjectId, str, and datetime in this example), and optional arguments that affect the field.
  • The special if_missing keyword argument allows you to supply default arguments which will be 'filled in' by Ming. If you pass a function, as above, the function will be called to generate a default value.
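The if_missing behavior described in the last bullet can be pictured with a small pure-Python sketch. This is not Ming's implementation; the names DEFAULTS and fill_missing are hypothetical, and a fixed timestamp stands in for datetime.utcnow so the result is predictable.

```python
from datetime import datetime

# Hypothetical stand-in for a schema: field name -> default value or callable.
DEFAULTS = {
    "client_id": None,
    "created": lambda: datetime(2012, 6, 8),  # stand-in for datetime.utcnow
}


def fill_missing(doc, defaults):
    """Return a copy of doc with absent fields filled in from defaults."""
    filled = dict(doc)
    for name, default in defaults.items():
        if name not in filled:
            # A callable default is invoked once per document,
            # so every document gets its own freshly generated value.
            filled[name] = default() if callable(default) else default
    return filled


user = fill_missing({"username": "rick"}, DEFAULTS)
print(user)
```

Fields that are already present are left alone; only the gaps are filled.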

Schema items deserve a bit more explanation. Ming internally always works with objects from the ming.schema module, but it also provides shortcuts to ease schema definitions. The translation between shortcut and ming.schema.SchemaItem appears below:

shorthand     SchemaItem            Notes
None          Anything
int           Int
str           String                Unicode
float         Float
bool          Bool
datetime      DateTime
[]            Array(Anything())     Any valid array
[int]         Array(Int())
{str:None}    Object({str:None})    Any valid object
{"a": int}    Object({"a": int})    Embedded schema

Note above that we can create complex schemas using Ming. A blog post might have the following definition, for example:

BlogPost = collection(
   'blog.post', session,
   Field('_id', S.ObjectId),
   Field('posted', datetime, if_missing=datetime.utcnow),
   Field('title', str),
   Field('author', dict(
       username=str,
       display_name=str)),
   Field('text', str),
   Field('comments', [
       dict(
           author=dict(
               username=str,
               display_name=str),
           posted=S.DateTime(if_missing=datetime.utcnow),
           text=str) ]))

Note in the schema above that author is an embedded document, and comments is an embedded array of documents.
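To make the idea of nested shorthand schemas concrete, here is a rough pure-Python sketch of a recursive checker for a subset of the shorthand forms (None, plain types, lists, and dicts with string keys). It is only an illustration under simplifying assumptions, not how Ming validates: the validate function and POST_SCHEMA are hypothetical, missing fields are treated as errors rather than filled in, and the {str: None} form is not handled.

```python
def validate(value, schema, path="doc"):
    """Raise TypeError if value does not match the shorthand schema."""
    if schema is None:
        return  # None accepts anything
    if isinstance(schema, type):
        # Caveat: isinstance(True, int) is True, so bools pass as ints here.
        if not isinstance(value, schema):
            raise TypeError("%s: expected %s, got %s" % (
                path, schema.__name__, type(value).__name__))
        return
    if isinstance(schema, list):
        if not isinstance(value, list):
            raise TypeError("%s: expected a list" % path)
        # [] accepts any items; [x] requires every item to match x.
        item_schema = schema[0] if schema else None
        for i, item in enumerate(value):
            validate(item, item_schema, "%s[%d]" % (path, i))
        return
    if isinstance(schema, dict):
        if not isinstance(value, dict):
            raise TypeError("%s: expected a dict" % path)
        for key, sub_schema in schema.items():
            if key not in value:
                raise TypeError("%s: missing field %r" % (path, key))
            validate(value[key], sub_schema, "%s.%s" % (path, key))
        return
    raise ValueError("%s: unsupported schema item %r" % (path, schema))


AUTHOR = {"username": str, "display_name": str}
POST_SCHEMA = {
    "title": str,
    "author": AUTHOR,
    "comments": [{"author": AUTHOR, "text": str}],
}

good = {
    "title": "Hello",
    "author": {"username": "rick", "display_name": "Rick"},
    "comments": [
        {"author": {"username": "jenny", "display_name": "Jenny"},
         "text": "Nice post"},
    ],
}
validate(good, POST_SCHEMA)  # passes silently

bad = dict(good, comments=[
    {"author": {"username": "jenny", "display_name": "Jenny"}, "text": 42},
])
try:
    validate(bad, POST_SCHEMA)
except TypeError as exc:
    print(exc)  # doc.comments[0].text: expected str, got int
```

Carrying the path through the recursion is what makes errors in deeply embedded documents easy to locate.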

Indexing

If we expected to do a lot of queries on user.username, we could add an index simply by updating the code above to read:

    ...
    Field('username', str, index=True)
    ...

Creating the indexes in the schema like this has the nice property that Ming will ensure that those indexes exist the first time it touches the database. We can also set a unique index on a field by using the unique optional argument:

    ...
    Field('username', str, unique=True)
    ...

Ming also supports compound indexes through the Index object in the collection definition. Suppose we wished to keep a separate list of users, scoped by client_id. In this case, the schema might look more like the following:

from datetime import datetime

from ming import collection, Field, Index, Session
from ming import schema as S

session = Session()
MyDoc = collection(
    'user', session,
    Field('_id', S.ObjectId),
    Field('client_id', S.ObjectId, if_missing=None),
    Field('username', str),
    Field('created', datetime, if_missing=datetime.utcnow),
    Index('client_id', 'username', unique=True),
    ...)

In the example above, the index would be created as follows:

db.user.ensure_index([('client_id', 1), ('username', 1)], unique=True)

By default, each key in an index created by Ming is sorted in ascending order. If you want to change this, you can explicitly specify the sort order for the index:

    ...
    Index(('client_id', -1), ('username', 1), unique=True)
    ...
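The mapping from Index arguments to the key list passed to ensure_index can be sketched in a few lines of plain Python. The helper name normalize_index_fields is hypothetical and this is not Ming's own code; it only illustrates the rule that a bare field name means ascending order while a (name, direction) pair is taken as given.

```python
ASCENDING, DESCENDING = 1, -1  # the same values pymongo uses


def normalize_index_fields(*fields):
    """Turn a mix of names and (name, direction) pairs into a key list."""
    keys = []
    for field in fields:
        if isinstance(field, str):
            # A bare field name defaults to ascending order.
            keys.append((field, ASCENDING))
        else:
            name, direction = field
            keys.append((name, direction))
    return keys


print(normalize_index_fields('client_id', 'username'))
# [('client_id', 1), ('username', 1)]
print(normalize_index_fields(('client_id', DESCENDING), ('username', ASCENDING)))
# [('client_id', -1), ('username', 1)]
```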

Connection and configuration

Once we've defined our schema, we can use it by binding the session to the appropriate MongoDB database using ming.datastore:

from ming import datastore

session.bind = datastore.DataStore(
    'mongodb://localhost:27017', database='test')

More typically, we will create our session as a named session and bind it somewhere else in our application (perhaps in our startup script):

import ming

session = ming.Session.by_name('test')

...

ming.config.configure_from_nested_dict(dict(
    test=dict(
        master='mongodb://localhost:27017', 
        database='test')
    ))

By using named sessions, you can decouple your schema definition code from the actual configuration of your database connection. This is often useful when you will be reading connection information from a configuration file, for instance.
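As one possible way to read that connection information from a file, the sketch below uses the standard library's configparser to build the same nested dictionary shown above. The INI layout and the load_session_config helper are assumptions made for illustration; the final call into Ming is left as a comment so the snippet runs without a database.

```python
import configparser

# Hypothetical INI layout: one section per named session.
CONFIG_TEXT = """
[test]
master = mongodb://localhost:27017
database = test
"""


def load_session_config(text):
    """Parse INI text into {session_name: {option: value}}."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


config = load_session_config(CONFIG_TEXT)
print(config)
# {'test': {'master': 'mongodb://localhost:27017', 'database': 'test'}}

# The resulting dict has the shape used in the article's example:
# ming.config.configure_from_nested_dict(config)
```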

Querying and updating

To show how Ming supports querying and updating, let's go back to our simple User schema above:

from datetime import datetime

from ming import collection, Field, Index, Session
from ming import schema as S

session = Session()
MyDoc = collection(
    'user', session,
    Field('_id', S.ObjectId),
    Field('client_id', S.ObjectId, if_missing=None),
    Field('username', str),
    Field('created', datetime, if_missing=datetime.utcnow),
    Index('client_id', 'username', unique=True),
    ...)

Now let's insert some data:

>>> import pymongo
>>> conn = pymongo.Connection()
>>> db = conn.test
>>> db.user.insert([
...     dict(username='rick'),
...     dict(username='jenny'),
...     dict(username='mark')])
[ObjectId('4fd24c96fb72f08265000000'), 
 ObjectId('4fd24c96fb72f08265000001'), 
 ObjectId('4fd24c96fb72f08265000002')]

To get the data back out, we simply use the collection's manager property m:

>>> MyDoc.m.find().all()
[{'username': u'rick', 
  '_id': ObjectId('4fd24c96fb72f08265000000'), 
  'client_id': None, 
  'created': datetime.datetime(2012, 6, 8, 19, 8, 28, 522073)}, 
 {'username': u'jenny', 
  '_id': ObjectId('4fd24c96fb72f08265000001'), 
  'client_id': None, 
  'created': datetime.datetime(2012, 6, 8, 19, 8, 28, 522195)}, 
 {'username': u'mark', 
  '_id': ObjectId('4fd24c96fb72f08265000002'), 
  'client_id': None, 
  'created': datetime.datetime(2012, 6, 8, 19, 8, 28, 522315)}]

Notice how Ming has filled in the values we omitted when creating the user documents. In this case, it's actually filling them in as they are returned from the database. We can drop down to the pymongo layer to see this by using the m.collection property on MyDoc:

>>> list(MyDoc.m.collection.find())
[{u'username': u'rick', 
  u'_id': ObjectId('4fd24c96fb72f08265000000')}, 
 {u'username': u'jenny', 
  u'_id': ObjectId('4fd24c96fb72f08265000001')}, 
 {u'username': u'mark', 
  u'_id': ObjectId('4fd24c96fb72f08265000002')}]

Now let's remove the documents we created and create some using Ming:

>>> MyDoc.m.remove()
>>> 
>>> MyDoc(dict(username='rick')).m.insert()
>>> MyDoc(dict(username='jenny')).m.insert()
>>> MyDoc(dict(username='mark')).m.insert()
>>> 
>>> MyDoc.m.collection.find_one()
{u'username': u'rick', 
 u'_id': ObjectId('4fd24f95fb72f08265000003'), 
 u'client_id': None, 
 u'created': datetime.datetime(2012, 6, 8, 19, 16, 37, 565000)}

Note that when we created the documents using Ming, we see the default values stored in the database.

Another thing to note above is that when we inserted the new documents, we didn't have to specify the collection. Ming documents are actually dict subclasses, but they "remember" where they came from. To update a document, all we need to do is to call .m.save() on the document:

>>> doc = MyDoc.m.get(username='rick')
>>> import bson
>>> doc.client_id=bson.ObjectId()
>>> doc.username
u'rick'
>>> doc.client_id
ObjectId('4fd250bdfb72f08265000006')
>>> doc.m.save()
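The notion of a dict subclass that "remembers" its collection and allows attribute access to its fields can be illustrated with a toy sketch. This is not Ming's actual class hierarchy; Document and make_collection are hypothetical names, and no database is involved.

```python
class Document(dict):
    """Toy dict subclass: keys readable and writable as attributes."""

    collection_name = None  # set on each generated subclass

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails,
        # so class attributes such as collection_name still work.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # Attribute assignment stores the value as a dictionary key.
        self[name] = value


def make_collection(name):
    """Create a Document subclass bound to a collection name."""
    return type(name, (Document,), {"collection_name": name})


User = make_collection("user")
doc = User({"username": "rick"})
doc.client_id = 42

print(doc.username)         # rick
print(doc["client_id"])     # 42
print(doc.collection_name)  # user
print(isinstance(doc, dict))  # True
```

Because each document's class carries the collection name, an insert or save method on the document would not need to be told where to write.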

If you'd prefer to use MongoDB's atomic updates, you can use the manager method update_partial:

>>> MyDoc.m.update_partial(
...     dict(username='rick'), 
...     {'$set': { 'client_id': None}})
{u'updatedExisting': True, u'connectionId': 232, 
 u'ok': 1.0, u'err': None, u'n': 1}

More to come

There's a lot more to Ming, which I'll cover in future articles, including data polymorphism, eager and lazy data migration, GridFS support, and an object-document mapper providing capabilities similar to those of an object-relational mapper.

So what do you think? Is Ming something that you would use for your projects? Have you chosen one of the other MongoDB mappers? Please let me know in the comments below.
