Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

It Seemed Like A Good Idea At The Time: PyMongo's "use_greenlets"

DZone's Guide to

It Seemed Like A Good Idea At The Time: PyMongo's "use_greenlets"

· Java Zone
Free Resource

Build vs Buy a Data Quality Solution: Which is Best for You? Gain insights on a hybrid approach. Download white paper now!

Road

The road to hell is paved with good intentions.

This is the second of a four-part series on regrettable decisions we made when we designed PyMongo. This winter we're preparing PyMongo 3.0, and we have an opportunity to make big changes. I'm putting our regrettable designs to rest, and writing their epitaphs as I go.

Last week I wrote about the first regrettable decision, "start_request". Today I'll tell you the story of the second: PyMongo and Gevent.


The Invention Of "use_greenlets"

As I described in last week's article, I committed my first big changes to connection pooling in PyMongo in March 2012. Once I'd improved PyMongo's connection pool for multi-threaded applications, my boss Bernie asked me to improve PyMongo's compatibility with Gevent. The main problem was, PyMongo wanted to reserve a socket for each thread, but Gevent uses greenlets in place of threads. I didn't know Gevent well, but I forged ahead.

I added a "use_greenlets" option to PyMongo; if True, PyMongo reserved a socket for each greenlet. I made a separate connection pool class called GreenletPool: it shared most of its code with the standard Pool, but instead of using a threadlocal to associate sockets with threads, it used a simple dict to associate sockets with greenlets. A weakref callback ensured that the greenlet's socket was reclaimed when the greenlet died.

Half A Feature

The "use_greenlet" option and the GreenletPool didn't add too much complexity to PyMongo. But my error was this: I only gave Gevent users half a feature. My "improvement" was as practical as adding half a wheel to a bicycle.

At the time, I clearly described my half-feature in PyMongo's documentation:

Using Gevent Without Threads

Typically when using Gevent, you will run from gevent import monkey; monkey.patch_all() early in your program's execution. From then on, all thread-related Python functions will act on greenlets instead of threads, and PyMongo will treat greenlets as if they were threads transparently. Each greenlet will use a socket exclusively by default.

Using Gevent With Threads

If you need to use standard Python threads in the same process as Gevent and greenlets, you can run only monkey.patch_socket(), and create a Connection instance with use_greenlets=True. The Connection will use a special greenlet-aware connection pool that allocates a socket for each greenlet, ensuring consistent reads in Gevent.

ReplicaSetConnection with use_greenlets=True will also use a greenlet-aware pool. Additionally, it will use a background greenlet instead of a background thread to monitor the state of the replica set.

Hah! In my commit message, I claimed I'd "improved Gevent compatibility." What exactly did I mean? I meant you could use PyMongo after calling Gevent's patch_socket() without having to call patch_thread(). But who would do that? What conceivable use case had I enabled? After all, once you've called patch_socket(), regular multi-threaded networking code doesn't work. So I had not allowed you to mix Gevent and non-Gevent code in one application.

What was I thinking? Maybe I thought "use_greenlets" worked around a bug in Gevent's threadlocals, but Gevent fixed that bug two years prior, so that's not the answer.

I suppose "use_greenlets" allowed you to use PyMongo with multiple Gevent loops, one loop per OS thread. Gevent does support this pattern, but I'm uncertain how useful it is since the Global Interpreter Lock prevents OS threads from running Python code concurrently. I'd written some clever code that was probably useless, and I greatly confused Gevent users about how they should use PyMongo.

Youthful Indiscretion

It's been three years since I added "use_greenlets". The company was so young then. We were called 10gen, and we were housed on Fifth Avenue in Manhattan, above a nail salon. The office was cramped and every seat was taken. There was no place to talk: my future boss Steve Francia interviewed me walking around Union Square. Eliot Horowitz and I negotiated my salary in the stairwell. The hardwood floors were bent and squeaky. The first day I came to work I wore motorcycle boots, and the racket they made on those bad floors made me so self-conscious I never wore them to work again. When I sat down, my chair rolled downhill from my desk and bumped into Meghan Gill behind me.

The company was young and so was I. When Bernie asked me to improve PyMongo's compatibility with Gevent, I should've thought much harder about what that meant. Instead of the half-feature I wrote, I should have given you either a whole feature or no feature.

The whole feature would have allowed you to use PyMongo with Gevent and no monkey-patching at all, provided that you set "use_greenlets". If "use_greenlets" was set to True, PyMongo would associate sockets with greenlets instead of threads, and it would use Gevent's socket implementation instead of the standard library's. This would allow Gevent to properly suspend the current greenlet while awaiting network I/O, but you could still mix Gevent and non-Gevent code in one application.

The Death Of "use_greenlets"

But even better than the whole feature is no feature. So that is what I have implemented for PyMongo 3.0: in the next major release, PyMongo will have no Gevent-specific code at all. PyMongo will work with Gevent's monkey.patch_all() just like any other Python library does, and use_greenlets is gone. In our continuous integration server we'll test Gevent and, if practical, other monkey-patching frameworks like Eventlet and Greenhouse, to make sure they work with PyMongo. But we won't privilege Gevent over the other frameworks, nor distort PyMongo's design for the sake of a half-feature no one can use.

Post-Mortem

The lesson here is obvious: gather requirements. It's harder for an open source author to gather requirements than it is for a commercial software vendor, but it's far from impossible. Gevent has a mailing list, after all. At the time it didn't occur to me to discuss with Gevent users what they wanted from PyMongo.

Nowadays I'd know better. Especially when I'm not scratching my own itch, when I'm integrating with a library I don't use, I need to define rigorously what need I'm filling. Otherwise I'm meeting you in a foreign country with a ship full of the wrong goods for trade.

The same challenge presents itself to me now with Motor, my async driver for MongoDB. So far Motor has only worked with Tornado, an async framework I've used and know well. But I'm going to start integrating Motor with asyncio and, eventually, Twisted, and I need to be awfully careful about gathering requirements. One technique I'll use is eating my own hamster food: Before I release the version of Motor that supports asyncio, I'll port Motor-Blog, the software that runs this site, from Tornado to asyncio. That way there will be at least one real-world application that uses Motor and asyncio before I release the new version.


Stay tuned for the next installment in "It Seemed Like A Good Idea At The Time": PyMongo's "copy_database".

Build vs Buy a Data Quality Solution: Which is Best for You? Maintaining high quality data is essential for operational efficiency, meaningful analytics and good long-term customer relationships. But, when dealing with multiple sources of data, data quality becomes complex, so you need to know when you should build a custom data quality tools effort over canned solutions. Download our whitepaper for more insights into a hybrid approach.

Topics:

Published at DZone with permission of A. Jesse Jiryu Davis, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}