Serializing Python-Requests' Session Objects for Fun and Profit
Join the DZone community and get the full member experience.
Join For FreeOriginally Authored By Shrikant Sharat
If you haven't checked out @kennethreitz's excellent python-requests library
yet, I suggest you go do that immediately. Go on, I'll wait for you.
Had your candy? That is one of the most beatiful pieces of python code I've read. And its an excellent library with a very humane API.
Recently, I have been using this library for a few of my company's internal projects and at a point I needed to serialize and save Session objects for later. That wasn't as straightforward as I first thought it'd be, so I am sharing my experience here.
First off, let's make a simple http server which we are going to contact with python-requests. The server should be able to handle cookie based sessions and also have basic auth, as these things are handled by python-requests' Session objects on the client side. I won't discuss the code for the server here, you can get it from bitbucket.
Once you have the server running, now for the client, lets do requests!
import requests as req URL_ROOT = 'http://localhost:5050' def get_logged_in_session(name): session = req.session(auth=('user', 'pass')) login_response = session.post(URL_ROOT + '/login', data={'name': name}) login_response.raise_for_status() return session def get_whoami(session): response = session.get(URL_ROOT + '/whoami') response.raise_for_status() return response.text
I defined two functions here. The get_logged_in_session will create a new
session and login to the http server and return that session. Any subsequent
requests using this sesssion will be made as if you have logged in. That's what
will be tested with the get_whoami function, which will just return the
response from /whoami.
Lets test this out. Make sure the server.py is running and in another terminal,
$ python -i client.py >>> s = get_logged_in_session('sharat') >>> get_whoami(s) u'You are sharat' >>> get_whoami(req.session(auth=('user', 'pass'))) u'You are a guest'
Works perfectly. If we pass it the logged in session, it gives us the username
and if we pass it a new session, it gives us a guest.
Now, lets assume we have two functions, serialize_session and deserialize_session which do exactly what their names say. We can test them out by running a small test.py, as
from client import get_logged_in_session, get_whoami from serializer import deserialize_session, serialize_session session = get_logged_in_session('sharat') dsession = deserialize_session(serialize_session(session)) assert get_whoami(session) == get_whoami(dsession) print 'Success'
and a dummy serializer.py
def serialize_session(session): return session def deserialize_session(session): return session
And with that, of course, the test will not fail
$ python test.py Success
Serializing
Now, to implement the functions in serializer.py. A simple one, would be to use pickle. Lets try
import pickle as pk def serialize_session(session): return pk.dumps(session) def deserialize_session(data): return pk.loads(data)
If you run test.py now, python is going to yell at you.
$ python test.py Traceback (most recent call last): File "test.py", line 10, in <module> dsession = deserialize_session(serialize_session(session)) [ ... ] raise TypeError, "can't pickle %s objects" % base.__name__ TypeError: can't pickle lock objects
Oh well, it was worth a try I suppose.
Update: The Session class can be made to implement the pickle protocol if you want to use pickle.
Next plan I had was to pick up attributes and data from a Session object, just enough to recreate this object using the Session constructor, and serialize those attributes as a json. After all, the Session's API is very easy to use, how hard can picking attributes from it be? :)
So, I dug in the sessions.py module of python-requests library. And here's what the signature of the constructor for Session objects looks like
def __init__(self, headers=None, cookies=None, auth=None, timeout=None, proxies=None, hooks=None, params=None, config=None, verify=True): # ...
So, if I pick up just these values, I should be able to recreate the session
object. Sweet.
import json import requests as req def serialize_session(session): attrs = ['headers', 'cookies', 'auth', 'timeout', 'proxies', 'hooks', 'params', 'config', 'verify'] session_data = {} for attr in attrs: session_data[attr] = getattr(session, attr) return json.dumps(session_data) def deserialize_session(data): return req.session(**json.loads(data))
And let's try this out
$ python test.py Traceback (most recent call last): File "test.py", line 12, in <module> assert get_whoami(session) == get_whoami(dsession) [ ... ] [...]requests/models.py", line 447, in send r = self.auth(self) TypeError: 'list' object is not callable
Okay, that error message is very wierd. Why would anyone call a list object?
Go dig in the models.py module. See this
[ ... ] if isinstance(self.auth, tuple) and len(self.auth) == 2: # special-case basic HTTP auth self.auth = HTTPBasicAuth(*self.auth) # Allow auth to make its changes. r = self.auth(self) [ ... ]
There. Its not a list that's being called. Not directly at least. The problem
here is that the auth we are passing to session() is not a tuple. Duh!
While I like it that auth is restricted to be a tuple, I wish there was a
better error message for when auth is a list instead of a tuple. I
personally wouldn't want it to accept a list for auth though.
So, what went wrong? json does not differentiate between a tuple and a list. It only does lists. So, when serializing and deserializing, the auth tuple is turned to a list. Lets turn it back
def deserialize_session(data): session_data = json.loads(data) if 'auth' in session_data: session_data['auth'] = tuple(session_data['auth']) return req.session(**session_data)
And
$ python test.py Traceback (most recent call last): File "test.py", line 12, in <module> assert get_whoami(session) == get_whoami(dsession) [ ... ] File "/usr/lib/python2.7/string.py", line 493, in translate return s.translate(table, deletions) TypeError: translate() takes exactly one argument (2 given)
Wait. What? Now we have an error from stdlib? This just keeps getting better and better. If this looks like something that can frustrate you, go get some coffee :)
If you look at the complete stack trace, the second file from bottom,
File "[...]site-packages/requests/packages/oreos/monkeys.py", line 470, in set if "" != translate(key, idmap, LegalChars):
This thing seems to be calling the translate method incorrectly. With a bit
of debugging and yelling at my monitor, I found out the problem and for a
moment, lost my grip on reality.
str.translate takes 2 arguments, but unicode.translate takes only 1. I have no idea why this is done this way but I sure as hell didn't enjoy it. The code in oreos/monkeys.py assumes that the key is a str. However, what json.loads gives you, is unicode stuff. So, we need to convert just the parts in the deserialized dict we get from json.loads which are being used by the oreos/monkeys.py, from unicode to str.
Reading a bit more code around the oreos library, it didn't take long to figure out that those were the keys in the cookies dict. Lo
def deserialize_session(data): session_data = json.loads(data) if 'auth' in session_data: session_data['auth'] = tuple(session_data['auth']) if 'cookies' in session_data: session_data['cookies'] = dict((key.encode(), val) for key, val in session_data['cookies'].items()) return req.session(**session_data)
And so
$ python test.py Success
!
All the code is on a bitbucket repository.
Update: Pickling can also work
As Daslch pointed out in his comment on reddit, by implementing the pickle protocol on the Session class, we can get pickling to work. From the documentation, we need two methods, __getstate__ and __setstate__.
Adding those methods as follows to sessions.Session class
def __getstate__(self): attrs = ['headers', 'cookies', 'auth', 'timeout', 'proxies', 'hooks', 'params', 'config', 'verify'] return dict((attr, getattr(self, attr)) for attr in attrs) def __setstate__(self, state): for name, value in state.items(): setattr(self, name, value) self.poolmanager = PoolManager( num_pools=self.config.get('pool_connections'), maxsize=self.config.get('pool_maxsize') )
with this as the version of serializer.py that uses pickle, we do get a
Success.
The creation of new poolmanager in __setstate__ is a piece of code copied
from __init__ of the same class. This should probably be turned to a method
to avoid code repetition.
Update 2: Created an issue about this.
Update 3: This has been merged and Session objects are pickleable as of
version 0.10.3. See requests history.
Source: http://sharats.me/serializing-python-requests-session-objects-for-fun-and-profit.html
Opinions expressed by DZone contributors are their own.
Comments