Python Collections Abstract Base Classes
Data structures and their setup are important in any language. Have a look as we explore Python's Collections Abstract Base Classes in depth.
I love thinking about data structures, and how to organize them most efficiently for a specific task. In the normal course of programming in Python, we don't have to think about it very much: the choice between list and dict is obvious, and that's usually as far as things go.
When things get more complex, though, the Collections Abstract Base Classes can be extremely useful. In my experience, they aren't widely known, so in this post I'll show a couple of interesting uses for them.
List-Based Set
Using a set requires that the items held within are all hashable (that is, they implement the __hash__ method).
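A quick way to see this requirement in action is to call hash() on a couple of built-in values. This is only an illustrative sketch; the helper name is_hashable is made up for this example and is not part of the standard library.

```python
def is_hashable(value):
    """Return True if hash(value) succeeds (hypothetical helper for this example)."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


print(is_hashable((1, 2)))     # True: a tuple of hashable items is hashable
print(is_hashable({'id': 1}))  # False: dicts are mutable and define no hash

try:
    {{'id': 1}}                # trying to build a set that contains a dict
except TypeError as exc:
    print('TypeError:', exc)   # the exact message varies between Python versions
```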
This isn't always the case, though. For example, Django models that don't have a PK yet are unhashable, as are dicts. In these situations, it can be useful to have a data structure which acts like a set, but which is backed by a list to sidestep that requirement. Performance will be worse, because a membership check has to scan the list instead of doing a hash lookup, but in some cases this is acceptable.
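To get a feel for where the performance cost comes from, the sketch below counts equality comparisons instead of measuring time. The Counted class and the count_comparisons function are hypothetical helpers written just for this illustration; Counted is deliberately hashable so that the same objects can be looked up in both a list and a set.

```python
class Counted:
    """Hashable value that counts how often it is compared for equality."""

    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Counted.comparisons += 1
        return isinstance(other, Counted) and self.value == other.value

    def __hash__(self):
        return hash(self.value)


def count_comparisons(container, needle):
    """Return (found, number of __eq__ calls) for `needle in container`."""
    Counted.comparisons = 0
    found = needle in container
    return found, Counted.comparisons


items = [Counted(i) for i in range(1000)]
needle = Counted(999)  # equal to the last item, but a different object

list_result = count_comparisons(items, needle)
set_result = count_comparisons(set(items), needle)

print(list_result)  # (True, 1000): every element had to be compared
print(set_result)   # found as well, after far fewer comparisons
```

For a handful of items the difference is negligible, which is why a list-backed set can still be a reasonable trade-off.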
>>> s = ListBasedSet([
...     {
...         'id': 1,
...     },
...     {
...         'id': 2,
...     },
... ])
>>> len(s)
2
This can be easily achieved using the MutableSet Abstract Base Class:
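import collections.abc


class ListBasedSet(collections.abc.MutableSet):
    """A set-like container backed by a list, so items need not be hashable."""

    def __init__(self, items=()):
        self.store = []
        # Go through add() so that duplicate items are dropped, as in a real set.
        for item in items:
            self.add(item)

    def __contains__(self, item):
        return item in self.store

    def __iter__(self):
        return iter(self.store)

    def __len__(self):
        return len(self.store)

    def add(self, item):
        # Only append items that are not already present.
        if item not in self.store:
            self.store.append(item)

    def discard(self, item):
        # Like set.discard(), do nothing if the item is missing.
        try:
            self.store.remove(item)
        except ValueError:
            pass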
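This exposes the same core API as a built-in set: remove, pop, clear, and the set operators (|, &, -, ^) are all supplied by MutableSet on top of the five methods above. Named methods such as union() are not part of the Abstract Base Class.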
>>> s.add({
...     'id': 3,
... })
>>> len(s)
3
>>> s.clear()
>>> len(s)
0
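The mixin methods go beyond add and clear. As a rough sketch, the snippet below repeats a condensed copy of the class so that it runs on its own, then tries a few of the operators that MutableSet provides without any extra code. The variable names are just for illustration.

```python
from collections.abc import MutableSet


class ListBasedSet(MutableSet):
    """Condensed copy of the list-backed set, so this snippet is self-contained."""

    def __init__(self, items=()):
        self.store = []
        for item in items:
            self.add(item)

    def __contains__(self, item):
        return item in self.store

    def __iter__(self):
        return iter(self.store)

    def __len__(self):
        return len(self.store)

    def add(self, item):
        if item not in self.store:
            self.store.append(item)

    def discard(self, item):
        try:
            self.store.remove(item)
        except ValueError:
            pass


a = ListBasedSet([{'id': 1}, {'id': 2}])
b = ListBasedSet([{'id': 2}, {'id': 3}])

union = a | b    # __or__ is inherited from the ABC
common = a & b   # so is __and__
only_a = a - b   # and __sub__

print(list(union))   # [{'id': 1}, {'id': 2}, {'id': 3}]
print(list(common))  # [{'id': 2}]
print(list(only_a))  # [{'id': 1}]

a.remove({'id': 1})  # remove() is built on __contains__ and discard()
print(len(a))        # 1
```

The operators return new ListBasedSet instances because the ABC builds its results by calling the class with an iterable, which is why __init__ accepts one.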
Lazy-Loading and Pagination
If you have an API that paginates results, but you'd like to expose it as a simple list that can be iterated over, the Collections Abstract Base Classes are a good way to do that.
As an example, APIs often return a response with a list of objects and the total number of objects available:
{
    "objects": [
        {
            "id": 1
        },
        {
            "id": 2
        }
    ],
    "total": 2
}
In such a case, a class like the following could be used to load the data lazily, when an item in the list is accessed:
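import collections.abc

import requests


class LazyLoadedList(collections.abc.Sequence):
    """A read-only sequence that fetches pages from the API on demand."""

    def __init__(self, url):
        self.url = url
        self.page = 0
        self.num_items = 0  # Stays 0 until the first page has been loaded.
        self.store = []

    def load_data(self):
        """Fetch the current page and return how many objects it contained."""
        data = requests.get(self.url, params={
            'page': self.page,
        }).json()
        self.num_items = data['total']
        objects = data.get('objects', [])
        self.store += objects
        return len(objects)

    def __getitem__(self, index):
        # Compare against what has been loaded so far, not the reported total;
        # otherwise an index beyond the first page would never trigger a fetch.
        while index >= len(self.store):
            self.page += 1
            if not self.load_data():
                break
        return self.store[index]

    def __len__(self):
        return self.num_items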
With this implementation, you can simply iterate over the list as normal and have the paginated data loaded automatically:
>>> l = LazyLoadedList('http://api.example.com/items')
>>> for item in l:
...     process_item(item)
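To try the pattern without a live API, here is a minimal, network-free sketch of the same idea. Everything in it is an assumption made for illustration: fetch_page is a hypothetical stand-in for the HTTP call, the data set has five items served two per page, and LazyPages only handles non-negative integer indexes (no slices or negative indexes).

```python
from collections.abc import Sequence

# Hypothetical in-memory stand-in for a paginated API: 5 items, 2 per page.
ALL_ITEMS = [{'id': n} for n in range(1, 6)]
PAGE_SIZE = 2
requested_pages = []  # records which pages were fetched, for illustration


def fetch_page(page):
    """Return one page (1-indexed) in the same shape as the JSON shown earlier."""
    requested_pages.append(page)
    start = (page - 1) * PAGE_SIZE
    return {
        'objects': ALL_ITEMS[start:start + PAGE_SIZE],
        'total': len(ALL_ITEMS),
    }


class LazyPages(Sequence):
    """Sequence that fetches a page only when an unloaded index is requested."""

    def __init__(self, fetch):
        self.fetch = fetch
        self.page = 0
        self.num_items = 0
        self.store = []

    def __getitem__(self, index):
        # Keep fetching until the requested index is in the local store.
        while index >= len(self.store):
            if self.page and len(self.store) >= self.num_items:
                break  # everything has already been loaded
            self.page += 1
            data = self.fetch(self.page)
            self.num_items = data['total']
            if not data['objects']:
                break  # the API returned an empty page
            self.store += data['objects']
        return self.store[index]  # raises IndexError past the end

    def __len__(self):
        return self.num_items


items = LazyPages(fetch_page)

first = items[0]
print(first, requested_pages)  # {'id': 1} [1]

ids = [item['id'] for item in items]
print(ids, requested_pages)    # [1, 2, 3, 4, 5] [1, 2, 3]
```

Iteration works because Sequence supplies __iter__, which calls __getitem__ with 0, 1, 2, and so on until an IndexError is raised.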
At Zapier, we use something very similar to this to wrap Elasticsearch responses.
I hope these examples show some of the things that can be achieved with Python's Collections Abstract Base Classes!
Published at DZone with permission of Rob Golding, DZone MVB. See the original article here.