Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Python: Learning About defaultdict's Handling of Missing Keys

DZone's Guide to

Python: Learning About defaultdict's Handling of Missing Keys

What seems to happen is that when we try to find a key that doesn’t exist in the dictionary, an entry gets created with a value equal to the number of items in the dictionary.

· Big Data Zone ·
Free Resource

How to Simplify Apache Kafka. Get eBook.

While reading the scikit-learn code, I came across a bit of code that I didn’t understand for a while, but in retrospect, is quite neat.

This is the code snippet that intrigued me:

vocabulary = defaultdict()
vocabulary.default_factory = vocabulary.__len__

Let’s quickly see how it works by adopting an example from scikit-learn:

>>> from collections import defaultdict
>>> vocabulary = defaultdict()
>>> vocabulary.default_factory = vocabulary.__len__
 
>>> vocabulary["foo"]
0
>>> vocabulary.items()
dict_items([('foo', 0)])
 
>>> vocabulary["bar"]
1
>>> vocabulary.items()
dict_items([('foo', 0), ('bar', 1)])

What seems to happen is that when we try to find a key that doesn’t exist in the dictionary, an entry gets created with a value equal to the number of items in the dictionary.

Let’s check if that assumption is correct by explicitly adding a key and then trying to find one that doesn’t exist:

>>> vocabulary["baz"] = "Mark
>>> vocabulary["baz"]
'Mark'
>>> vocabulary["python"]
3

Now, let’s see what the dictionary contains:

>>> vocabulary.items()
dict_items([('foo', 0), ('bar', 1), ('baz', 'Mark'), ('python', 3)])

All makes sense so far. If we look at the source code, we can see that this is exactly what’s going on:

"""
__missing__(key) # Called by __getitem__ for missing key; pseudo-code:
  if self.default_factory is None: raise KeyError((key,))
  self[key] = value = self.default_factory()
  return value
"""
pass

scikit-learn uses this code to store a mapping of features to their column position in a matrix, which is a perfect use case.

All in all, very neat!

12 Best Practices for Modern Data Ingestion. Download White Paper.

Topics:
big data ,python ,missing keys ,scikit-learn

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}