Python Data Structure Idioms

DZone 's Guide to

Python Data Structure Idioms

Whether you're traversing a list, creating a map, or filtering elements in a collection, it's important to know how effectively do so in Python.

· Web Dev Zone ·
Free Resource

A significant portion of our time as developers is spend writing code that manipulates basic data structures: traverse a list, create a map, filter elements in a collection. Therefore, it is important to know how effectively do it in Python and make your code more readable and efficient.

Iterate Over a List

There are many ways to iterate over a list in Python. The simplest way would be just to maintain current position in the list and increment it on each iteration:

l = [1, 2, 3, 4, 5]
i = 0
while i < len(l):
  print l[i]
  i += 1

This works, but Python provides a more convenient way to do using the range function. The range function can be used to generate numbers from 0 to N and this can be used as an analog of a for loop in C:

for i in range(len(l)):
  print l[i]

While this is more concise, there is a better way to do it since Python lets iterate over a list directly, similarly to foreach loops in other languages:

for v in l:
  print v

Iterate a List in Reverse Order

How can we iterate a list in the reverse order? One way to do it would be to use an unreadable three arguments version of the range function and provide position of the last element in a list (first argument), position of an element before the first element in the list (second argument) and negative step to go in reverse order (third argument):

for i in range(len(l) - 1, -1, -1):
  print l[i]

However, as you’ve may already guessed, Python should offer a much better way to do it. We can just use the reversed function in a for loop:

for i in reversed(l):
  print i

Access the Last Element

A commonly used idiom to access the last element in a list would be to get the length of a list, subtract one from it, and use result number as a position of the last element:

l = [1, 2, 3, 4, 5]
>>> l[len(l) - 1]

This is cumbersome in Python since it supports negative indexes to access elements from the end of the list. So, -1 is the last element:

>>> l[-1]

Negative indexes can also be used to access a next to last element and so on:

>>> l[-2]
>>> l[-3]

Use Sequence Unpacking

A common way to extract values from a list to multiple variables in other programming languages would be to use indexes:

l1 = l[0]
l2 = l[1]
l3 = l[2]

However, Python supports sequence unpacking that lets us extract values from a list to multiple variables:

l1, l2, l3 = [1, 2, 3]

>>> l1
>>> l2
>>> l3

Use Lists Comprehensions

Let’s say that we want to filter all grades for a movie posted by users of age 18 or bellow.

How many times did you write code like the following?

under_18_grades = []
for grade in grades:
  if grade.age <= 18:

Do it no more in Python. Use list comprehensions with an if statement instead.

under_18_grades = [grade for grade in grades if grade.age <= 18]

Use Enumerate Function

Sometimes, you need to iterate over a list and keep track of the position of each element. For example, if you need to display menu items in a shell, you can simply use the range function:

for i in range(len(menu_items)):
  menu_items = menu_items[i]
  print "{}. {}".format(i, menu_items)

A better way to do it would be to use the enumerate function. It is an iterator that returns pairs, each of which contains the position of an element and the element itself:

for i, menu_items in enumerate(menu_items):
  print "{}. {}".format(i, menu_items)

Use Keys to Sort

A typical way to sort elements in other programming languages is to provide a function that compares two objects along with a collection to sort. In Python, it would look like this:

people = [Person('John', 30), Person('Peter', 28), Person('Joe', 42)]

def compare_people(p1, p2):
  if p1.age < p2.age:
    return -1
  if p1.age > p2.age:
    return 1
  return 0

sorted(people, cmp=compare_people)

[Person(name='Peter', age=28), Person(name='John', age=30), Person(name='Joe', age=42)]

However, this is not the best way to do it. Since all we need to do to compare two instances of Person class is to compare values of their age field. Why should we write a complex compare function for this?

Specifically for this case, the sorted function accepts the key function that is used to extract a key that will be used to compare two instances of an object:

sorted(people, key=lambda p: p.age)
[Person(name='Peter', age=28), Person(name='John', age=30), Person(name='Joe', age=42)]

Use All/Any Functions

If you want to check if all or any value in a collection is True, one way would be iterate over a list:

def all_true(lst):
  for v in lst:
    if not v:
      return False
    return True

However, Python already has all/any functions for that. all returns True if all values in an iterable passed to it are True, while any returns True if at least one of values passed to it is True:

all([True, False])
>> False

any([True, False])
>> True

To check if all items comply with a certain condition, you can convert a list of arbitrary objects to a list of booleans using a list comprehension:

all([person.age > 18 for person in people])

However, you can pass a generator (just omit square braces around the list comprehension):

all(person.age > 18 for person in people)

Not only will this save you two keystrokes; it will also omit the creation of an intermediate list (more about this later).

Use Slicing

You can take apart a list using a technique called slicing. Instead of providing a single index in a square bracket when accessing a list, you can provide the following three values:


All of these parameters are optional and you can get different parts of a list if you omit some of them. If only the start position is provided, it will return all elements in a list starting with the specified index:

>>> lst = range(10)
>>> lst[3:]
[3, 4, 5, 6, 7, 8, 9]

If only the end position is provided, slicing will return all elements up to the provided position:

>>> lst[:-3]
[0, 1, 2, 3, 4, 5, 6]

You can also get part of a list between two indexes:

>>> lst[3:6]
[3, 4, 5]

By default, step-in slicing is equal to one, which means that all elements between start and end positions are returned. If you want to get only every second element or every third element, you need to provide a step value:

>>> lst[2:8:2]
[2, 4, 6]

Do Not Create Unnecessary Objects

range is a useful function if you need to generate consistent integer values in a range, but it has one drawback: it returns a list with all generated values:

# Returns a too big list
for i in range(1000000000):

The solution here is to use the xrange function. It immediately returns an iterator instead of creating a list:

# Returns an iterator
for i in xrange(1000000000):

The drawback of xrange comparing to the range function is that its output can be iterated only once.

New in Python 3

In Python 3, xrange was removed and the range function behaves like xrange in Python 2.x. If you need to iterate over an output of range in Python 3 multiple times, you can convert its output into a list:

>>> list(range(10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Use izip

If you need to generate pairs from elements in two collections, one way to do it would be to use the zip function:

names = ['Joe', 'Kate', 'Peter']
ages = [30, 28, 41]
# Creates a list
zip(names, ages)

[('Joe', 30), ('Kate', 28), ('Peter', 41)]

Instead, we can use the izip function that would return a return an iterator instead of creating a new list:

from itertools import izip
# Creates an iterator
it = izip(names, ages)

In Python 3, the izip function is removed and zip behaves like izip function in Python 2.x.

Use Generators

Lists comprehensions is a powerful tool in Python, but since it can use an extensive amount of memory, each list comprehension will create a new list:


# Original list
lst = range(10)
# This will create a new list
lst_1 = [i + 1 for i in lst]
# This will create another list
lst_2 = [i ** 2 for i in lst_1]

A way to avoid this is to use generators instead of list comprehensions. The difference in syntax is minimal: you should use parenthesis instead of square brackets, but the difference is crucial. The following example does not create any intermediate lists:


# Original list
lst = range(10)
# Won't create a new list
lst_1 = (i + 1 for i in lst)
# Won't create another list
lst_2 = (i ** 2 for i in lst_1)

This is especially handy if you may need to process only part of the result collection to get a result, such as to find a first element that match a certain condition.

Avoid using keys() function

If you need to iterate over keys in a dictionary you may be inclined to use keys function on a hash map:

for k in d.keys():
  print k

However, there is a better way. You use iterate over a dictionary and it performs iteration over its keys, so you can simply do:

for k in d:
  print k

Not only it will save you some typing; it will also prevent  youfrom creating a copy of all keys in a dict as keys method does.

Iterate Over Keys and Values

If you use the keys method, it’s really easy to iterate keys and values in a dictionary like this:

for k in d:
  v = d[k]
print k, v

There is a better way. You can use the items function, which returns key-value pairs from a dictionary:

for k, v in d.items():
  print k, v

Not only is this method is more concise, but it’s more efficient, too.

Use Dictionaries Comprehension

One way to create a dictionary is to assign values to it one-by-one:


d = {}
for person in people:
  d[person.name] = person

Instead, you can use a dictionary comprehension to turn this into a one-liner:

d = {person.name: person for person in people}

Use Collections Module

Use namedtuple

If you need a struct like this, you may just define a class with an init method and a bunch of fields:

class Point(object):
  def __init__(self, x, y):
    self.x = x
    self.y = y

However, the collections module from the Python library provides a namedtuple type that turns this into a one-liner:

from collections import namedtuple
Point = namedtuple('Point', ['x', 'y'])

In addition, namedtuple implements __str__, __repr__, and __eq__ methods:

>>> Point(1, 2)
Point(x=1, y=2)
>>> Point(1, 2) == Point(1, 2)

Use defaultdict

If we need to count the number of times an element is encountered in a collection, we can use a common approach:

d = {}
for v in lst:
  if v not in d:
    d[v] = 1
    d[v] += 1

The collections module provides a very handy class for this case called defaultdict. Its constructor accepts a function that will be used to calculate a value for a non-existing key:

>>> d = defaultdict(lambda: 42)
>>> d['key']

To rewrite counting example, we can pass the int function to defaultdict, which returns zero if called with no arguments:

from collections import defaultdict
d = defaultdict(int)
for v in lst:
  d[v] += 1

defaultdict is useful when you need to create any kind of grouping of items in a collection, but if you just need to get a count of elements, you may use the Counter class instead:

from collections import Counter

>>> counter = Counter(lst)
>>> counter
Counter({4: 3, 1: 2, 2: 1, 3: 1, 5: 1})
idioms ,python ,tutorial ,web dev

Published at DZone with permission of Ivan Mushketyk , DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}