
Python Functional Programming


If you're interested in deepening your knowledge of functional programming in Python, this article offers a step-by-step guide through its core constructs.


Different Programming Paradigms

There are several approaches to computer programming used to solve similar tasks, and some of them are better suited to specific tasks than others. The most prevalent ones are:

Procedural Programming

In this type of programming, a program is a list of instructions coded by the programmer and followed strictly by the computer. Many C programs can be seen as procedural (C can be used to program in other paradigms as well, but the language was designed with procedural programming in mind). C++ and Python can also be used to program in a procedural style.

Declarative Programming

In this type of programming, the activity becomes describing the problem you are trying to solve instead of telling the computer how to solve it. The most famous example of a declarative programming language is SQL: the programmer doesn't tell the SQL engine whether to use indexes, whether to scan the table, or which sub-clause to execute first; the engine decides what is most efficient.

Object-Oriented Programming

In object orientation, programs are interactions of objects that hold internal state; interactions are performed as messages passed from object to object. C++ and Python support object orientation, but they don't restrict the programmer to this single paradigm. Java is strictly object-oriented (well, kind of); Smalltalk is (really) strictly object-oriented.

Functional Programming

In the functional paradigm, internal state is largely avoided so that a function call's output depends only on its parameters. It is then no surprise that functional programming languages have functions as first-class language constructs. Functions of the kind just described are called purely functional. In practice, purely functional routines are mixed with non-purely functional ones, and in some cases global variables are used or side effects caused as well. C++, Python, Java, Scala, and many other languages can be used to program in a functional way.
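
To make the distinction concrete, here is a minimal sketch (the function names are illustrative, not from the original article): scale() is purely functional, whilst scale_impure() mutates global state as a side effect.

factor = 2

def scale(values, factor):
    # pure: the output depends only on the arguments
    return [v * factor for v in values]

def scale_impure(values):
    # impure: reads and mutates the global `factor` as a side effect
    global factor
    factor += 1
    return [v * factor for v in values]

scale([1, 2], 3)      # always [3, 6]
scale_impure([1, 2])  # result changes from call to call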

The above categories are just a rough cut; they don't constitute solid distinctions. In fact, the opposite of declarative programming is imperative programming, and functional and logic programming can be seen as declarative because they avoid side effects and the programmer doesn't have complete control over how the algorithm is actually implemented.


Characteristics of Functional Programming

Functional programming facilitates five major qualities of both theoretical and practical importance:

  • Formal Provability – proving a program correct is, to some degree, easier when no side effects are present, because pre- and post-conditions can be formulated more easily (side effects may make post-conditions impossible to formulate, for example).

  • Modularity – functional programs are assemblies of small functions because functions are the unit of work, which results in largely modular programs.

  • Composability – it is easier to build a library in the functional paradigm because its constraints are largely well defined, especially in the absence of side effects.

  • Testing and Debugging – it is easier to test functional programs because of modularity and composability.

  • Productivity – in many cases (according to many people's experiences) functional programs are far shorter than procedural and object-oriented ones.

Python for Functional Programming

One of the most important constructs in functional programming in Python is the iterator.

Iterators

Iterators are objects that represent data streams; these streams can be finite or infinite. The Python iterator protocol requires that, for an object to be an iterator, it have a method named next() (renamed __next__() in Python 3) that takes no arguments and returns the next element in the data stream. That method must raise a StopIteration exception when there are no more elements in the stream. The examples below call it through the built-in next(it), which works in both Python 2.6+ and Python 3. An object is iterable if it is an iterator or if an iterator for it can be obtained, for example, by the built-in function iter(). Built-in iterable objects include dictionaries and lists. Note that the Python iterator protocol supports only going in the forward direction; going backward can, however, be implemented.

l = [1, 2, 3]  # a list
it = iter(l)   # acquire an iterator for the list
next(it)       # outputs: 1
next(it)       # outputs: 2
next(it)       # outputs: 3
next(it)       # raises: StopIteration
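
As for the backward direction, the built-in reversed() returns such an iterator for objects that support it (sequences, for example):

l = [1, 2, 3]
rit = reversed(l)  # iterate the list back to front
next(rit)          # outputs: 3
next(rit)          # outputs: 2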


Iterable objects are expected in for loops:

for e in l:
    print(e)

# outputs:
# 1
# 2
# 3


is equivalent to:

for e in iter(l):
    print(e)


The reverse operation can be carried out as well: lists or tuples can be constructed from iterables:

l = [1, 2, 3]
it = iter(l)
tuple(it)    # (1, 2, 3)


Note that constructing a list or a tuple from an iterator drains the iterator and causes it to raise StopIteration on the next call to next().

l = [1, 2, 3]
it = iter(l)
tuple(it)    # creates: (1, 2, 3); list(it) would create the list [1, 2, 3]
next(it)     # raises: StopIteration


This also means that if you try to construct a list out of a drained iterator, you get an empty list.

l = [1, 2, 3]
it = iter(l)
tuple(it)     # (1, 2, 3)
list(it)      # []

Sequence unpacking supports iterators as well.

a, b, c = iter(l) # or a, b, c = l 
print(a, b, c)    # outputs: 1 2 3


Several built-in functions accept iterators as parameters.

max(iter(l))  # outputs: 3
min(iter(l))  # outputs: 1

# or equivalently
max(l)
min(l)


in and not in also support iterators. Notice that, because iterators can be infinite, problems will happen if you supply such an iterator to max() or min(); the same happens with in and not in when the item is not in the stream: the call will never return.
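
To illustrate the hazard (an added example, not from the original article), itertools.count() below is an infinite iterator, and itertools.islice() is one safe way to take a finite prefix from such a stream:

import itertools

naturals = itertools.count(1)         # 1, 2, 3, ... without end
list(itertools.islice(naturals, 5))   # safe: [1, 2, 3, 4, 5]
# max(itertools.count(1))             # never returns
# -1 in itertools.count(1)            # never returns: -1 is not in the stream
# 3 in itertools.count(1)             # returns True as soon as 3 is seen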

Dictionary Iterators

Dictionaries support multiple types of iterators. The basic one, obtained from keys(), iterates over the dictionary's keys.

d = {
    'a': 1,
    'b': 2,
    'c': 3
}
for key in d.keys():  # or d.iterkeys() for Python 2
    print(key)

# outputs:
# a
# b
# c


The keys iterator (keys() in Python 3, iterkeys() in Python 2) is actually the default iterator for dictionaries, so the following can be done:

d = dict(a=1, b=2, c=3)
for k in d:  # same as: for k in d.keys(), or: for k in iter(d)
    print(d[k])

# outputs:
# 1
# 2
# 3


Note that in Python 2, dict.keys() returns a list of keys, not an iterator. Packages such as six (with its moves module) provide compatibility constructs between different Python versions. The other types of iterators are values() and items() in Python 3 and, respectively, itervalues() and iteritems() in Python 2. values() iterates over values only, whilst items() iterates over key/value pairs.

for v in d.values(): print(v)
# outputs
# 1
# 2
# 3

for k, v in d.items(): print(k, v)
# outputs
# a 1
# b 2
# c 3


Note that in Python 2, dict.values() returns a list of values instead of an iterator, whilst dict.items() returns a list of 2-tuples (key, value). The dict constructor accepts a finite iterator over a sequence of such (key, value) 2-tuples.

l = [('a', 1), ('b', 2), ('c', 3)]
d = dict(iter(l)) # or d = dict(l)

# d is equal to
# {
#    'a': 1,
#    'b': 2,
#    'c': 3
# }
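
Relatedly (an added aside), the built-in zip() produces exactly such an iterator of (key, value) pairs from two parallel sequences, so a dict can be built from it directly:

keys = ['a', 'b', 'c']
values = [1, 2, 3]
d = dict(zip(keys, values))  # {'a': 1, 'b': 2, 'c': 3}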

Iterator Usage

Iterators can be used to achieve many tasks, but the two most common are: 1) applying a function to each element in a data stream; and 2) selecting elements from a data stream according to some criteria. There are several ways to do just that, including the map() and filter() built-in functions.

## Python 3
d = dict(a=1, b=2, c=3)
map(str.upper, d)

# outputs: <map object at 0x111987a20>
# which can be turned into a list
# if passed to the list constructor

## Python 2
import string
d = dict(a=1, b=2, c=3)
map(string.upper, d)

# outputs: ['A', 'B', 'C']

map() applies a function to each item in an iterable. Remember that the default iterator for a dictionary iterates over its keys, so in the example above map() applies str.upper() to each key in dictionary d; in Python 2 the results are appended to the returned list, whilst in Python 3 the returned map object produces them lazily.

filter() applies a predicate function to each item in an iterable; if the predicate returns a true value, that item is included in the output, otherwise it is not.

def predicate(x):
    return x < 10

filter(predicate, [1, 5, 10, 15])
# outputs: [1, 5] in Python 2
# (in Python 3, a lazy filter object; list() it to get [1, 5])


These functions are great and all, but there is a more succinct way to accomplish these two tasks: list comprehension, a mechanism that is both very efficient and compact.

[x.upper() for x in d]
# equivalent to map()
# outputs: ['A', 'B', 'C']

[x
 for x in l
 if predicate(x)]
# equivalent to filter()
# outputs: [1, 5]


A list comprehension always returns a list, be it empty or otherwise. A generator expression, on the other hand, returns an iterator object.

it = (x.upper() for x in d)
next(it)
# outputs: 'A'

it = (x
      for x in l
      if predicate(x))
next(it)
# outputs: 1


The only difference in syntax is that list comprehensions use [] whilst generator expressions use (). Generator expressions can be more useful for large data streams because a list comprehension physically constructs the list in memory whilst a generator doesn't.
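
One quick, if rough, way to see the difference is sys.getsizeof(); exact figures vary across platforms and Python versions, but the shape of the result is stable:

import sys

squares_list = [x * x for x in range(1000000)]  # the whole list lives in memory
squares_gen = (x * x for x in range(1000000))   # a lazy generator object

sys.getsizeof(squares_list)  # several megabytes
sys.getsizeof(squares_gen)   # on the order of 100 bytes, whatever the range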

Generator expressions can be written as parameters to functions that expect iterators; the rule is that the parentheses around a generator expression can be dropped if the expression constitutes the single argument to the function.

# the parameter is a generator not a list
sum(x for x in [1, 2, 3])
# outputs: 6


Generators and list comprehension work as follows:

(expression
    for expr1 in sequence1 [if condition1]
    for expr2 in sequence2 [if condition2]
    for expr3 in sequence3 [if condition3]
    ...
    for exprN in sequenceN [if conditionN])



An expression is included in the output as its sequences are iterated over. All if conditions are optional, but if a condition is present the expression is only included in the output if the condition evaluates to true. The expressions are evaluated as follows:

for expr1 in sequence1:
    if not (condition1):
        continue  # Skip this element
    for expr2 in sequence2:
        if not (condition2):
            continue  # Skip this element
        ...
        for exprN in sequenceN:
            if not (conditionN):
                continue  # Skip this element

            # Output the value of the expression.


This means that for each element in sequence1, sequence2 is iterated over from the beginning; then, for each pair (sequence1_item, sequence2_item), sequence3 is iterated over from the beginning, and so on. Hence, the sequences are not required to be of the same length, and the final number of items is the product of the numbers of items in all the sequences. So if we have two sequences, the first with 2 items and the second with 3, there will be 6 items in the output iterator or list, provided there are no false conditions.

l1 = [1, 2, 3]
l2 = [4, 5, 6]
def even(x): return (x % 2) == 0

[(x, y) for x in l1
        for y in l2
        if even(y)
]
# outputs: [(1, 4), (1, 6), (2, 4), (2, 6), (3, 4), (3, 6)]

[x for x in l1
   for x in l2
   if even(x)
]
# outputs: [4, 6, 4, 6, 4, 6]

[x for x in l1
   for y in l2
   if even(y)
]
# outputs: [1, 1, 2, 2, 3, 3]


Introduce parentheses if you want the elements of the output list to be tuples, as in (x, y) above, to avoid syntax errors.
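
As an aside, the nested-for comprehension above computes a filtered Cartesian product, which can equivalently be written with itertools.product:

import itertools

l1 = [1, 2, 3]
l2 = [4, 5, 6]

[(x, y) for x, y in itertools.product(l1, l2) if even(y)]
# outputs: [(1, 4), (1, 6), (2, 4), (2, 6), (3, 4), (3, 6)]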

Iterator Object Example

An iterator object must implement the iterator protocol. There are some subtleties to implementing iterators (check PEP 234); in essence, however, it is about an object having __iter__() and next() methods (__next__() in Python 3). The following is a re-organised section of PEP 234 about constructing an iterator class:

  • next()  returns the next value in the iteration, or raises StopIteration (or a derived exception class) to signal the end of the iteration. Any other exception should be considered to signify an error and should be propagated normally, not taken to mean the end of the iteration.

  • Classes can define how they are iterated over by defining an __iter__() method; this should take no additional arguments and return a valid iterator object. A class that wants to be an iterator should implement two methods: a next() method that behaves as described above, and an __iter__() method that returns self.

  • The two methods correspond to two distinct protocols:

1. An object can be iterated over with for if it implements  __iter__() or  __getitem__() .

2. An object can function as an iterator if it implements  next().

Container-like objects usually support protocol 1. Iterators are currently required to support both protocols. The semantics of iteration come only from protocol 2; protocol 1 is present to make iterators behave like sequences, in particular so that code receiving an iterator can use a for loop over it.
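
As a small sketch of protocol 1's sequence-like fallback (an added example): an object that only defines __getitem__(), indexed from 0 and raising IndexError at the end, can also be used in a for loop:

class Letters(object):
    # Legacy iteration: for requests items 0, 1, 2, ...
    # until IndexError signals the end.
    def __getitem__(self, index):
        return 'abc'[index]

for ch in Letters():
    print(ch)
# outputs: a, b, c (one per line)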

So, let's write an iterator class.

class SquareRoot(object):

    def __init__(self, start, end):
        self.start = start
        self.end = end
        self.current = start

    def __iter__(self):
        return self

    def next(self):
        if self.current > self.end:
            raise StopIteration
        r = self.current * self.current
        self.current += 1
        return r

    __next__ = next  # alias so the class also works in Python 3

    def reset(self):
        self.current = self.start


c = SquareRoot(1, 5)
next(c)              # outputs: 1
next(c)              # outputs: 4
next(c)              # outputs: 9
c.reset()
for i in c:
    print(i)

# outputs:
# 1
# 4
# 9
# 16
# 25


Generators

A usual function call always starts execution at the beginning of the callee's body. Generators extend this behaviour by allowing a function to be resumed at the instruction at which it was suspended. One crude way to implement generator-like behaviour is to use instance attributes to preserve the state of a function object; another is to use coroutines. In Python, the yield keyword turns functions into generators: any function that contains the yield keyword is a generator function. A generator function produces an iterator instead of a single value, and each resumption of the generator starts right after the yield statement that produced the current output.

The above SquareRoot iterator class can be written as a generator as follows:

def square_root(start, end):
    for i in range(start, end + 1):
        yield i * i


The above generator function is equivalent in all aspects to the SquareRoot iterator class except one: it cannot be reset to its initial state (except when we know what the initial state was; more about this later).
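
One way to get resettability back (a sketch added here, not part of the original article) is to wrap the generator function in a small iterable class that remembers the initial arguments and builds a fresh generator on each iteration:

class ReiterableSquares(object):
    # Illustrative wrapper: each __iter__() call creates a new
    # generator from the remembered arguments.
    def __init__(self, start, end):
        self.start = start
        self.end = end

    def __iter__(self):
        return square_root(self.start, self.end)

s = ReiterableSquares(1, 3)
list(s)  # [1, 4, 9]
list(s)  # [1, 4, 9] again; a bare generator would already be exhausted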

Here are more examples of generators:

def gen1():
    yield 'mentor'
    yield 'lycon'
    yield 'says hi'

list(gen1())  # ['mentor', 'lycon', 'says hi']

def generator(n):
    for i in range(n):
        yield i

g = generator(3)
next(g)          # outputs: 0
next(g)          # outputs: 1
next(g)          # outputs: 2
next(g)          # raises: StopIteration


A return statement without arguments can be used inside a generator to signal the end of execution.

def gen(n):
    if (n % 2) == 0:
        return
    for i in range(n):
        yield i


An example of an infinite generator is a generator to produce the Fibonacci sequence.

def fib():
    a, b = 0, 1
    while True:
        yield b
        a, b = b, b + a
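
Since fib() never terminates, finite prefixes can be taken with itertools.islice():

import itertools

list(itertools.islice(fib(), 8))
# outputs: [1, 1, 2, 3, 5, 8, 13, 21]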


Differences between iterators and generators can be summarized as follows:

  1. With generators, there is no need to worry about the iterator protocol.

  2. Generators are one-time operations, once the generator is exhausted you have to construct another.

  3. Iterator objects may be used for iteration several times without a need to reconstruct them (a list, for example).

Generator Interaction

Generators are not only callables that take parameters, like regular functions; they also allow callers to pass values to them in the middle of execution. This feature requires Python >= 2.5. To pass a value into a generator, use g.send(val), where g is a generator object. The following is an example of a generator that is ready to accept a value during execution:

def gen():
    i = 0
    while i < 10:
        val = (yield)
        if val is None:
            i += 1
        else:
            i = val


Note that in principle it is no different from the previous generators we encountered; the differences are:

  1. val is assigned a value from a yield expression.
  2. We use parentheses around yield.

In Python 2.5, yield became an expression, so we can use it like any other right-hand-side expression. We have to prepare for val being None, because val will be None except when send() is used with a value other than None. The parentheses around yield are not always required, but it is easier to always include them, because the cases where they may be omitted are easily overlooked. In short, you can forego the parentheses only if the yield is at the top level of the right-hand expression (look at PEP 342 for details).

With the above generator, we can do the following:

g = gen()
k = 0
l = []
for i in g:
    k += 1
    l.append(k)

# l is equal to [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# (the generator yields once for each i from 0 through 9)


Notice the following:

g = gen()
next(g)   # outputs: None
g.send(8) # outputs: None
next(g)   # outputs: None
next(g)   # raises: StopIteration

We can also do this using the same iterator:

g = gen()
next(g)    # outputs: None
g.send(8)  # outputs: None
next(g)    # outputs: None
g.send(8)  # outputs: None
next(g)    # outputs: None
next(g)    # raises: StopIteration


This clearly shows that we can control the execution of a generator through the values sent via send(). Let's try a tiny variation on the same generator.

def gen():
    i = 0
    while i < 10:
        val = (yield i)  # <-- change is here
        if val is None:
            i += 1
        else:
            i = val


Running the same code above produces the following instead: 

g = gen()
next(g)   # outputs: 0
g.send(8) # outputs: 8
next(g)   # outputs: 9
next(g)   # raises: StopIteration

And as we did before:

g = gen()
next(g)   # outputs: 0
g.send(8) # outputs: 8
next(g)   # outputs: 9
g.send(8) # outputs: 8
next(g)   # outputs: 9
next(g)   # raises: StopIteration


The yield statement can thus be used for both input and output. In addition to send(), generator objects have close() and throw(type, value=None, traceback=None) methods. Here is the generator method documentation from the Python documentation:

generator.next() (generator.__next__() in Python 3)

Starts the execution of a generator function or resumes it at the last executed yield expression. When a generator function is resumed with a next() method, the current yield expression always evaluates to None. The execution then continues to the next yield expression, where the generator is suspended again, and the value of the expression_list is returned to next()'s caller. If the generator exits without yielding another value, a StopIteration exception is raised.

generator.send(value)

Resumes the execution and “sends” a value into the generator function. The value argument becomes the result of the current yield expression. The  send()  method returns the next value yielded by the generator, or raises StopIteration  if the generator exits without yielding another value. When send()  is called to start the generator, it must be called with None as the argument, because there is no yield expression that could receive the value.

generator.throw(type[, value[, traceback]])

Raises an exception of the given type at the point where the generator was paused, and returns the next value yielded by the generator function. If the generator exits without yielding another value, a StopIteration exception is raised. If the generator function does not catch the passed-in exception, or raises a different exception, then that exception propagates to the caller.

generator.close()

Raises a  GeneratorExit  at the point where the generator function was paused [in the generator's code]. If the generator function then raises  StopIteration  (by exiting normally, or due to already being closed) or GeneratorExit (by not catching the exception), close returns to its caller. If the generator yields a value, a RuntimeError is raised. If the generator raises any other exception, it is propagated to the caller.  close() does nothing if the generator has already exited due to an exception or normal exit.

For close(), cleanup code may be best put in a finally clause (or, in some cases, in an except clause).
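
A sketch of that pattern (with a hypothetical file name): the finally clause runs whether the generator is exhausted, abandoned, or explicitly closed:

def read_records(path):
    f = open(path)
    try:
        for line in f:
            yield line.strip()
    finally:
        f.close()  # runs on exhaustion, close(), or an exception

g = read_records('data.txt')  # hypothetical input file
next(g)
g.close()  # raises GeneratorExit at the yield; the finally clause runs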

Generators in that sense are coroutines. They work very nicely as pipelines: they can be stacked on top of each other to simulate Unix pipelining, for example, with all the advantages of generators' memory handling and their ability to process infinite streams. They can also implement producer/consumer patterns. As coroutines, they can be used to implement pseudo-concurrency (pseudo-threads), where one thread schedules a (theoretically) unlimited number of execution units (check greenlets and gevent if you are interested; we will cover that later on).


Generators in Practice

Let's have a look at an example. We will parse the HTML returned from a web page, capitalize every word found, and reverse it: capitalized words will be stored in one list and reversed words in another.

import string
from urllib.request import urlopen
from html.parser import HTMLParser

def producer(words, *consumers):
    for c in consumers:
        next(c)  # prime each consumer
    for w in words:
        for c in consumers:
            c.send(w)

def consumer1(container):
    while True:
        w = (yield)
        container.append(w.upper())

def consumer2(container):
    while True:
        w = (yield)
        container.append(w[::-1])

class Parser(HTMLParser):

    def __init__(self):
        HTMLParser.__init__(self)
        self.data = []

    def handle_data(self, data):
        # drop punctuation, then split into words
        table = str.maketrans('', '', string.punctuation)
        for w in data.translate(table).strip().split():
            self.data.append(w)

def produce(url, container1, container2):
    res = urlopen(url)
    p = Parser()
    p.feed(res.read().decode('utf-8'))  # feed() expects text in Python 3
    producer(p.data, consumer1(container1), consumer2(container2))


For Python 2, the imports would be:

from HTMLParser import HTMLParser
from urllib2 import urlopen

and handle_data() would use data.translate(None, string.punctuation), since Python 2's str.translate takes a deletions argument.


producer acts as a dispatcher, but if we route the results from consumers back to producer, then producer can dispatch them to other processing consumers, and so on. This way we can build a processing hub disguised as a very innocent function in a very straightforward manner.

Now let's see how we can combine two generators to capitalize and reverse a word by hand.

def reverse():
    while True:
        w = (yield)
        if w is not None:
            print(w[::-1])

def capitalise():
    g = reverse()
    next(g)
    while True:
        w = (yield)
        w = w.upper()
        g.send(w)

>>> g = capitalise()
>>> next(g)
>>> g.send('ab')
BA


Notice that BA is printed by reverse(); it is not yielded back to capitalise().

We can simplify this a bit by introducing a decorator that starts the generator automatically, so that we need not call next() upon creation.

def coroutine(func):
    def wrap(*args):
        g = func(*args)
        next(g)
        return g
    return wrap


The above code would be re-written as follows:

@coroutine
def reverse():
    while True:
        w = (yield)
        if w is not None:
            print(w[::-1])

@coroutine
def capitalise():
    g = reverse()
    while True:
        w = (yield)
        w = w.upper()
        g.send(w)

>>> g = capitalise()
>>> g.send('ab')
BA


It doesn't save much typing, but it helps build a better abstraction. This will be used later on.

At this point we could chain generators in a forward-direction manner, but could we make the communication two-way? Yes, we can and it is very simple.

def multiplier():
    w = None
    while True:
        w = (yield w * 2 if w else None)

def reverse2():
    g = multiplier()
    next(g)
    w = None
    while True:
        w = (yield g.send(w[::-1]) if w else None)

def capitalise2():
    g = reverse2()
    next(g)
    w = None
    while True:
        w = (yield)
        w = w.upper()
        print(g.send(w))

g = capitalise2()
next(g)
g.send('kl') # outputs: LKLK


This is printed from inside capitalise2(). Two-way communication is actually nothing more than normal function-call semantics.

For general-purpose scenarios, however, we need pluggable components. So, let's separate the generators and call them in a row if we so desire.

@coroutine
def multiplier():
    w = None
    while True:
        w = (yield w * 2 if w else None)

@coroutine
def reverse3():
    w = None
    while True:
        w = (yield w[::-1] if w else None)

@coroutine
def capitalise3():
    w = None
    while True:
        w = (yield w.upper() if w else None)

multiplier().send(reverse3().send(capitalise3().send('ml'))) # outputs: LMLM


Note that multiplier()'s body didn't change; it only gained the @coroutine decorator, and each stage is now primed and driven independently via send().


Let's have a practical example of why this programming style is helpful. Here are two functions that do the same thing: counting the number of words in a file.  f1() is written in the traditional style, whilst  f2()  is written as a pipeline of generators.

import re

def f1(filename):
    p = re.compile(r'\s+')
    fi = open(filename)
    total = 0
    for line in fi:
        words = p.split(line)
        total += len(words)
    return total

def f2(filename):
    p = re.compile(r'\s+')
    fi = open(filename)
    words = (p.split(line) for line in fi)
    l = (len(w) for w in words)
    return sum(l)
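
The pipeline style also composes cheaply. As an added illustration, a hypothetical f3() appends one more lazy stage that drops the empty strings p.split() produces at line boundaries, still without materializing anything:

def f3(filename):
    p = re.compile(r'\s+')
    fi = open(filename)
    words = (w for line in fi for w in p.split(line) if w)
    return sum(1 for _ in words)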


It is easy to see that f2() is shorter, if a bit weird, in fact. Let's look at performance (make sure to read the comments afterwards) and remember: "there is a small lie, a big lie, and the benchmark."

Note that the benchmarking was carried out on Python 2.

File size    f1() execution time    f2() execution time
232.4 MB     7.655333333            7.502666667
1.1 GB       51.694                 37.136


File size    f1() average execution time    f2() average execution time
12 KB        0.004                          0.006
82.1 MB      36.911                         31.470
232.4 MB     91.042                         39.363
1.1 GB       414.212                        290.906

Measured with cProfile (the standard Python deterministic profiler).


Performance Comments

The difference in timings between the two tables has to do with what is being measured by each method. The timings you would experience in practice are those in the first table.

The results in the first table are average times for executing the functions in a row, f1() then f2() or f2() then f1(), with each function followed by random user code.

For very small files, f1() outperforms f2(); it seems the generators' overhead is not compensated for by file size. For large files, it matters whether each function is run in a row or not. If we compile the regular expression for each line, and f1() is executed in a row with absolutely no user code in between, its execution time grows more slowly than f2()'s. However, while possible, this is not a typical scenario outside benchmarking.

On the contrary, when random user code is executed between function invocations, f2()'s execution time remains stable whilst f1()'s keeps increasing slowly.

It is important to note that, especially for large files, these functions spend more time managing memory than on calculations. f2() can handle files larger than the available memory as it is; f1(), however, would require amendments, and such amendments would increase its execution time.
