Over a million developers have joined DZone.

What Are Generators For in Python?

· Web Dev Zone

Start coding today to experience the powerful engine that drives data application’s development, brought to you in partnership with Qlik.

I teach Python classes and enjoy exploring language features from the perspective of newbie's to the language. Usually I can explain the rationale for Python language features by showing a compelling use case.

For example a common programming task is to iterate over a container of known length and do something for each item in the container. In a C-like language this might look like:

index = 0; // A
while(index < container.length){ // B
  item = container[index]; //C
  l++; // D

Of course this pattern is so common that C-like languages have a looping construct just to handle this situation. Notice that the counter initialization (A), the bounds checking (B), and the incrementing (D) are all combined into a single syntactical pattern:

for(index=0;index<container.length;index++){ // A, B, D
    item = container[index];

Python goes one step further and provides a simpler iterating construct. If what we want is item, why bother with index and container.length?

for item in container: // E - implicit assignment without temp counter variables

But what about generator functions?

I've always been a little stuck with generator functions though - generator expressions make sense as a generator version of list comprehensions and I suppose if you had iteration you can't wrap up in a list comprehension you'd want to be able to do it in a function... Oh and David Beazley does wicked cool things with generators, but that seems kinda advanced...

So I've been stuck with a use case for generators functions till I recently ran into a problem for which they turned out to be exactly the right solution.

Consider a remote API with a record limit. For any given query there may be 1,000 records that match but only PAGE_SIZE records will be returned. The API does provide the ability to do offset queries so I could manually check the number of records and then repeatedly issue offset queries to get all the records. Of course this means I can't simply loop over the list of records - now my (pseudo)code looks like:

index = 0
page_index = 0
request = session.CreateSearchRequest(*query)
results = session.Search(request)
while results.HasNext(): # by side effect advances to the next record
    pos += 1
    page_index += 1
    if page_index == PAGE_SIZE: # regenerate result set
       results = session.Search(request)
       page_index = 0 # start the next paged set

This is pretty ugly - I've got way more code managing state in my loop than actually processing my results. Ideally I'd like to be able to ignore the paging limits and the non-Pythonic API (librets - I'm looking at you) where .HasNext() returns True or False and has the side effect of advancing the result set. Ideally I'd like to write:

request = session.CreateSearchRequest(query)
for results in magically_loop_over_my_search(request):


My breakthrough came thanks to a Ned Batchelder article on iteration I read recently. Ned pointed out that generators can be used to reshape iteration. As an example: looping through a two dimensional data structure like a grid (assuming a .getcell(x,y) kind of API) is done most easily with nested for loops to get every cell.

To get every cell in a single simple iteration would require some math and extra variables- a row counter, a column counter (reset to 0 ever row_length iterations), plus a cell = grid[r][c] kind of assignment to actually retrieve the cell.

Hey - that description looks a lot like my problem with the paged data set!

Ned's solution is to simply do the nested for loops inside a function and yield a cell's coordinates allowing the following simple iteration:

for (x, y) in twod_range(width, height):
    do_something(spreadsheet.get_cell(x, y))

It's important to note that without the yield statement our iterator would be more complicated. We could write a regular function that returns a list containing all the coordinate pairs - but it would be more complicated code (list initialization, append to the list, and return the list). It would also be less efficient - I want to process 20,000 records that are good sized and I don't want to make a single list containing all 20,000 records. Generator functions to the rescue!

Back to my problem: by moving the page handling code into a generator function:

PAGE_SIZE = 250   
def paged_results(session, request, offset=0):
    return session.Search(request)

def get_results(session, request):
    pos = current_loop = 0
    results = paged_results(session, request)
    num_records = results.GetCount()
    while pos < num_records and results.HasNext(): # .HasNext() advances result set
        yield results
        pos += 1
        current_loop += 1
        if current_loop = PAGE_SIZE:
            results = paged_results(session, request, offset=pos)
            current_loop = 0 # reset paging

I'm able at various places to consume my paged result set with the desired Pythonic API:

request = session.CreateSearchRequest(*query) 
for results in get_results(session, request):

 I'm now realizing that while loops may be examples of Un-Pythonic code. Would I rather say

while not file.EOF():
    line = file.get_line()


for line in file:

 My new compelling use case for simple generator functions is:

Generator functions give me a conceptually easy tool to reshape complicated iteration into simple for loops.

Create data driven applications in Qlik’s free and easy to use coding environment, brought to you in partnership with Qlik.


Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}