Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Python: Look Ahead Multiple Elements in an Iterator/Generator

DZone's Guide to

Python: Look Ahead Multiple Elements in an Iterator/Generator

· Web Dev Zone
Free Resource

Learn how to build modern digital experience apps with Crafter CMS. Download this eBook now. Brought to you in partnership with Crafter Software

As part of the BBC live text scraping code I’ve been working on I needed to take an iterator of raw events created by a generator and transform this into an iterator of cards shown in a match.

The structure of the raw events I’m interested in is as follows:

  • Line 1: Player booked
  • Line 2: Player fouled
  • Line 3: Information about the foul

e.g.

events = [
  {'event': u'Booking     Pedro (Barcelona) is shown the yellow card for a bad foul.', 'sortable_time': 5083, 'match_id': '32683310', 'formatted_time': u'84:43'}, 
  {'event': u'Rafinha (FC Bayern M\xfcnchen) wins a free kick on the right wing.', 'sortable_time': 5078, 'match_id': '32683310', 'formatted_time': u'84:38'}, 
  {'event': u'Foul by Pedro (Barcelona).', 'sortable_time': 5078, 'match_id': '32683310', 'formatted_time': u'84:38'}
]

We want to take these 3 raw events and create one ‘booking event’. We therefore need to have access to all 3 lines at the same time.

I started with the following function:

def cards(events):
    events = iter(events)

    item = events.next()
    next = events.next()
    event_id = 0

    for next_next in events:
        event = item["event"]
        booking = re.findall("Booking.*", event)
        if booking:
            player = re.findall("Booking([^(]*)", event)[0].strip()
            team = re.findall("Booking([^(]*) \((.*)\)", event)[0][1]

            associated_foul = [x for x in [(next, event_id+1), (next_next, event_id+2)]
                                 if re.findall("Foul by.*", x[0]["event"])]

            if associated_foul:
                associated_foul = associated_foul[0]
                yield event_id, associated_foul[1], player, team, item, "yellow"
            else:
                yield event_id, "", player, team, item, "yellow"

        item = next
        next = next_next
        event_id += 1

If we run the sample events through it we’d see this output:

>>> for card_id, associated_foul, player, team, item, card_type in cards(iter(events)):
      print card_id, associated_foul, player, team, item, card_type

0 2 Pedro Barcelona {'match_id': '32683310', 'event': u'Booking     Pedro (Barcelona) is shown the yellow card for a bad foul.', 'formatted_time': u'84:43', 'sortable_time': 5083} yellow

In retrospect, it’s a bit of a hacky way of moving a window of 3 items over the iterator of raw events and yielding a ‘booking’ event if the regex matches.

I thought there was probably a better way of doing this using itertools and indeed there is!

First we need to create a window function which will return us an iterator of tuples containing consecutive events:

from itertools import tee, izip

def window(iterable, size):
    iters = tee(iterable, size)
    for i in xrange(1, size):
        for each in iters[i:]:
            next(each, None)
    return izip(*iters)

Now let’s have a look at how it works using a simple example:

>>> numbers = iter(range(0,10))
>>> for triple in window(numbers, 3):
      print triple

(0, 1, 2)
(1, 2, 3)
(2, 3, 4)
(3, 4, 5)
(4, 5, 6)
(5, 6, 7)
(6, 7, 8)
(7, 8, 9)

Exactly what we need. Now let’s plug the windows function into our cards function:

def cards(events):
    events = iter(events)
    for event_id, triple in enumerate(window(events, 3)):
        item = triple[0]
        event = triple[0]["event"]

        booking = re.findall("Booking.*", event)
        if booking:
            player = re.findall("Booking([^(]*)", event)[0].strip()
            team = re.findall("Booking([^(]*) \((.*)\)", event)[0][1]

            associated_foul = [x for x in [(triple[1], event_id+1), (triple[2], event_id+2)]
                                 if re.findall("Foul by.*", x[0]["event"])]

            if associated_foul:
                associated_foul = associated_foul[0]
                yield event_id, associated_foul[1], player, team, item, "yellow"
            else:
                yield event_id, "", player, team, item, "yellow"

And finally check it still processes events correctly:

>>> for card_id, associated_foul, player, team, item, card_type in cards(iter(events)):
      print card_id, associated_foul, player, team, item, card_type

0 2 Pedro Barcelona {'match_id': '32683310', 'event': u'Booking     Pedro (Barcelona) is shown the yellow card for a bad foul.', 'formatted_time': u'84:43', 'sortable_time': 5083} yellow



Crafter is a modern CMS platform for building modern websites and content-rich digital experiences. Download this eBook now. Brought to you in partnership with Crafter Software.

Topics:
python ,html5

Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

THE DZONE NEWSLETTER

Dev Resources & Solutions Straight to Your Inbox

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

X

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}