Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Python: Equivalent to flatMap for Flattening an Array of Arrays

DZone's Guide to

Python: Equivalent to flatMap for Flattening an Array of Arrays

· Big Data Zone
Free Resource

Need to build an application around your data? Learn more about dataflow programming for rapid development and greater creativity. 

I found myself wanting to flatten an array of arrays while writing some Python code earlier this afternoon and being lazy my first attempt involved building the flattened array manually:

episodes = [
    {"id": 1, "topics": [1,2,3]},
    {"id": 2, "topics": [4,5,6]}
]

flattened_episodes = []
for episode in episodes:
    for topic in episode["topics"]:
        flattened_episodes.append({"id": episode["id"], "topic": topic})

for episode in flattened_episodes:
    print episode

If we run that we’ll see this output:

$ python flatten.py

{'topic': 1, 'id': 1}
{'topic': 2, 'id': 1}
{'topic': 3, 'id': 1}
{'topic': 4, 'id': 2}
{'topic': 5, 'id': 2}
{'topic': 6, 'id': 2}

What I was really looking for was the Python equivalent to the flatmap function which I learnt can be achieved in Python with a list comprehension like so:

flattened_episodes = [{"id": episode["id"], "topic": topic}
                      for episode in episodes
                      for topic in episode["topics"]]

for episode in flattened_episodes:
    print episode

We could also choose to use itertools in which case we’d have the following code:

from itertools import chain, imap
flattened_episodes = chain.from_iterable(
                        imap(lambda episode: [{"id": episode["id"], "topic": topic}
                                             for topic in episode["topics"]],
                             episodes))
for episode in flattened_episodes:
    print episode

We can then simplify this approach a little by wrapping it up in a ‘flatmap’ function:

def flatmap(f, items):
        return chain.from_iterable(imap(f, items))

flattened_episodes = flatmap(
    lambda episode: [{"id": episode["id"], "topic": topic} for topic in episode["topics"]], episodes)

for episode in flattened_episodes:
    print episode

I think the list comprehensions approach still works but I need to look into itertools more – it looks like it could work well for other list operations.

Check out the Exaptive data application Studio. Technology agnostic. No glue code. Use what you know and rely on the community for what you don't. Try the community version.

Topics:
python ,bigdata ,big data ,array ,flatmap

Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

THE DZONE NEWSLETTER

Dev Resources & Solutions Straight to Your Inbox

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

X

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}