Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Python: Equivalent to flatMap for Flattening an Array of Arrays

DZone's Guide to

Python: Equivalent to flatMap for Flattening an Array of Arrays

· Big Data Zone ·
Free Resource

The open source HPCC Systems platform is a proven, easy to use solution for managing data at scale. Visit our Easy Guide to learn more about this completely free platform, test drive some code in the online Playground, and get started today.

I found myself wanting to flatten an array of arrays while writing some Python code earlier this afternoon and being lazy my first attempt involved building the flattened array manually:

episodes = [
    {"id": 1, "topics": [1,2,3]},
    {"id": 2, "topics": [4,5,6]}
]

flattened_episodes = []
for episode in episodes:
    for topic in episode["topics"]:
        flattened_episodes.append({"id": episode["id"], "topic": topic})

for episode in flattened_episodes:
    print episode

If we run that we’ll see this output:

$ python flatten.py

{'topic': 1, 'id': 1}
{'topic': 2, 'id': 1}
{'topic': 3, 'id': 1}
{'topic': 4, 'id': 2}
{'topic': 5, 'id': 2}
{'topic': 6, 'id': 2}

What I was really looking for was the Python equivalent to the flatmap function which I learnt can be achieved in Python with a list comprehension like so:

flattened_episodes = [{"id": episode["id"], "topic": topic}
                      for episode in episodes
                      for topic in episode["topics"]]

for episode in flattened_episodes:
    print episode

We could also choose to use itertools in which case we’d have the following code:

from itertools import chain, imap
flattened_episodes = chain.from_iterable(
                        imap(lambda episode: [{"id": episode["id"], "topic": topic}
                                             for topic in episode["topics"]],
                             episodes))
for episode in flattened_episodes:
    print episode

We can then simplify this approach a little by wrapping it up in a ‘flatmap’ function:

def flatmap(f, items):
        return chain.from_iterable(imap(f, items))

flattened_episodes = flatmap(
    lambda episode: [{"id": episode["id"], "topic": topic} for topic in episode["topics"]], episodes)

for episode in flattened_episodes:
    print episode

I think the list comprehensions approach still works but I need to look into itertools more – it looks like it could work well for other list operations.

Managing data at scale doesn’t have to be hard. Find out how the completely free, open source HPCC Systems platform makes it easier to update, easier to program, easier to integrate data, and easier to manage clusters. Download and get started today.

Topics:
python ,bigdata ,big data ,array ,flatmap

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}