Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Python: Selecting Certain Indexes in an Array

DZone's Guide to

Python: Selecting Certain Indexes in an Array

· Java Zone ·
Free Resource

FlexNet Code Aware, a free scan tool for developers. Scan Java, NuGet, and NPM packages for open source security and open source license compliance issues.

A couple of days ago I was scrapping the UK parliament constituencies from Wikipedia in preparation for the Graph Connect hackathon and had got to the point where I had an array with one entry per column in the table.

import requests

from bs4 import BeautifulSoup
from soupselect import select

page = open("constituencies.html", 'r')
soup = BeautifulSoup(page.read())

for row in select(soup, "table.wikitable tr"):
    if select(row, "th"):
        print [cell.text for cell in select(row, "th")]

    if select(row, "td"):
        print [cell.text for cell in select(row, "td")]
$ python blog.py
[u'Constituency', u'Electorate (2000)', u'Electorate (2010)', u'Largest Local Authority', u'Country of the UK']
[u'Aldershot', u'66,499', u'71,908', u'Hampshire', u'England']
[u'Aldridge-Brownhills', u'58,695', u'59,506', u'West Midlands', u'England']
[u'Altrincham and Sale West', u'69,605', u'72,008', u'Greater Manchester', u'England']
[u'Amber Valley', u'66,406', u'69,538', u'Derbyshire', u'England']
[u'Arundel and South Downs', u'71,203', u'76,697', u'West Sussex', u'England']
[u'Ashfield', u'74,674', u'77,049', u'Nottinghamshire', u'England']
[u'Ashford', u'72,501', u'81,947', u'Kent', u'England']
[u'Ashton-under-Lyne', u'67,334', u'68,553', u'Greater Manchester', u'England']
[u'Aylesbury', u'72,023', u'78,750', u'Buckinghamshire', u'England']
...

I wanted to get rid of the 2nd and 3rd columns (containing the electorates) from the array since those aren’t interesting to me as I have another source where I’ve got that data from.

I was struggling to do this but twodifferent Stack Overflow questions came to the rescue with suggestions to use enumerate to get the index of each column and then add to the list comprehension to filter appropriately.

First we’ll look at the filtering on a simple example. Imagine we have a list of 5 people:

people = ["mark", "michael", "brian", "alistair", "jim"]

And we only want to keep the 1st, 4th and 5th people. We therefore only want to keep the values that exist in index positions 0,3 and 4 which we can do like this:

>>> [x[1] for x in enumerate(people) if x[0] in [0,3,4]]
['mark', 'alistair', 'jim']

Now let’s apply the same approach to our constituencies data set:

import requests

from bs4 import BeautifulSoup
from soupselect import select

page = open("constituencies.html", 'r')
soup = BeautifulSoup(page.read())

for row in select(soup, "table.wikitable tr"):
    if select(row, "th"):
        print [entry[1].text for entry in enumerate(select(row, "th")) if entry[0] in [0,3,4]]

    if select(row, "td"):
        print [entry[1].text for entry in enumerate(select(row, "td")) if entry[0] in [0,3,4]]

$ python blog.py
[u'Constituency', u'Largest Local Authority', u'Country of the UK']
[u'Aldershot', u'Hampshire', u'England']
[u'Aldridge-Brownhills', u'West Midlands', u'England']
[u'Altrincham and Sale West', u'Greater Manchester', u'England']
[u'Amber Valley', u'Derbyshire', u'England']
[u'Arundel and South Downs', u'West Sussex', u'England']
[u'Ashfield', u'Nottinghamshire', u'England']
[u'Ashford', u'Kent', u'England']
[u'Ashton-under-Lyne', u'Greater Manchester', u'England']
[u'Aylesbury', u'Buckinghamshire', u'England']

 Scan Java, NuGet, and NPM packages for open source security and license compliance issues. 

Topics:
java ,python ,bigdata ,tips and tricks ,python indexes

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}