Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Python: Selecting Certain Indexes in an Array

DZone's Guide to

Python: Selecting Certain Indexes in an Array

· Java Zone
Free Resource

Check out this 8-step guide to see how you can increase your productivity by skipping slow application redeploys and by implementing application profiling, as you code! Brought to you in partnership with ZeroTurnaround.

A couple of days ago I was scrapping the UK parliament constituencies from Wikipedia in preparation for the Graph Connect hackathon and had got to the point where I had an array with one entry per column in the table.

import requests

from bs4 import BeautifulSoup
from soupselect import select

page = open("constituencies.html", 'r')
soup = BeautifulSoup(page.read())

for row in select(soup, "table.wikitable tr"):
    if select(row, "th"):
        print [cell.text for cell in select(row, "th")]

    if select(row, "td"):
        print [cell.text for cell in select(row, "td")]
$ python blog.py
[u'Constituency', u'Electorate (2000)', u'Electorate (2010)', u'Largest Local Authority', u'Country of the UK']
[u'Aldershot', u'66,499', u'71,908', u'Hampshire', u'England']
[u'Aldridge-Brownhills', u'58,695', u'59,506', u'West Midlands', u'England']
[u'Altrincham and Sale West', u'69,605', u'72,008', u'Greater Manchester', u'England']
[u'Amber Valley', u'66,406', u'69,538', u'Derbyshire', u'England']
[u'Arundel and South Downs', u'71,203', u'76,697', u'West Sussex', u'England']
[u'Ashfield', u'74,674', u'77,049', u'Nottinghamshire', u'England']
[u'Ashford', u'72,501', u'81,947', u'Kent', u'England']
[u'Ashton-under-Lyne', u'67,334', u'68,553', u'Greater Manchester', u'England']
[u'Aylesbury', u'72,023', u'78,750', u'Buckinghamshire', u'England']
...

I wanted to get rid of the 2nd and 3rd columns (containing the electorates) from the array since those aren’t interesting to me as I have another source where I’ve got that data from.

I was struggling to do this but twodifferent Stack Overflow questions came to the rescue with suggestions to use enumerate to get the index of each column and then add to the list comprehension to filter appropriately.

First we’ll look at the filtering on a simple example. Imagine we have a list of 5 people:

people = ["mark", "michael", "brian", "alistair", "jim"]

And we only want to keep the 1st, 4th and 5th people. We therefore only want to keep the values that exist in index positions 0,3 and 4 which we can do like this:

>>> [x[1] for x in enumerate(people) if x[0] in [0,3,4]]
['mark', 'alistair', 'jim']

Now let’s apply the same approach to our constituencies data set:

import requests

from bs4 import BeautifulSoup
from soupselect import select

page = open("constituencies.html", 'r')
soup = BeautifulSoup(page.read())

for row in select(soup, "table.wikitable tr"):
    if select(row, "th"):
        print [entry[1].text for entry in enumerate(select(row, "th")) if entry[0] in [0,3,4]]

    if select(row, "td"):
        print [entry[1].text for entry in enumerate(select(row, "td")) if entry[0] in [0,3,4]]

$ python blog.py
[u'Constituency', u'Largest Local Authority', u'Country of the UK']
[u'Aldershot', u'Hampshire', u'England']
[u'Aldridge-Brownhills', u'West Midlands', u'England']
[u'Altrincham and Sale West', u'Greater Manchester', u'England']
[u'Amber Valley', u'Derbyshire', u'England']
[u'Arundel and South Downs', u'West Sussex', u'England']
[u'Ashfield', u'Nottinghamshire', u'England']
[u'Ashford', u'Kent', u'England']
[u'Ashton-under-Lyne', u'Greater Manchester', u'England']
[u'Aylesbury', u'Buckinghamshire', u'England']

The Java Zone is brought to you in partnership with ZeroTurnaround. Check out this 8-step guide to see how you can increase your productivity by skipping slow application redeploys and by implementing application profiling, as you code!

Topics:
java ,python ,bigdata ,tips and tricks ,python indexes

Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

SEE AN EXAMPLE
Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.
Subscribe

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}