Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Comparing Strikers Statistics

DZone's Guide to

Comparing Strikers Statistics

· Web Dev Zone ·
Free Resource

Learn how error monitoring with Sentry closes the gap between the product team and your customers. With Sentry, you can focus on what you do best: building and scaling software that makes your users’ lives better.

Here we compare the scoring statistics of four of the best strikers of the recent football history: Del Piero, Trezeguet, Ronaldo and Vieri. The statistics that we will look at are the scoring trajectory, scoring rate and number of appearances.

To compute these values we need to scrape the career statistics (number of goals and appearances per season) on the Wikipedia pages of the players:

from bs4 import BeautifulSoup
from urllib2 import urlopen

def get_total_goals(url):
    """
    Given the url of a wikipedia page about a football striker
    returns three numy arrays:
    - years, each element corresponds to a season
    - apprearances, contains the number of appearances each season
    - goals, contains the number of goal scored each season
    
    Unfortunately this function is able to parse 
    only the pages of few strikers.
    """
    soup = BeautifulSoup(urlopen(url).read())
    table = soup.find("table", { "class" : "wikitable" })
    years = []
    apps = []
    goals = []
    for row in table.findAll("tr"):
        cells = row.findAll("td")
        if len(cells) > 1:
            years.append(int(cells[0].text[:4]))
            apps.append(int(cells[len(cells)-2].text))
            goals.append(int(cells[len(cells)-1].text))
    return np.array(years), 
           np.array(apps, dtype='float'), 
           np.array(goals)

ronaldo = get_total_goals('http://en.wikipedia.org/wiki/Ronaldo')
vieri = get_total_goals('http://en.wikipedia.org/wiki/Christian_Vieri')
delpiero = get_total_goals('http://en.wikipedia.org/wiki/Alessandro_Del_Piero')
trezeguet = get_total_goals('http://en.wikipedia.org/wiki/David_Trezeguet')

Now we are ready to compute our statistics. For each statistics we will produce an interactive chart using http://glowingpython.blogspot.com/search/label/plotly">plotly.

Scoring trajectory

import plotly.plotly as py
from plotly.graph_objs import *
py.sign_in("sexyusername", "mypassword")

data = Data([
    Scatter(x=delpiero[0],y=cumsum(delpiero[2]), 
            name='Del Piero', mode='lines'),
    Scatter(x=trezeguet[0],y=cumsum(trezeguet[2]), 
            name='Trezeguet', mode='lines'),
    Scatter(x=ronaldo[0],y=cumsum(ronaldo[2]), 
            name='Ronaldo', mode='lines'),
    Scatter(x=vieri[0],y=cumsum(vieri[2]), 
            name='Vieri', mode='lines'),
])

layout = Layout(
    title='Scoring Trajectory',
    xaxis=XAxis(title='Year'),
    yaxis=YAxis(title='Cumuative goal'),
    legend=Legend(x=0.0,y=1.0))

fig = Figure(data=data, layout=layout)

py.iplot(fig, filename='cumulative-goals')

The scoring trajectory is given by the yearly cumulative totals of goals scored. From the scoring trajectories we can see that Ronaldo was a goal machine since his first professional season and his worse period was from 1999 to 2001. Del Piero and Trezeguet have the longest careers (and they're still playing!). Vieri had the shortest career but it's impressive to see that the number of goals he scored increased almost constantly from 1996 to 2004.

Scoring rate

data = Data([
    Bar(
        x=['Ronaldo', 'Vieri', 'Trezeguet', 'Del Piero'],
        y=[np.sum(ronaldo[2])/np.sum(ronaldo[1]), 
           np.sum(vieri[2])/np.sum(vieri[1]),
           np.sum(trezeguet[2])/np.sum(trezeguet[1]),
           np.sum(delpiero[2])/np.sum(delpiero[1])]
    )
])
py.iplot(data, filename='goal-average')

The scoring rate is the number of goals scored divided by the number of appearances. Ronaldo has a terrific 0.67 scoring rate, meaning that, on average he scored more than three goals each five games. Vieri and Trezeguet have a very similar scoring rate, almost one goal each two games. While Del Piero has 0.40, two goals each five games.

Appearances

data = Data([
    Bar(
        x=['Del Piero', 'Trezeguet', 'Ronaldo', 'Vieri'],
        y=[np.sum(delpiero[1]),
           np.sum(trezeguet[1]),
           np.sum(ronaldo[1]),
           np.sum(vieri[1])]
    )
])
py.iplot(data, filename='appearances')

The number of Del Piero's appearances on a football field is impressive. At the moment I'm writing, he played 773 games. No one of the other players was able to play the 70% of the games played by the Italian numero 10.

What’s the best way to boost the efficiency of your product team and ship with confidence? Check out this ebook to learn how Sentry's real-time error monitoring helps developers stay in their workflow to fix bugs before the user even knows there’s a problem.

Topics:

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}