Performance "Tuning": Running in 1/100th the Time
The Performance Zone is presented by AppDynamics. Scalability and better performance are constant concerns for the developer and operations manager. Try AppDynamics' fully-featured performance tool for Java, .NET, PHP, & Node.js.
It was described as horribly slow. Since it was only 400 lines of code, it was a great subject for review in a Python meetup. I would be able to show some tweaks and performance tips.
Hypothetical Step 1: Profile
My first thought was to run the profiler and see what popped up as a possible root cause of slowness.
Generally, there are three kinds of performance problems.
- Wrong data structure. Or wrong algorithm. (They're simply opposite sides of the same coin.) This generally leads to dramatic, earth-shattering improvements.
- Piss-Poor Programming Practices. This kind of fine-tuning yields minuscule improvements. In some cases, no measurable change at all.
- Bugs. Every time I've been asked to improve "working" code, I find that it has bugs. Every time. These can exacerbate performance problems. Interestingly, most of these are readily apparent during an initial survey: i.e., while simply reading the code to see how it works. Trying to create unit tests for the purposes of refactoring often reveals additional bugs.
In order to work out team rankings, the program kept a list of teams.
Not a dict mapping from team name to team details. But a simple list. Locating a team in the list meant iterating through the list, looking for the matching team. Really.
for t in team_list: if t['name'] == target_name: process( t )
for t in team_list: if t['name'] == target_name: return else: t['name']= init_new_team(target_name)
The whole point of the study of "Data Structures" is to prevent (or optimize) search.
In this case, we can prevent linear search of a list by using a dictionary. That, clearly, is job one.
One of the big "issues" is the use of if statements throughout the scoring algorithm. An if statement creates Cyclomatic Complexity and can lead to performance problems. Generally, if statements should be avoided.
This algorithm applies some normalization factors to reconcile scoring with win/loss numbers in different sports. Basketball, specifically, involves generally high scores. Since there are 2-point and 3-point scoring opportunities, a factor is used to normalize the points into "goals". Football, similarly, has numerous scoring opportunities with values of 1, 2, 3 and 6 points; the scores here are also normalized.
This normalization was done with an if statement that was evaluated deep inside the Elo algorithm. Repeatedly. Evaluated.
The two functions that handled the normalizations, plus the normalization factors, are ideal candidates for OO design. There's a clear hierarchy of classes here. A superclass handles most sports, and two polymorphic subclasses handle football and basketball normalization.
The if statement is now "pushed up" to the very beginning of the program where an instance of the sports normalization object is created. This object's methods are then used by the Elo algorithm to normalize scores.
Icing on the Cake
Once we've fixed the bug and replaced a list with a dict, everything else is merely icing.
Some other OO changes.
- The "Team" information should not be a flat, anonymous dictionary. It should be a proper class definition with proper attributes. There aren't many methods, so it's easy to create.
- The "Game" information is read by csv.DictReader. However, it should not remain a simple, anonymous dict. As with a Team, a simple class can be created to handle Game.
- The overall structure of the application needs to be broken into two sections. The command-line interface parses options, opens files, and generally gets everything set up. The actual ranking algorithm should be a function that is given an open file-like object plus the Sport object for normalization. This allows the ranking algorithm to be reused in other contexts than the command-line (i.e. a web service).
This recreational application took 45-60 seconds to process one year's record of games for a given league. It now takes less than 0.2 seconds to do the same volume of work. Two test cases involving a complete run of 159 records runs in 0.411 seconds. That's 1/100th the time simply from switching data structures.
- Remove searches.
- Remove deeply-nested if statements.