Algorithm of the Week: A Look At Coursera's Design and Analysis of Algorithms: Part I
When I went to university (M.Sc. in Computer Science and Engineering), I took both an algorithm course and a data structures course, so a lot of the material in this course wasn’t new to me. However, that was over 20 years ago, so I thought it would be good with a refresher course. There were also several topics that were new to me, in particular some of the graph algorithms (case in point: Karger’s min-cut algorithm presented in the course had not been invented when I went to university).
The course lasted 5 weeks and consisted of video lectures, multiple-choice quizzes and programming problems. The topics covered were:
- Merge sort
- Big-O Notation
- Algorithm for counting inversions
- Strassen’s sub-cubic matrix multiplicaiton
- The Master Method
- Randomized selection
- Graphs and minimum cuts
- Graph search – Breadth first and Depth first
- Topological sort
- Graphs – Strogly connected components
- Dijkstra’s shortest-path algorithm
- Data structures: heaps
- Data structures: hash tables
The video lectures (about 2 hours per week) were very good and easy to follow, and Professor Roughgarden is quite good at explaining the different concepts and algorithms. My only wish is that I had the option of reading the material (as presented in a text book) instead of watching it. The slides used in the videos are available for download, but they don’t have enough information to be read on their own.
For me it is faster to read than to listen to someone explaining something, it is easier to skim things I already know, and it is easier to go back and forth in a text when something isn’t clear. Also, I like the visual structure of the material on the pages – something I don’t get from a video. But the main reasons is that it would be faster. Taking the course is quite a big time commitment, especially when you work full time and have a family, and if things can be sped up it’s a big plus.
This is a of course a matter of personal taste, and I suspect many people prefer having the material presented in a lecture format. Also, it is possible to speed up the videos (1.25 or 1.5 times the normal speed), which I did make use of sometimes. But for me it still does not beat reading from a book.
The topics I already knew quite well were merge sort, big-O notation, quicksort, and hash tables (in particular hash tables – I use them almost daily when coding). I thought all of these topics were well presented.
I had never heard about the Master Method before, but found both the formula (for finding the big-O performance of recursive algorithms) and the derivation of it quite interesting. I also liked the graph algorithms, in particular the algorithm for finding strongly connected components in a directed graph (the algorithm uses the neat trick of reversing all the edges as one of its steps). The lecture on heaps was interesting – I studied heaps in my class at university, but I had completely forgotten about them. So there it really was a case of needing a refresher (and heap sort now makes total sense too).
There were 5 multiple-choice quizzes, with 26 questions in total, and you were only allowed 2 tries per quiz (your best attempt counts towards your total score). This is in contrast to the quizzes in the database course, where you basically had unlimited attempts (the limit was set to 100). However, the database quizzes had many wrong answers, and often several correct answers, and for each new attempt, the answer options were randomly generated (and shuffled). Thus, even though you had seen the question before, you still had to work it through (you couldn’t just rely on remembering the correct answer from a previous attempt).
The questions were quite challenging, and even though I got them all correct in the end, I used my two attempts on all of them. When comparing the two approaches, I actually prefer the database course way with unlimited attempts. Then you are motivated to keep trying, even if you don’t get them all right on the first two tries.
There were also 5 programming assignments, and they were all great for really understanding the different algorithms. You could use any language you wanted (I used Java).
The first one dealt with inversions in an integer array. An inversion is a pair of two array indices, i and j with i<j, such that the value in the array at position i is greater than the value in the array at position j. If the array is sorted there are no inversions, and for all other cases there is at least one inversion. The problem consisted of an array of size 100,000, with all the numbers from 1 to 100,000 appearing exactly once, and the program had to calculate how many inversions the given array contained. This problem could be solved with an algorithm closely related to merge sort.
For the second programming problem, you had to implement quicksort, and use it to sort an integer array of size 10,000. You had to answer how many comparisons were used when sorting the array. The number of comparisons depends on the pivot elements used, and there were 3 cases to investigate: using the first element of the sub-array as the pivot, using the last element of the sub-array as the pivot, and using the median of the first, last and middle element of the sub-array as the pivot. Not surprisingly, using the median element resulted in the fewest number of comparisons.
In the third problem, you implemented the randomized contraction algorithm to find the minimum cut of a 40-node undirected graph. This was an interesting problem where you had to implement “folding” of two nodes into one, and handle the resulting change in edge structure. Unfortunately, unlike the previous two problems, the range of possible answers was very low, so it was possible to guess the answer (since 8 attempts were allowed).
The fourth programming problem was by far the most difficult problem, judging from the discussions in the forum. You had to find the sizes of the five largest strongly connected components of a directed graph consisting of almost one million nodes. A strongly connected component is a group of nodes where there is a path from each node in the component to all the other nodes in that component. It has a nice recursive solution, but a lot of people had all sorts of problems due to the size of the graph, both speed problems and stack overflows.
The last programming assignment made use of a hash table (which you didn’t have to implemet yourself) to see if 9 given target sums could be found by adding two numbers from a list of 100,000 integers. Quite a neat use of a hash table, and also the easiest of the five programming problems.
As with the database course, this course consisted of three components: video lectures, multiple-choice quizzes and programming assignments. It’s a good model that makes sure you learn the material well, and it was really well done in both courses. Another aspect of these on-line courses is that they follow a schedule – new videos are added each week, and quizzes and assignments are due each week. That system works really well to keep you going. The main difference in the courses was the length – the algorithm course was considerably shorter. For me, that was actually better, because you need to put in quite a few hours of work each week, and a shorter course is more managable. Part two of the course will apparently be offered in the fall of 2012.
Even though the course required quite a bit of work, I didn’t find it particularly difficult (I ended up getting full marks on it) – mostly because the teaching and exercises are first rate. I learned a lot, both from this course and from the database course, and I highly recommend both of them.