Data Analysis Training via Coursera
I recently finished the coursera course "Data Analysis," which immediately followed and somewhat overlapped "Computing for Data Analysis," also from coursera. The earlier class, "Computing for Data Analysis," was taught by Roger Peng, and "Data Analysis" by Jeff Leek, both of the Johns Hopkins Bloomberg School of Public Health. These courses were my first exposure to an online course, and they were quite heavily "attended."
I can't say enough in favor of these two courses. I learned a great deal, and was greatly helped by the quizzes and the projects, by which I'm saying that the desire to get a good grade is still motivating, even after all these years. I recently read a post by someone who mentioned that online courses (and he mentioned coursera specifically) did not help him learn at all because he needed the pressure of a deadline to make himself learn material. I can't imagine what he was talking about: in some weeks, I spent upwards of 30 hours trying to absorb the material, do well on the quizzes, and finish the data analyses. One weekend I noticed someone posting on the class forum that his marriage was on the rocks if he didn't immediately solve an issue he was having. :) As I was reading that post, I myself was working on that same assignment from a cabin in the U.S. Rocky Mountains, instead of skiing. So if you take an online class only with the goal of watching the videos, maybe you won't learn much. But my classmates seemed as motivated as I was, and I'm thrilled by the experience.
"Computing for Data Analysis" had about 40 000 students enrolled, while "Data Analysis" had slightly more than 100 000 enrollees. Some of the advice I would give after taking these courses:
- Be prepared to watch any particular class video more than once. I picked this up from a fellow student in the class forums, and I found my performance on the quizzes improved significantly when I made the decision to watch each video at least twice.
- At least for Roger and Jeff, when the instructor mentions a reference at the end of a lecture, you really need to go check out that reference. On occasion, I chose not to, especially when it involved a textbook that I wasn't sure I wanted to purchase. But quite often you will need the information contained in suggested reading material.
- There are a lot of people in statistics and data analysis who have been extremely generous with their time. I'm specifically talking about some of the high-quality books on the subject which are available online, for free. Some of these are also available in print for a nominal fee. If they're free, you really need to take a look at them. My favorites are OpenIntro's Statistics, Introduction to Data Science, and The Elements of Statistical Learning.
- Unless you are so on top of the course schedule that you take the quizzes and finish the assignments the first day they are made available, it's very likely that any question/issue you have has already been dealt with in the class forums. I was amazed at how generous my fellow students were with their time, answering questions concerning the assignments. I would have pitched in and helped, but usually by the time I even started a task, every possible related issue had already been posted and answered.
- Another thing on the videos -- If you are somewhat new to R (or maybe even if you are not), I recommend working through the examples in the videos as you watch them. Pause the video and just download the data and replicate the instructor's steps. It may look simple while you're watching, but actually performing the steps helps you internalize them for later, when you need them. An unexpected bonus: if you're following along in RStudio, you can save your workspace and have the video's example work waiting for you when you need it later.
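On that last point, here is a minimal sketch of saving and restoring example work in R. The file name and the toy objects are hypothetical stand-ins for whatever you build while following a lecture; they are not from either course. It uses base R's `save()` and `load()`; `save.image()` does the same thing for every object in your workspace at once.

```r
# Hypothetical location and name; pick something that identifies the lecture.
workspace_file <- file.path(tempdir(), "lecture_example.RData")

# Stand-ins for the objects you would create while replicating a video.
lecture_data <- data.frame(x = 1:5, y = c(2.1, 3.9, 6.2, 7.8, 10.1))
lecture_fit <- lm(y ~ x, data = lecture_data)

# Write the named objects to a single file.
# (save.image(file = workspace_file) would save the whole workspace instead.)
save(lecture_data, lecture_fit, file = workspace_file)

# Later, for example in a fresh session, bring the objects back by name.
rm(lecture_data, lecture_fit)
load(workspace_file)
```

After `load()`, `lecture_data` and `lecture_fit` are available again under their original names, so you can pick up where the video left off.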
Jeff recommended we all look at kaggle, which I recently joined. kaggle is a site where you can enter data-analysis competitions and see how well you stack up against your fellow data scientists. kaggle currently has about 84 000 members. Do you remember TopCoder? It has redefined itself somewhat, now referring to "Design. Development. Big Data." The TopCoder community is "slightly" larger than kaggle, at nearly a half-million members. I could be wrong, but I would not be surprised if, at some point in the future, obtaining work as a data scientist would require (at a minimum) a presence on both of these sites. I'm not promoting them, by the way; I'm just thinking they will probably eventually fall into that category of "resume sites" that you really can't afford not to have on your resume.
What is next for me? I'm going to take coursera's "Statistics: Making Sense of Data", followed by another coursera course, "Introduction to Data Science". These courses will overlap by a month, making for an interesting few weeks for those like me with full-time jobs. By the way, that reminds me of a couple of other points. Frequently, you will want a couple of courses which overlap. If it's only for a week and your time is at a premium, you should probably be okay. While the first class will be racing to the finish line, the second class will just be starting up. However, if they overlap by a month, then you may find yourself taking a day off from work here and there just to keep up. I should say that many coursera classes advertise an estimate of the amount of time you will need per week for the class. The actual amount of time not only varies wildly, but may be significantly higher in real life than the quoted number. For example, "Data Analysis" was quoted at 3 to 5 hours a week. I probably averaged 8-10 hours a week; the week that the first analysis assignment was due, I easily cleared 30 hours. In one week. So be prepared!
None of what I've said should be interpreted as cautionary or meant to discourage you (after all, you can take a class and choose just to watch the videos). I am happy with my personal results and am enthusiastically looking forward to taking two more classes, even with some apprehension about the 4-week overlap. And I'm not just promoting coursera, either; it's just that coursera is my only experience to date. I hope some of the above information helps you on your path to becoming a data scientist!