Data Analysis Training via Coursera

I recently finished the Coursera course "Data Analysis," which immediately followed and somewhat overlapped "Computing for Data Analysis," also from Coursera. "Computing for Data Analysis" was taught by Roger Peng and "Data Analysis" by Jeff Leek, both of the Johns Hopkins Bloomberg School of Public Health. These courses were my first exposure to online courses, and they were quite heavily "attended."

I can't say enough in favor of these two courses. I learned a great deal and was greatly helped by the quizzes and the projects; the desire to get a good grade is still motivating, even after all these years. I recently read a post by someone who said that online courses (he mentioned Coursera specifically) did not help him learn at all, because he needed the pressure of a deadline to make himself learn the material. I can't imagine what he was talking about: in some cases, I spent upwards of 30 hours a week trying to absorb the material, do well on the quizzes, and finish the data analyses. One weekend I noticed someone posting on the class forum that his marriage was on the rocks if he didn't immediately solve an issue he was having. :) As I was reading that post, I myself was working on that same assignment from a cabin in the U.S. Rocky Mountains, instead of skiing. So if you take an online class only with the goal of watching the videos, maybe you won't learn much. But my classmates seemed as motivated as I was, and I'm thrilled by the experience.

"Computing for Data Analysis" had about 40 000 students enrolled, while "Data Analysis" had slightly more than 100 000 enrollees.  Some of the advice I would give after taking these courses:

  • Be prepared to watch any particular class video more than once.  I picked this up from a fellow student in the class forums, and I found my performance on the quizzes improved significantly when I made the decision to watch each video at least twice.
  • At least for Roger and Jeff, when the instructor mentions a reference at the end of a lecture, you really need to go check out that reference.  On occasion, I chose not to, especially when it involved a textbook that I wasn't sure I wanted to purchase.  But quite often you will need the information contained in suggested reading material.
  • There are a lot of people in statistics and data analysis who have been extremely generous with their time.  I'm specifically talking about some of the high-quality books on the subject which are available online, for free.  Some of these are also available in print for a nominal fee.  If they're free, you really need to take a look at them.  My favorites are OpenIntro's Statistics, Introduction to Data Science, and The Elements of Statistical Learning.
  • Unless you are so on top of the course schedule that you take the quizzes and finish the assignments the first day they are made available, it's very likely that any question/issue you have has already been dealt with in the class forums.  I was amazed at how generous my fellow students were with their time, answering questions concerning the assignments.  I would have pitched in and helped, but usually by the time I even started a task, every possible related issue had already been posted and answered.
  • Another thing on the videos -- If you are somewhat new to R (or maybe even if you are not), I recommend working through the examples in the videos as you watch them.  Pause the video and just download the data and replicate the instructor's steps.  It may look simple while you're watching, but actually performing the steps helps you internalize them for later, when you need them.  An unexpected bonus:  if you're following along in RStudio, you can save your workspace and have the video's example work waiting for you when you need it later.
If you plan on taking any course which uses R, as these do, and you are not familiar with R, there's no reason to wait until the class starts to pick up some familiarity with the language.  I had been learning R on the side for a few weeks before I took Roger's course, so I think my preparation helped a little.  Nothing, however, helps like using R all day.  During Roger's course, an analysis task came up in my job that I recognized would be much easier to do with R than with my usual Java, so I used R for that task.  That experience helped me a lot when I started Jeff's class later; I don't even want to think how painful some of that early work would have been if it had not been for the daily experience I had with R before the class started.

Jeff recommended we all look at Kaggle, which I recently joined. Kaggle is a site where you can enter data-analysis competitions and see how well you stack up against your fellow data scientists. Kaggle currently has about 84 000 members. Do you remember TopCoder? It has redefined itself somewhat, now referring to "Design. Development. Big Data." The TopCoder community is "slightly" larger than Kaggle, at nearly half a million members. I could be wrong, but I would not be surprised if, at some point in the future, obtaining work as a data scientist required (at a minimum) a presence on both of these sites. I'm not promoting them, by the way; I just think they will eventually fall into that category of "resume sites" that you really can't afford not to have on your resume.

What is next for me? I'm going to take Coursera's "Statistics: Making Sense of Data", followed by another Coursera course, "Introduction to Data Science". These courses will overlap by a month, making for an interesting few weeks for those like me with full-time jobs. By the way, that reminds me of a couple of other points. Frequently, you will want to take a couple of courses that overlap. If it's only for a week and your time is at a premium, you should probably be okay. While the first class will be racing to the finish line, the second class will just be starting up. However, if they overlap by a month, then you may find yourself taking a day off from work here and there just to keep up. I should say that many Coursera classes advertise an estimate of the amount of time you will need per week for the class. The actual amount of time not only varies wildly, but may be significantly higher in real life than the quoted number. For example, "Data Analysis" was quoted at 3 to 5 hours a week. I probably averaged 8 to 10 hours a week; the week that the first analysis assignment was due, I easily cleared 30 hours. In one week. So be prepared!

None of what I've said should be interpreted as cautionary or as an attempt to discourage you (after all, you can take a class and choose just to watch the videos). I am happy with my personal results and am enthusiastically looking forward to taking two more classes, even with some apprehension about the 4-week overlap. And I'm not just promoting Coursera, either; it's just that Coursera is my only experience to date. I hope some of the above information helps you on your path to becoming a data scientist!


Published at DZone with permission of Wayne Adams, DZone MVB. See the original article here.

