
Getting Started with Data


A question I get asked regularly is “What materials would you recommend for someone just getting started in a more data-oriented job?” In this blog post I’m going to give a set of options, both books and websites, that answer that question.

Where Am I At?

I currently work as a conversion optimization specialist, which means I design, run, and analyze feature experiments on websites. The ultimate goal is usually to drive more, or larger, purchases. In working with other analysts, I’ve noticed a set of core skills that, when all are present, make an analyst one of my go-tos. Those same skills have led me to some success at what I do.

Without Further Ado

  • Website: GitHub

    • You’re gonna need to learn to code.
    • Search for data, or statistics, or anything, and I bet you find sample code.
  • Book: Think Stats

    • Learn a little Python, learn a little stats. A great primer on using the two together.
    • You might want to cover the statistics reading I’ve outlined below first.
  • Book: Head First Data Analysis

    • Descriptive statistics
    • Basic linear regression
    • Establishing a “gut” for data
  • Book: Statistics in Plain English

    • Descriptive statistics
    • Statistical tests (binomial tests and t-tests at least)
    • Confidence intervals
    • Linear regression
  • Web Article: How Not to Run an A/B Test

    • Experiment design
    • Dipping your toes into power analysis
  • Book: The Flaw of Averages

    • Why the most prevalent descriptive statistic, the average, can be a terribly misleading golden hammer in search of a nail.
  • Free Online Class: Probability & Statistics, Carnegie Mellon

    • Probability (including Bayes’ theorem)
    • Statistics
    • Exploratory data analysis
  • Textbook: A Second Course in Statistics: Regression Analysis (7th Edition)

    • In-depth treatment of linear regression.
    • Tons of theory, but focused on learning to use statistical software to do the analysis.
    • Best read after either taking a stats 101 class or learning more about classic statistical tests and how to use them correctly.
  • Technology: RStudio

    • I have read several books on R, but none of them really helped me much. The best thing has been this program, as it’s made it simple to get data into R and viewable so I can focus on analyzing it.
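
To give a taste of the power analysis mentioned under “How Not to Run an A/B Test” above, here is a minimal sketch of a sample-size estimate for comparing two conversion rates. It is not taken from any of the materials listed; it assumes the standard normal-approximation formula for a two-sided test of two proportions, and the function name, the 10% and 12% conversion rates, and the default significance level and power are all made up for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist  # standard library, Python 3.8+


def sample_size_per_group(p_control, p_variant, alpha=0.05, power=0.8):
    """Approximate visitors needed in EACH group to detect the difference
    between two conversion rates (two-sided test, normal approximation).

    Hypothetical helper for illustration only.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    p_bar = (p_control + p_variant) / 2            # pooled rate under "no difference"

    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_power * sqrt(p_control * (1 - p_control) + p_variant * (1 - p_variant))
    ) ** 2
    return ceil(numerator / (p_variant - p_control) ** 2)


# Illustrative numbers (assumptions): 10% baseline, hoping to detect a lift to 12%.
n_large_lift = sample_size_per_group(0.10, 0.12)
# A smaller lift (10% to 11%) needs a noticeably larger sample.
n_small_lift = sample_size_per_group(0.10, 0.11)
print(n_large_lift, n_small_lift)
```

The point of working this out before the experiment starts is that the sample size is fixed in advance, rather than stopping whenever the results happen to look significant.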

Currently Reading

This is a selection of books that I’m currently reading and learning from, but that may or may not have produced any results for me yet.

  • Textbook: Statistics: A Bayesian Perspective

    • A very approachable introduction to Bayesian stats. It is far less dense than some of the other material in this list.
    • Also starts to get into multiple probability distributions, and is very helpful in visualizing them.
  • Textbook: Doing Bayesian Data Analysis

    • Once you’re done with the previous book, this one builds on it quite nicely.
    • Gets down to actually computing more advanced Bayesian statistics problems, including hypothesis tests.
    • Also fairly approachable, but I wouldn’t recommend it to an absolute novice.
  • Textbook: Introduction to Bayesian Statistics

    • Very dense. Currently making my way through this bit by bit.
    • Nice to use after hitting the first book, and great in parallel with “Doing Bayesian Data Analysis.”
    • More mathematically focused, but still starts from the basics, so it’s definitely a book that you can use to slowly build up to a more rigorous exploration of Bayesian Statistics.
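
For a feel of what these Bayesian books build toward, here is a minimal sketch applied to the kind of conversion experiment I described earlier. It is not drawn from any of the three books; it assumes a uniform Beta(1, 1) prior on each conversion rate and uses the standard Beta-Binomial update, with a simple Monte Carlo comparison of the two posteriors. The function names and the visitor and purchase counts are hypothetical.

```python
import random


def beta_posterior(successes, trials, prior_a=1.0, prior_b=1.0):
    """Beta-Binomial update: a Beta(prior_a, prior_b) prior plus binomial data
    gives a Beta(prior_a + successes, prior_b + failures) posterior."""
    return prior_a + successes, prior_b + (trials - successes)


def prob_variant_beats_control(control, variant, draws=20000, seed=0):
    """Estimate P(variant rate > control rate) by sampling both posteriors.

    control and variant are (successes, trials) tuples. Hypothetical helper.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    a_c, b_c = beta_posterior(*control)
    a_v, b_v = beta_posterior(*variant)
    wins = sum(
        rng.betavariate(a_v, b_v) > rng.betavariate(a_c, b_c)
        for _ in range(draws)
    )
    return wins / draws


# Made-up data: 100 purchases from 1000 visitors (control),
# 120 purchases from 1000 visitors (variant).
a, b = beta_posterior(120, 1000)
posterior_mean = a / (a + b)  # mean of a Beta(a, b) distribution
p_better = prob_variant_beats_control((100, 1000), (120, 1000))
print(round(posterior_mean, 4), round(p_better, 3))
```

Instead of a p-value, the output is a direct statement about how plausible it is that the variant converts better, given the prior and the data.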

(Note: This article and the opinions expressed are solely my own and do not represent those of my employer.)



Published at DZone with permission of Justin Bozonier, DZone MVB. See the original article here.

