In case you missed them, here is a curated list of the best articles from this week's edition of DZone's Big Data Zone. This week is heavy on R! We have R-related posts about numeric keys in a nested list/dictionary, a Markov chain example based on Wikipedia, and solving the locomotive problem in R with posterior probabilities for different priors. Also: growing some trees (in the mathematical sense) and how to choose a Hadoop distro [video].
Last week I described how I’ve been creating fake dictionaries in R using lists, and I found myself using the same structure while solving the dice problem in Think Bayes.
Over the weekend I’ve been reading about Markov Chains and I thought it’d be an interesting exercise for me to translate Wikipedia’s example into R code.
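For readers who want a feel for the idea before clicking through, here is a minimal sketch of a two-state Markov chain. It is written in Python rather than the post's R, and the states `A` and `B`, the transition probabilities, and the helper names `step` and `propagate` are all made up for illustration; they are not taken from Wikipedia's example or from the post.

```python
# Hypothetical transition probabilities: TRANSITIONS[s][t] is the
# probability of moving from state s to state t. Each row sums to 1.
TRANSITIONS = {
    "A": {"A": 0.9, "B": 0.1},
    "B": {"A": 0.5, "B": 0.5},
}


def step(dist, transitions):
    """Advance a probability distribution over states by one step."""
    nxt = {state: 0.0 for state in transitions}
    for state, p in dist.items():
        for target, q in transitions[state].items():
            nxt[target] += p * q
    return nxt


def propagate(dist, transitions, n):
    """Apply `step` n times and return the resulting distribution."""
    for _ in range(n):
        dist = step(dist, transitions)
    return dist


if __name__ == "__main__":
    start = {"A": 1.0, "B": 0.0}  # start in state A with certainty
    for n in (1, 5, 50):
        print(n, propagate(start, TRANSITIONS, n))
```

Running the chain for many steps shows the distribution settling down regardless of the starting state, which is the kind of behaviour a Markov chain write-up typically explores.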
This is my first video post. Rather than writing, I simply turned on Camtasia to see how things would work out. If you are interested in a short primer on how to choose a Hadoop distro, this video is for you!
Consider here the dataset used in a previous post about visualising a classification (with more than 2 features). We can change the options here, such as the minimum number of observations per node. To visualise that classification, use the following code (to get a projection on the first two components).
In my continued reading of Think Bayes the next problem to tackle is the Locomotive problem. The interesting thing about this question is that it initially seems that we don’t have enough information to come up with any sort of answer. However, we can get an estimate if we come up with a prior to work with.
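To make the role of the prior concrete, here is a minimal sketch in Python (not the post's R). It assumes the usual statement of the problem: locomotives are numbered 1 to N, we see a single locomotive, and we want to estimate N. The observed number 60, the uniform prior, the upper bounds, and the function names are illustrative assumptions, not details taken from the post.

```python
def locomotive_posterior(observed, upper):
    """Posterior over fleet size N after seeing one locomotive number.

    Assumes a uniform prior over N = 1..upper (an illustrative choice).
    """
    hypotheses = range(1, upper + 1)
    prior = 1.0 / upper  # same prior weight for every hypothesis
    # Likelihood of seeing `observed` with n locomotives:
    # 1/n if the number is possible (observed <= n), otherwise 0.
    unnorm = {
        n: prior * (1.0 / n if observed <= n else 0.0)
        for n in hypotheses
    }
    total = sum(unnorm.values())
    return {n: p / total for n, p in unnorm.items()}


def posterior_mean(posterior):
    """Expected fleet size under the posterior."""
    return sum(n * p for n, p in posterior.items())


if __name__ == "__main__":
    # Try several upper bounds to see how much the prior matters.
    for upper in (500, 1000, 2000):
        post = locomotive_posterior(60, upper)
        print(upper, round(posterior_mean(post), 1))
```

With a single observation, the estimate moves noticeably as the upper bound of the prior changes, which is why comparing different priors is worthwhile.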