
Big Data Zone: Best of the Week (Apr. 19-26)



In case you missed them, here is a curated list of this week's best articles from the Big Data Zone. This week: randomization and probabilistic techniques to scale up machine learning, creating a skewed random discrete distribution in Python, Bayes factors versus p-values, a new Python podcast, and the Big Data Challenge.

1. Randomization and Probabilistic Techniques to Scale Up Machine Learning

These are only a few instances of probabilistic bounds being applied to solve real-world machine learning problems. There are lots more. In fact, I find that the scalability of machine learning correlates very directly with the application of probabilistic techniques to the model. As I mentioned earlier, the point of this post is to share some of my thoughts as I continue to learn techniques to scale up machine learning models.
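As a generic illustration of a probabilistic bound used for scaling (not necessarily one of the instances the original post covers), Hoeffding's inequality says how many random samples suffice to estimate a mean, such as a model's accuracy, to within ε with probability at least 1 − δ, independent of the full dataset's size. The sketch below assumes values bounded in [0, 1] and sampling with replacement; the function names are hypothetical.

```python
import math
import random


def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Smallest n with 2 * exp(-2 * n * epsilon**2) <= delta.

    By Hoeffding's inequality, the mean of n i.i.d. samples bounded in [0, 1]
    is then within epsilon of the true mean with probability >= 1 - delta.
    """
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))


def estimate_accuracy(correct: list[int], epsilon: float = 0.01,
                      delta: float = 0.05, seed: int = 0) -> float:
    """Hypothetical helper: estimate the fraction of correct predictions
    (1 = correct, 0 = wrong) from a random subsample instead of a full scan."""
    n = hoeffding_sample_size(epsilon, delta)
    rng = random.Random(seed)
    # Sample with replacement so the draws are i.i.d., as the bound assumes
    sample = rng.choices(correct, k=n)
    return sum(sample) / n
```

With ε = 0.01 and δ = 0.05 the bound asks for 18,445 samples, whether the full dataset holds a hundred thousand rows or a billion.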

2. Python: Creating a Skewed Random Discrete Distribution

I’m planning to write a variant of the TF/IDF algorithm over the HIMYM (How I Met Your Mother) corpus that weights in favour of terms appearing in a medium number of documents. As a prerequisite, I needed a function that, given a number of documents, would return a weighting.
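The excerpt does not show the author's function, so here is a minimal sketch of one way to build such a weighting, assuming a discretised beta-shaped curve (our choice, not necessarily the author's). The names `skewed_weights` and `sample_doc_counts` and the shape parameters `alpha=2`, `beta=4` are hypothetical; the curve is low for counts near 1 and near the maximum, and peaks at a low-to-medium count with a long tail toward large counts.

```python
import random


def skewed_weights(max_docs: int, alpha: float = 2.0,
                   beta: float = 4.0) -> dict[int, float]:
    """Return a normalised weight for each document count 1..max_docs.

    Each count d is mapped into (0, 1) and weighted by a beta-shaped curve
    x**(alpha - 1) * (1 - x)**(beta - 1); with beta > alpha the peak sits
    below the midpoint and the tail stretches toward large counts.
    """
    raw = {}
    for d in range(1, max_docs + 1):
        x = d / (max_docs + 1)  # strictly inside (0, 1)
        raw[d] = x ** (alpha - 1) * (1 - x) ** (beta - 1)
    total = sum(raw.values())
    return {d: w / total for d, w in raw.items()}


def sample_doc_counts(weights: dict[int, float], k: int,
                      seed: int = 0) -> list[int]:
    """Draw k document counts from the discrete distribution in `weights`."""
    rng = random.Random(seed)
    counts = list(weights)
    return rng.choices(counts, weights=[weights[c] for c in counts], k=k)
```

With `max_docs=100`, the weight peaks at a document count of 25 and tails off toward 100. A TF/IDF variant along these lines could multiply each term's score by `weights[df]`, where `df` is the number of documents containing the term.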

3. Bayes Factors vs P-values

Bayesian analysis and frequentist analysis often lead to the same conclusions by different routes. But sometimes the two forms of analysis lead to starkly different conclusions. The following illustration of this difference comes from a talk by Luis Pericchi last week. He attributes the example to “Bernardo (2010),” though I have not been able to find the exact reference.
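The excerpt does not reproduce the Bernardo example, so this sketch uses a different, standard illustration of the disagreement (often called Lindley's paradox): testing whether a coin is fair, H0: θ = 0.5, against H1: θ ~ Uniform(0, 1). The uniform prior under H1 is an assumption of this sketch, and `binomial_tests` is a hypothetical name. Under that prior the marginal likelihood of k successes in n trials is 1/(n + 1), so the Bayes factor in favour of H0 is BF01 = C(n, k) · 0.5^n · (n + 1).

```python
import math


def binomial_tests(k: int, n: int) -> tuple[float, float]:
    """Return (two-sided p-value, Bayes factor BF01) for H0: theta = 0.5.

    p-value: normal approximation to the binomial, no continuity correction.
    BF01 = P(k | H0) / P(k | H1), with H1: theta ~ Uniform(0, 1),
    whose marginal likelihood is 1 / (n + 1).
    """
    # z-score of the observed count under H0 (mean n/2, variance n/4)
    z = (k - n / 2) / math.sqrt(n / 4)
    p_value = math.erfc(abs(z) / math.sqrt(2))

    # log C(n, k) via lgamma, to avoid overflow for large n
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    log_p_h0 = log_choose + n * math.log(0.5)
    log_p_h1 = -math.log(n + 1)
    bf01 = math.exp(log_p_h0 - log_p_h1)
    return p_value, bf01


p, bf = binomial_tests(5100, 10000)
```

For 5,100 heads in 10,000 tosses, z = 2 and the p-value is about 0.0455, so a frequentist test rejects fairness at the 5% level, while BF01 comes out at roughly 11, i.e. the data favour the fair coin by about 11 to 1 under this prior.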

4. Announcing New Podcast: Talk Python To Me

I’m super excited to announce that I just launched a brand new podcast for Python developers called Talk Python To Me. This weekly podcast already has its first episode published and some amazing guests lined up.

5. The Big Data Challenge

Big Data analytics is now shifting toward accurately defining data, handling it uniformly, and developing data-driven smart products.



