Big Data Zone: Best of the Week (Apr. 19-26)

In case you missed them, here is a curated list of the best articles of this week from The Big Data Zone. This week: Randomization and probabilistic techniques to scale up machine learning, creating a skewed random discrete distribution in Python, Bayes factors versus P-values, a new Python podcast, and the Big Data Challenge.

1. Randomization and Probabilistic Techniques to Scale Up Machine Learning

These are only a few instances of probabilistic bounds being applied to solve real-world machine learning problems. There are many more. In fact, I find that the scalability of machine learning correlates very directly with the application of probabilistic techniques to the model. As I mentioned earlier, the point of this post is to share some of my thoughts as I continue to learn techniques to scale up machine learning models.

2. Python: Creating a Skewed Random Discrete Distribution

I’m planning to write a variant of the TF/IDF algorithm over the HIMYM corpus which weights in favour of terms that appear in a medium number of documents. As a prerequisite, I needed a function that, when given a number of documents, would return a weighting.
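To make the idea of a "medium number of documents" weighting concrete, here is a minimal sketch. It is not the approach from the linked article (which builds a random discrete distribution); it is a deterministic stand-in that assumes a bell-shaped curve over document counts. The function name `mid_frequency_weight` and the parameters `peak_fraction` and `spread_fraction` are hypothetical, chosen only for illustration.

```python
import math


def mid_frequency_weight(doc_count, total_docs, peak_fraction=0.5, spread_fraction=0.2):
    """Return a weight in (0, 1] that peaks when a term appears in about
    peak_fraction of the documents and falls off on either side.

    Hypothetical helper: the Gaussian shape and both default fractions
    are assumptions, not values taken from the original article.
    """
    if total_docs <= 0:
        raise ValueError("total_docs must be positive")
    if not 0 <= doc_count <= total_docs:
        raise ValueError("doc_count must be between 0 and total_docs")

    peak = peak_fraction * total_docs      # document count that gets weight 1.0
    spread = spread_fraction * total_docs  # controls how quickly the weight decays
    return math.exp(-((doc_count - peak) ** 2) / (2 * spread ** 2))


# Example: a term seen in 1, 50, or 99 of 100 documents.
weights = {n: mid_frequency_weight(n, 100) for n in (1, 50, 99)}
```

With these defaults, very rare and very common terms both receive small weights, while terms that appear in about half the documents receive the largest one. Moving `peak_fraction` away from 0.5 shifts which document counts are favoured.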

3. Bayes Factors vs P-values

Bayesian analysis and Frequentist analysis often lead to the same conclusions by different routes. But sometimes the two forms of analysis lead to starkly different conclusions. The following illustration of this difference comes from a talk by Luis Pericchi last week. He attributes the example to “Bernardo (2010)” though I have not been able to find the exact reference.

4. Announcing New Podcast: Talk Python to Me

I’m super excited to announce that I just launched a brand new podcast for Python developers called Talk Python To Me. This weekly podcast already has the first episode published and some amazing guests lined up.

5. The Big Data Challenge

Big Data analytics is now moving more toward accurately defining data, handling it uniformly, and developing data-driven smart products.

Topics:
bigdata, big data, best of the week
