Big Data Zone: Best of the Week (Apr. 19-26)


In case you missed them, here is a curated list of the best articles of this week from The Big Data Zone. This week: Randomization and probabilistic techniques to scale up machine learning, creating a skewed random discrete distribution in Python, Bayes factors versus P-values, a new Python podcast, and the Big Data Challenge.

1. Randomization and Probabilistic Techniques to Scale Up Machine Learning

These are only a few instances of probabilistic bounds being applied to solve real-world machine learning problems. There are many more. In fact, I find that the scalability of machine learning correlates directly with the application of probabilistic techniques to the model. As I mentioned earlier, the point of this post is to share some of my thoughts as I continue to learn techniques to scale up machine learning models.

2. Python: Creating a Skewed Random Discrete Distribution

I’m planning to write a variant of the TF/IDF algorithm over the HIMYM corpus which weights in favour of terms that appear in a medium number of documents. As a prerequisite, I needed a function that, when given a number of documents, would return a weighting.
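To make the idea of such a weighting function concrete, here is a minimal sketch. It is not the method from the linked article: the function name `medium_frequency_weight`, the `total_docs` parameter, and the triangular shape of the weighting are all assumptions chosen for illustration. It simply returns a weight that is highest for terms appearing in about half of the documents and falls to zero for terms appearing in none or all of them.

```python
def medium_frequency_weight(doc_count, total_docs):
    """Return a weight in [0, 1] that peaks when a term appears in half the documents.

    Hypothetical helper for illustration only: a triangular weighting,
    0 at the extremes (no documents or every document) and 1 in the middle.
    """
    if total_docs <= 0:
        raise ValueError("total_docs must be positive")
    if not 0 <= doc_count <= total_docs:
        raise ValueError("doc_count must be between 0 and total_docs")

    # Fraction of the corpus that contains the term.
    fraction = doc_count / total_docs
    # Distance from the midpoint, rescaled so the weight spans 0..1.
    return 1.0 - abs(2.0 * fraction - 1.0)


if __name__ == "__main__":
    # Print the weight for every possible document count in a 10-document corpus.
    for count in range(0, 11):
        print(count, round(medium_frequency_weight(count, 10), 2))
```

A smoother curve (for example a bell shape) could be swapped in where the triangular one is used here; the triangular form just keeps the sketch easy to read.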

3. Bayes Factors vs P-values

Bayesian analysis and Frequentist analysis often lead to the same conclusions by different routes. But sometimes the two forms of analysis lead to starkly different conclusions. The following illustration of this difference comes from a talk by Luis Pericci last week. He attributes the example to “Bernardo (2010)” though I have not been able to find the exact reference.

4. Announcing New Podcast: Talk Python to Me

I’m super excited to announce that I just launched a brand new podcast for Python developers called Talk Python To Me. This weekly podcast already has the first episode published and some amazing guests lined up.

5. The Big Data Challenge

Big Data analytics is now moving more towards accurately defining data, handling it uniformly, and developing data-driven smart products.
