
Big Data Zone: Best of the Week (Apr. 19-26)


In case you missed them, here is a curated list of the best articles of this week from The Big Data Zone. This week: Randomization and probabilistic techniques to scale up machine learning, creating a skewed random discrete distribution in Python, Bayes factors versus P-values, a new Python podcast, and the Big Data Challenge.

1. Randomization and Probabilistic Techniques to Scale Up Machine Learning

These are only a few instances of probabilistic bounds being applied to solve real-world machine learning problems. There are many more. In fact, I find that the scalability of machine learning correlates directly with the application of probabilistic techniques to the model. As I mentioned earlier, the point of this post is to share some of my thoughts as I continue to learn techniques for scaling up machine learning models.

2. Python: Creating a Skewed Random Discrete Distribution

I’m planning to write a variant of the TF/IDF algorithm over the HIMYM corpus that weights in favour of terms that appear in a medium number of documents. As a prerequisite, I needed a function that, when given a number of documents, would return a weighting.
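
As a rough illustration of the kind of function described above, here is a minimal sketch that maps a document count to a weighting that peaks for terms appearing in about half of the documents. The name `medium_document_weight`, the `total_docs` parameter, and the parabolic shape are all assumptions made for this example; the linked article may use a different distribution entirely.

```python
def medium_document_weight(doc_count: int, total_docs: int) -> float:
    """Hypothetical weighting: 0.0 at the extremes, 1.0 at half the corpus."""
    if total_docs <= 0:
        raise ValueError("total_docs must be positive")
    if not 0 <= doc_count <= total_docs:
        raise ValueError("doc_count must be between 0 and total_docs")
    # Fraction of documents that contain the term.
    fraction = doc_count / total_docs
    # A parabola scaled so that the peak value is 1.0 (an assumed shape).
    return 4 * fraction * (1 - fraction)


# Weights for a term seen in 1, 5, and 10 of 10 documents.
weights = {n: round(medium_document_weight(n, 10), 2) for n in (1, 5, 10)}
print(weights)  # {1: 0.36, 5: 1.0, 10: 0.0}
```

A term found in every document, or in almost none, gets a weight near zero, while a term found in a medium number of documents gets a weight near one.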

3. Bayes Factors vs P-values

Bayesian analysis and Frequentist analysis often lead to the same conclusions by different routes. But sometimes the two forms of analysis lead to starkly different conclusions. The following illustration of this difference comes from a talk by Luis Pericci last week. He attributes the example to “Bernardo (2010)” though I have not been able to find the exact reference.
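
To make the contrast concrete, here is a small sketch that computes both quantities for a binomial test of a point null hypothesis. The counts below are made up for illustration and are not the figures from the talk or from the cited example. The sketch also assumes a uniform prior on the success probability under the alternative and a normal approximation for the p-value; the function names are hypothetical.

```python
import math


def two_sided_p_value(successes: int, trials: int, theta0: float) -> float:
    """Normal-approximation p-value for H0: theta == theta0."""
    mean = trials * theta0
    sd = math.sqrt(trials * theta0 * (1 - theta0))
    z = (successes - mean) / sd
    # Two-sided tail probability of a standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


def bayes_factor_for_null(successes: int, trials: int, theta0: float) -> float:
    """Bayes factor of H0 against H1, with a uniform prior on theta under H1."""
    log_binom = (math.lgamma(trials + 1)
                 - math.lgamma(successes + 1)
                 - math.lgamma(trials - successes + 1))
    log_null = (log_binom
                + successes * math.log(theta0)
                + (trials - successes) * math.log(1 - theta0))
    # Under a uniform prior, every count from 0 to trials is equally likely.
    log_alt = -math.log(trials + 1)
    return math.exp(log_null - log_alt)


# Hypothetical data: 50,350 successes in 100,000 trials, testing theta0 = 0.5.
p_value = two_sided_p_value(50_350, 100_000, 0.5)
bf01 = bayes_factor_for_null(50_350, 100_000, 0.5)
print(f"p-value: {p_value:.4f}, Bayes factor for H0: {bf01:.1f}")
```

With these made-up counts the p-value falls below 0.05, so a frequentist test would reject the null hypothesis, while the Bayes factor is well above 1 and so favours it. The gap depends heavily on the sample size and on the prior chosen for the alternative.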

4. Announcing New Podcast: Talk Python to Me

I’m super excited to announce that I just launched a brand new podcast for Python developers called Talk Python To Me. This weekly podcast already has the first episode published and some amazing guests lined up.

5. The Big Data Challenge

Big Data analytics is now moving towards defining data accurately, handling it uniformly, and developing data-driven smart products.



