Data Science Fun at Velocity Amsterdam

Check out this summary of Bart Devylder's Python data science workshop in Amsterdam and the IPython notebooks used to run it.

Last week I gave a hands-on Python data science workshop at Velocity Amsterdam together with my colleague Pieter Buteneers. The purpose was to introduce techniques for visualizing large datasets, finding correlations between metrics, applying machine learning, detecting anomalies, and forecasting data. With 54 active participants and quite a lot of positive feedback, I think it was a success (but I might be biased), and I would like to share some of our experiences and the tools we used.
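To give a flavor of two of these techniques, here is a minimal sketch of correlating two metrics and flagging anomalies with a z-score. It is not the workshop code: it uses only the Python standard library rather than the data science stack, and the metric names, sample values, and the threshold of 2.0 are made up for illustration.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mx, my = mean(xs), mean(ys)
    # Population covariance, matching the population standard deviations below.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def zscore_anomalies(values, threshold=2.0):
    """Return the indices of points more than `threshold` std devs from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # a constant series has no outliers
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical metrics: requests per minute and response latency in milliseconds.
requests = [100, 120, 110, 130, 125, 115, 400, 118]
latency_ms = [20, 24, 22, 26, 25, 23, 80, 24]

print("correlation:", round(pearson(requests, latency_ms), 3))
print("anomalous indices:", zscore_anomalies(requests))
```

A fixed z-score threshold is about the simplest detector there is. On short series a single spike inflates the standard deviation, so the threshold has to be chosen with the sample size in mind.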

One of the challenges we faced in preparing for the workshop was finding a convenient way to let everyone participate without having to worry about whether they had a compatible version of the Python data science stack installed on their laptops. We decided to give the tutorial using an IPython notebook, which runs in the browser and lets you execute code and display graphical output. This meant participants would not have to install anything, provided that we ran a server hosting these notebooks on their behalf.

One very promising service that offers this is mybinder.org, which spins up fully functional notebooks based on any public GitHub repository. It requires no setup whatsoever, as the notebooks run on a Kubernetes cluster operated by the Freeman lab. However, for running the tutorial for a relatively large audience at a very specific time, we felt it was too risky to rely on. We had no way of knowing how much capacity would be available when we needed it, and we had also observed occasional downtime.

Therefore, we decided to use JupyterHub, a service that dynamically spins up a Jupyter notebook server for each user. The installation and configuration went pretty smoothly thanks to the many documented examples. We installed it on a fairly powerful machine on Azure (as it was configured to run everything locally), and it handled the load well during the workshop.

If you would like to (re)try the tutorial, you can check out the code on GitHub or jump directly to a fully functional notebook offered by myBinder (available most of the time). Have fun, and let us know your feedback!

Topics:
data science, python, machine learning, big data

Published at DZone with permission of Bart Devylder, DZone MVB. See the original article here.
