Presentation: Scalability Challenges in Big Data Science
Yesterday I gave a talk on scalability and machine learning at the Berlin Buzzwords conference. I gave an overview of different ways to scale data analysis and machine learning methods. I covered MapReduce (of course) and large-scale training of SVMs via stochastic gradient descent, but also stream mining and real-time analysis (as you know, “you don’t just scale into real-time”).
The conference continues today; you can follow it on Twitter via the #bbuzz hashtag.
Update: On Scribd, the hyperlinks were somehow lost, so here is the list:
Scalable Databases
Multithreading and Messaging Frameworks
MapReduce
- hadoop
- disco
- The MapReduce paper: Map-Reduce for Machine Learning on Multicore
Large Scale Classifier Training
Other frameworks
Stream processing
- Alex Smola’s lecture
- Heavy hitters paper: Efficient Computation of Frequent and Top-k Elements in Data Streams
- Hashing paper: Feature Hashing for Large Scale Multitask Learning
- Count-min sketches website
- Stream Mining Clustering paper: A Framework for Clustering Massive-Domain Data Streams
- My post: One does not simply scale into real-time
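
To give a flavor of the stream mining techniques in the list above, here is a minimal sketch of a count-min sketch in Python. It is not code from the talk or from any of the linked pages; the class name, the default `width` and `depth`, and the use of salted BLAKE2 hashes are all assumptions made for illustration. The idea is to keep approximate counts for items in a stream in fixed memory: each item is hashed into one counter per row, and the estimate is the minimum over those counters, so it can overestimate but never underestimate.

```python
import hashlib


class CountMinSketch:
    """Illustrative count-min sketch (hypothetical helper, not from the talk)."""

    def __init__(self, width=1000, depth=5):
        self.width = width    # counters per row
        self.depth = depth    # number of rows, one hash function each
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Derive one hash function per row by salting BLAKE2b with the row number.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions only ever inflate counters, so the minimum is the best guess.
        return min(
            self.table[row][self._index(item, row)]
            for row in range(self.depth)
        )


# Usage: count hashtags from a (made-up) stream.
sketch = CountMinSketch()
for tag in ["#bbuzz", "#hadoop", "#bbuzz", "#bbuzz"]:
    sketch.add(tag)

print(sketch.estimate("#bbuzz"))
```

Memory use depends only on `width` and `depth`, not on the number of distinct items, which is what makes this kind of structure attractive for streams.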
TWIMPACT:
Published at DZone with permission of Mikio Braun, DZone MVB. See the original article here.