
Presentation: Scalability Challenges in Big Data Science

by Mikio Braun · Jun. 19, 12 · Big Data Zone · Interview

Yesterday I gave a talk on scalability and machine learning at the Berlin Buzzwords conference. The talk gives an overview of different ways to scale data analysis and machine learning methods. It covers MapReduce (of course) and large-scale training of SVMs via stochastic gradient descent, but also stream mining and real-time analysis (as you know, “you don’t just scale into real-time”).

The conference continues today; you can follow it on Twitter under the #bbuzz hashtag.

Update: On Scribd, the hyperlinks in the slides are somehow lost, so here is the list of links:

Scalable Databases

  • Cassandra
  • riak
  • mongoDB
  • MySQL

Multithreading and Messaging Frameworks

  • ActiveMQ
  • ZeroMQ
  • akka

MapReduce

  • hadoop
  • disco
  • The MapReduce paper: Map-Reduce for Machine Learning on Multicore

Large Scale Classifier Training

  • Vowpal Wabbit
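
To give a flavor of what training an SVM via stochastic gradient descent looks like, here is a minimal sketch in Python. It is not taken from the slides and is not how Vowpal Wabbit is implemented; it assumes a Pegasos-style update (hinge loss with an L2 penalty and a step size of 1/(lam * t)), and the names `train_linear_svm_sgd`, `predict`, `lam`, and `epochs` are made up for this illustration.

```python
import random


def train_linear_svm_sgd(examples, dim, lam=0.01, epochs=20, seed=0):
    """Fit a linear SVM (hinge loss + L2 penalty) one example at a time.

    examples: list of (x, y) pairs, x a list of `dim` floats, y in {-1, +1}.
    lam:      regularisation strength (illustrative default, not tuned).
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        order = list(range(len(examples)))
        rng.shuffle(order)  # visit the examples in random order each epoch
        for i in order:
            x, y = examples[i]
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            # Margin is computed with the weights from before this update.
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # Gradient step on the L2 penalty: shrink the weights.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Gradient step on the hinge loss, only if the margin is violated.
            if margin < 1.0:
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    score = sum(wj * xj for wj, xj in zip(w, x))
    return 1 if score >= 0 else -1
```

Each update touches a single example and costs time proportional to the number of features, which is why this style of training copes with data sets that do not fit into memory. A call such as `train_linear_svm_sgd(data, dim=2)` returns the weight vector, which `predict` then applies to new points.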

Other frameworks

  • Pregel paper: Pregel: A System for Large-Scale Graph Processing
  • Storm
  • Esper

Stream processing

  • Alex Smola’s lecture
  • Heavy hitters paper: Efficient Computation of Frequent and Top-k Elements in Data Streams
  • Hashing paper: Feature Hashing for Large Scale Multitask Learning
  • Count-min sketches website
  • Stream Mining Clustering paper: A Framework for Clustering Massive-Domain Data Streams
  • My post: One does not simply scale into real-time
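
As a rough illustration of the kind of data structure these stream processing links are about, here is a minimal count-min sketch in Python. It is a sketch under assumptions rather than code from the talk or from the linked website: the class name, the default `width` and `depth`, and the use of salted BLAKE2 hashes from the standard library are all choices made for this example.

```python
import hashlib


class CountMinSketch:
    """Approximate counts for a stream in fixed memory.

    Estimates never fall below the true count; hash collisions can
    only push them higher.
    """

    def __init__(self, width=1000, depth=5):
        self.width = width  # counters per row
        self.depth = depth  # number of rows, one hash function each
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One hash function per row, derived by salting with the row number.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # The smallest counter is the one least inflated by collisions.
        return min(
            self.table[row][self._index(item, row)]
            for row in range(self.depth)
        )
```

The memory used is `width * depth` counters regardless of how many distinct items appear in the stream. A wider table lowers the expected overestimate, and more rows lower the chance that every row collides at once.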

TWIMPACT:

  • http://twimpact.com
  • http://serienradar.de
  • http://datascience-berlin.de


Published at DZone with permission of Mikio Braun, DZone MVB. See the original article here.
