Hadoop lets you store everything; with Lucene/Solr and more, you can find what you're looking for

By David Fishman · Oct. 28, 2011

This month's Wired Magazine features a story on the roots of Hadoop at Yahoo and the three companies vying to push its commercial frontier forward fastest: Hortonworks (whose Apache Lucene Eurocon Barcelona keynote video is now available; see below), MapR, and Cloudera. MapR CEO John Schroeder sums it up:

"If I can get a terabyte drive for $100 — or less if I buy in bulk — and I can get cheap processing power and network bandwidth to get to that drive, why wouldn't I just keep everything?" he says. "Hadoop lets you keep all your raw data and ask questions of it in the future."


Yahoo, while otherwise lamented in the press for its business model woes, has done this with an array of applications, from spam-hunting (retraining the model every few hours) to auto-categorization and user content mapping, running 5 million jobs a month across over 40,000 servers and 170 petabytes of storage (a mere $17M worth of disk, enough to keep at most a half-dozen enterprise storage sales reps busy; no wonder multi-billion-dollar enterprise storage companies are in a tizzy). With the leverage this affords, it's no surprise that eBay has increased its Hadoop footprint 5x to over 2,500 servers in the last year. Nor is it surprising that Eric Baldeschwieler, keynote speaker at Apache Lucene Eurocon 2011 in Barcelona last week, predicts that 50% of the world's data will be stored on Hadoop within 5 years:

KEYNOTE: "Architecting the Future of Big Data & Search," Eric Baldeschwieler, Hortonworks CEO, Apache Lucene Eurocon Barcelona 2011 (video from Lucene Revolution on Vimeo).


So step one: store it all, and map/reduce to your heart's content, cranking through key-value abstractions that produce insights you just couldn't get by shuttling data in and out of a relational database (though with HDFS and Hive, the familiar constructs of filesystem and query retrieval from the conventional data world are not out of reach). At Lucid, we've helped streamline that process, for example, with built-in HDFS connectors from LucidWorks.
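
To make those key-value abstractions concrete, here is a minimal sketch of the canonical MapReduce job, word count, written in Java against the org.apache.hadoop.mapreduce API (the newer Hadoop 2+ form; the API current in 2011 constructed jobs with new Job(conf, ...) rather than Job.getInstance). The class name and the input/output paths are illustrative assumptions, not anything taken from the article or from LucidWorks.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map: emit (word, 1) for every token in every line of raw input.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce: sum the counts for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. an HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. an HDFS output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The mapper turns raw lines into (word, 1) pairs and the reducer sums them per key; substitute your own parsing and aggregation and the same skeleton supports the "keep all your raw data and ask questions of it in the future" pattern Schroeder describes.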

But that doesn't answer the question of how to animate the virtuous cycle of insights available once you have all that data stored. Here's where the search equation gets interesting. If you know exactly what you are looking for every time, it's one thing to write some jobs that extract a particular trend or insight. But when you keep everything, can you know every question a priori? Of course not. Grant Ingersoll's talk sets forth a powerful portfolio of tools centered on Lucene/Solr.
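
By way of rough illustration of the search side (a sketch, not Grant Ingersoll's actual examples), here's what asking an unanticipated question of raw records looks like with the Lucene Java API: index a handful of records, then run an ad hoc query nobody planned for when the data was kept. Class names follow recent Lucene releases (the 3.x API current in 2011 differs), and the field name and record contents are invented for the sketch.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class LogSearchSketch {
    public static void main(String[] args) throws Exception {
        StandardAnalyzer analyzer = new StandardAnalyzer();
        Directory index = new ByteBuffersDirectory(); // in-memory index, just for the sketch

        // Index a few raw "kept everything" records as free-text documents.
        try (IndexWriter writer = new IndexWriter(index, new IndexWriterConfig(analyzer))) {
            String[] rawRecords = {
                "2011-10-20 user=alice action=login status=ok",
                "2011-10-20 user=bob action=checkout status=failed timeout",
                "2011-10-21 user=alice action=checkout status=ok"
            };
            for (String record : rawRecords) {
                Document doc = new Document();
                doc.add(new TextField("body", record, Field.Store.YES));
                writer.addDocument(doc);
            }
        }

        // Ask a question you didn't plan for when the data was stored.
        Query query = new QueryParser("body", analyzer).parse("checkout AND failed");
        try (DirectoryReader reader = DirectoryReader.open(index)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            for (ScoreDoc hit : searcher.search(query, 10).scoreDocs) {
                System.out.println(searcher.doc(hit.doc).get("body"));
            }
        }
    }
}

Nothing about the query ("checkout AND failed") had to be known at indexing time, which is precisely the point when you keep everything.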

Between them, these two talks give you a solid foundation for why applying search to big data matters to end users and businesses alike: better awareness, driven by search backed by real data; developers enabled to fine-tune access and retrieval; and the agility to fill in the white spaces of relationships between available information, the things you didn't know you didn't know.

More talks from Barcelona are here. We’ll touch on the talk from Michael Busch of Twitter soon.


Source: http://www.lucidimagination.com/blog/2011/10/27/hadoop-lets-you-store-everything-with-lucenesolr-and-more-you-can-find-what-youre-looking-for/

Tags: Hadoop, Big Data

Opinions expressed by DZone contributors are their own.
