The Best of the Week (Feb. 14): Big Data Zone

Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (Feb. 14 to Feb. 20). Here they are, in order of popularity:

1. Dev of the Week: Rafał Kuć

Every week here and in our newsletter, we feature a new developer/blogger from the DZone community to catch up and find out what he or she is working on now and what's coming next. This week we're talking to Rafał Kuć, software architect and Solr and Lucene specialist.

2. Designing Map/Reduce Algorithms: In-Mapper Combiner

The author recently read Lin and Dyer's book on MapReduce algorithms, which offers deep insight into designing efficient M/R algorithms. In this post, he discusses the in-mapper combining pattern and presents a sample M/R program that uses it.
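
The in-mapper combining pattern can be sketched in plain Python. This is a minimal simulation, not the author's Hadoop program: the class and function names are hypothetical, each input split gets its own mapper object, and `close()` stands in for the end-of-task hook (`cleanup()` in Hadoop's Java `Mapper` API) where the buffered partial counts are finally emitted.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


class InMapperCombiningWordCountMapper:
    """Hypothetical word-count mapper that aggregates counts across all its records.

    Instead of emitting (word, 1) for every token, it keeps a per-mapper
    dictionary and emits each (word, partial_count) pair once, on close().
    """

    def __init__(self) -> None:
        self.counts: Dict[str, int] = defaultdict(int)

    def map(self, line: str) -> None:
        # Accumulate locally; nothing is emitted per input record.
        for word in line.split():
            self.counts[word] += 1

    def close(self) -> Iterator[Tuple[str, int]]:
        # Emit one partial sum per distinct word seen by this mapper.
        yield from self.counts.items()


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    # Reducer side: sum the partial counts for each key.
    totals: Dict[str, int] = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def run_job(splits: List[List[str]]) -> Dict[str, int]:
    # Each input split gets its own mapper instance, as in a real job.
    emitted: List[Tuple[str, int]] = []
    for split in splits:
        mapper = InMapperCombiningWordCountMapper()
        for line in split:
            mapper.map(line)
        emitted.extend(mapper.close())
    return reduce_counts(emitted)


if __name__ == "__main__":
    print(run_job([["a b a", "b c"], ["a c c"]]))  # {'a': 3, 'b': 2, 'c': 3}
```

For the sample input, the eight tokens produce only five intermediate pairs (three from the first mapper, two from the second), which is the point of the pattern: less data to shuffle to the reducers. The trade-off is that the dictionary must fit in the mapper's memory; a common remedy is to flush it periodically when it grows too large.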

3. Clean and Optimize the ElasticSearch Indexes of Logstash

ElasticSearch index files grow large quickly, and one of the most common questions about them is how to optimize them and clean them up by removing old records you no longer need. The article presents two simple scripts that handle both tasks.
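
The two scripts themselves are not reproduced in this summary. As a hedged sketch of the cleanup half only, the Python below selects which daily indices fall outside a retention window. It assumes Logstash's default daily index names of the form `logstash-YYYY.MM.DD`; the function names and the `keep_days` parameter are illustrative. Actually deleting the selected indices would be done with Elasticsearch's delete-index API (an HTTP `DELETE` on each index name), which is left out here.

```python
from datetime import date, datetime, timedelta
from typing import Iterable, List, Optional

# Assumption: default Logstash daily index naming, "logstash-YYYY.MM.DD".
INDEX_PREFIX = "logstash-"
DATE_FORMAT = "%Y.%m.%d"


def index_date(name: str) -> Optional[date]:
    """Return the date encoded in a Logstash index name, or None if it doesn't match."""
    if not name.startswith(INDEX_PREFIX):
        return None
    try:
        return datetime.strptime(name[len(INDEX_PREFIX):], DATE_FORMAT).date()
    except ValueError:
        return None


def indices_to_delete(names: Iterable[str], today: date, keep_days: int) -> List[str]:
    """Select daily indices strictly older than the retention window (hypothetical helper)."""
    cutoff = today - timedelta(days=keep_days)
    old = []
    for name in names:
        d = index_date(name)
        # Non-Logstash indices (e.g. ".kibana") are never selected.
        if d is not None and d < cutoff:
            old.append(name)
    return sorted(old)


if __name__ == "__main__":
    names = ["logstash-2014.02.01", "logstash-2014.02.10", "logstash-2014.02.19", ".kibana"]
    print(indices_to_delete(names, today=date(2014, 2, 20), keep_days=7))
```

With a seven-day window ending on Feb. 20, the cutoff is Feb. 13, so only the Feb. 1 and Feb. 10 indices are selected. Working on whole daily indices rather than individual documents is what keeps this kind of cleanup cheap.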

4. Introduction to Apache Avro

Apache Avro is a popular data serialization format that is gaining users, in part because many Hadoop-based tools support Avro natively for serialization and deserialization. This post covers some Avro basics.
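
To make the serialization basics concrete, here is a minimal, dependency-free sketch of Avro's binary encoding for a simple record. It follows the Avro specification's rules that `int` and `long` values are written as zig-zag variable-length integers, that a `string` is a `long` length followed by UTF-8 bytes, and that a record is just its fields encoded in schema order. The `User` schema and helper names are hypothetical, and only the `int`, `long`, and `string` types are handled; real code would use an Avro library.

```python
from typing import Any, Dict

# Hypothetical example schema, written as the JSON-style dict Avro schemas map to.
USER_SCHEMA: Dict[str, Any] = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"},
    ],
}


def zigzag(n: int) -> int:
    # Map signed to unsigned so small magnitudes stay small: 0->0, -1->1, 1->2, -2->3.
    # Assumes n fits in a signed 64-bit long.
    return (n << 1) ^ (n >> 63)


def encode_long(n: int) -> bytes:
    # Variable-length encoding: 7 bits per byte, high bit set on all but the last byte.
    z = zigzag(n)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    # A string is its UTF-8 byte length (as a long) followed by the bytes.
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_record(schema: Dict[str, Any], record: Dict[str, Any]) -> bytes:
    # Fields are concatenated in schema order; no field names are written.
    out = bytearray()
    for field in schema["fields"]:
        value = record[field["name"]]
        if field["type"] in ("int", "long"):
            out += encode_long(value)
        elif field["type"] == "string":
            out += encode_string(value)
        else:
            raise NotImplementedError(f"type {field['type']!r} not handled in this sketch")
    return bytes(out)
```

Calling `encode_record(USER_SCHEMA, {"name": "Ann", "age": 30})` yields `b"\x06Ann<"`: `0x06` is the zig-zag length 3, then the three UTF-8 bytes, then `0x3c` (printed as `<`) for 30. No field names appear in the output, which is why an Avro reader always needs the writer's schema.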

5. Eclipse's BIRT: Scripted Data Set

If you want to use Java objects as the data source and data set in Eclipse's BIRT, you need to use a scripted data source and a scripted data set. This article shows how to use a scripted data set in Eclipse's BIRT.
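
A scripted data set is driven by event handlers: `open` prepares the data, `fetch` is called repeatedly and fills in one row per call, returning true while rows remain and false at the end, and `close` releases resources. In BIRT these handlers are written in JavaScript inside the report designer (or as Java event handler classes); the Python below only simulates that contract to show the control flow. The `Employee` class, the column names, and the `run_data_set` driver are hypothetical stand-ins for the Java objects and the report engine.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Employee:
    # Hypothetical domain object standing in for the Java objects used as the data source.
    name: str
    salary: float


class EmployeeScriptedDataSet:
    """Hypothetical stand-in mirroring the open/fetch/close handlers of a scripted data set."""

    def __init__(self, source: List[Employee]) -> None:
        self.source = source
        self.index = 0

    def open(self) -> None:
        # Called once before any rows are requested: reset the cursor.
        self.index = 0

    def fetch(self, row: Dict[str, Any]) -> bool:
        # Fill one row and return True, or return False when no rows remain.
        if self.index >= len(self.source):
            return False
        employee = self.source[self.index]
        row["name"] = employee.name
        row["salary"] = employee.salary
        self.index += 1
        return True

    def close(self) -> None:
        # Release the reference to the source collection.
        self.source = []


def run_data_set(ds: EmployeeScriptedDataSet) -> List[Dict[str, Any]]:
    # Simulates the report engine's loop: open, fetch until False, close.
    rows: List[Dict[str, Any]] = []
    ds.open()
    while True:
        row: Dict[str, Any] = {}
        if not ds.fetch(row):
            break
        rows.append(row)
    ds.close()
    return rows
```

The key design point this mirrors is that the engine pulls rows one at a time through `fetch`, so the data set never has to hand over the whole collection at once.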

Opinions expressed by DZone contributors are their own.
