Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (Feb. 14 to Feb. 20). Here they are, in order of popularity:
Every week here and in our newsletter, we feature a new developer/blogger from the DZone community to catch up and find out what he or she is working on now and what's coming next. This week we're talking to Rafał Kuć, software architect and Solr and Lucene specialist.
The author recently read a book on Map/Reduce algorithms by Lin and Dyer, which gives deep insight into designing efficient M/R algorithms. In this post, he discusses the in-mapper combining algorithm and presents a sample M/R program that uses it.
ElasticSearch index files grow large quickly, and one of the most common questions about them is how to optimize them and clean them, getting rid of old records you're not interested in any longer. A very easy way to accomplish these tasks is using the following two scripts.
Apache Avro is a popular data serialization format that is gaining more users because many Hadoop-based tools natively support Avro for serialization and deserialization. This post covers some of the basics of Avro.
If you want to use Java objects as a data source and data set in Eclipse's BIRT, you need to use a scripted data source and a scripted data set. This article presents the usage of a scripted data set in Eclipse's BIRT.