Big Data/Analytics Zone
Ana-maria Mihalceanu 10/24/14
0 replies

Understanding Information Retrieval by Using Apache Lucene and Tika - Part 3

This is a sequel to what was presented in Part 1 and Part 2 of this tutorial; after indexing and querying, we can highlight the results of a search by using Highlighter(s).

Ana-maria Mihalceanu 10/23/14
0 replies

Understanding Information Retrieval by Using Apache Lucene and Tika - Part 2

A sequel to what was implemented in Part 1 of this tutorial; we continue indexing and improve search conditions through different features provided by the Apache Lucene library.

Ana-maria Mihalceanu 10/22/14
0 replies

Understanding Information Retrieval by Using Apache Lucene and Tika - Part 1

This tutorial explains the Lucene and Tika frameworks through their core concepts (parsing, MIME detection, indexing, scoring, boosting) via illustrative examples that should be useful not only to seasoned software developers but also to beginners in content analysis and programming.

Linda Gimmeson 10/17/14
0 replies

FAQ of Executives Regarding Apache Hadoop

Apache Hadoop has slowly been infiltrating the mainstream business world, but many executives are still left with doubts about whether adopting Hadoop is a sound strategy for their organization. Is Hadoop enterprise friendly? Is it economical for an organization to use?

Tomasz Sobczak 10/16/14
1 reply

Review of "Scaling Apache Solr" Book

Review of "Scaling Apache Solr" book.

Alec Noller 10/15/14
1 reply

Dev of the Week: Ashwini Kuntamukkala

Every week here and in our newsletter, we feature a new developer/blogger from the DZone community to catch up and find out what he or she is working on now and what's coming next. This week we're talking to Ashwini Kuntamukkala, Software Architect at SciSpike, Inc.

Adam Diaz 10/15/14
0 replies

Hadoop and the mystery of the version number

When I’m working with people on Hadoop, I ask what you would think is a simple question: what version of Hadoop are you using? In reality, though, it’s not as straightforward as you might think.

Mikio Braun 10/14/14
0 replies

Parts But No Car

One question which pops up again and again when I talk about streamdrill is whether that cannot be done by X, where X is one of Hadoop, Spark, Go, or some other piece of Big Data infrastructure. The truth is that there’s a huge gap between “in principle” and “in reality”, and I’d like to spell this difference out in this post.

David Mai 10/11/14
0 replies

22 Big Data & BI Events (U.S.) that You Must Attend Before the End of 2014

With so many events taking place it can be a very daunting task finding the one that perfectly fits your interests and needs. That being said, I’ve done some research and compiled a comprehensive list of 22 Big Data and Business Intelligence events that you must attend during Q4 of 2014.

Kevin Daly 10/11/14
0 replies

Hadoop 2.0 as Part of a Data Platform: It’s Not Just About MapReduce!

What exactly is a data platform? Get a better understanding of big data and its application. In this article I’ll be talking about the Hortonworks Data Platform as a reference platform.

Borislav Iordanov 10/10/14
0 replies

Jayson Skima - Validating JavaScript Object Notation Data

A crash course on JSON Schema. A nearly complete coverage of the Draft 4 specification, in brief.

Mark Needham 10/10/14
1 reply

R: A first attempt at linear regression

I’ve been working through the videos that accompany the Introduction to Statistical Learning with Applications in R book and thought it’d be interesting to try out the linear regression algorithm against my meetup data set.

David Mai 10/10/14
0 replies

9 Influential Women Writers in Big Data and Business Intelligence

In my own experience as an editor who covers BI, I read numerous BI articles, and I have found that despite the disproportionately low number of women in technology, many of the articles that I’ve read were authored by women. In BI, the works of women have provided great insight and thought leadership to the BI community, and I personally want to list nine of the top women writers who have helped shape my view on BI.

Mark Needham 10/09/14
0 replies

R: Deriving a new data frame column based on containing string

I’ve been playing around with R data frames a bit more and one thing I wanted to do was derive a new column based on the text contained in the existing column.

Arthur Charpentier 10/09/14
0 replies

How to Import Some Parts of a Large Database

In the introduction of Computational Actuarial Science with R, there was a short paragraph on how we could import only some parts of a large database by selecting specific variables.

Mark Needham 10/08/14
0 replies

R: Filtering data frames by column type ('x' must be numeric)

I’ve been working through the exercises from An Introduction to Statistical Learning and one of them required you to create a pairwise correlation matrix of variables in a data frame.

John Cook 10/08/14
0 replies

The great reformulation of algebraic geometry

At the Heidelberg Laureate Forum I had a chance to interview John Tate. In his remarks below, Tate briefly comments on his early work on number theory and cohomology. Most of the post consists of his comments on the work of Alexander Grothendieck.

Veeresham Kardas 10/06/14
0 replies

CSV Operations using OpenCSV

OpenCSV is one of the best tools for CSV operations. We will see how to use OpenCSV for basic reading and writing operations.

Adam Diaz 10/02/14
0 replies

The Evolution of MapReduce and Hadoop

Recently I authored a section of the DZone Guide for Big Data 2014. I wrote about MapReduce and the evolution of Hadoop.

Sander Mak 10/01/14
0 replies

The Developer’s Guide to Data Science

When developers talk about using data, they are usually concerned with ACID, scalability, and other operational aspects of managing data. But data science is not just about making fancy business intelligence reports for management. Data drives the user experience directly, not after the fact.

Isaac Sacolick 10/01/14
0 replies

Solving the Data Scientist Shortfall by Deploying a Self Service BI Program

Want to learn more about "self-service" BI programs and why many organizations are looking to leverage these technologies and programs on their quest to become more data-driven?

Mark Needham 09/29/14
0 replies

R: ggplot - Plotting multiple variables on a line chart

In my continued playing around with meetup data I wanted to plot the number of members who join the Neo4j group over time. I wanted to plot the actual count alongside a rolling average for which I created the following data frame:

Linda Gimmeson 09/28/14
0 replies

10 Big Data Tools

Hadoop isn't the only big data tool out there. Check out this list of big data tools available.

Armel Nene 09/27/14
0 replies

Big Data Architecture Best Practices

The marketing departments of software vendors have done a good job of making Big Data go mainstream, whatever that means. The promise is that we can achieve anything if we make use of Big Data: business insight and beating our competition into submission. Yet there is no well-publicised successful Big Data implementation. The question is: why not?

Adam Diaz 09/26/14
0 replies

The Evolution of MapReduce and Hadoop

With MapReduce, companies no longer need to delete old logs that are ripe with insights—or dump them onto unmanageable tape storage—before they’ve had a chance to analyze them. Today, the Apache Hadoop project is the most widely used implementation of MapReduce.