Continuing my analysis of the London Neo4j meetup group using R, I wanted to see on which days of the week we organise meetups and how many people RSVP affirmatively on each day.
I’ve been doing some ad-hoc analysis of the Neo4j London meetup group using R and Neo4j, and having worked out how to group by certain keys, the next step was to order the rows of the data frame.
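The original analysis is done in R, but the group-then-order step the post describes can be sketched in pandas as well. This is only an illustration of the idea; the data frame, column names, and values here are invented, not taken from the meetup data.

```python
import pandas as pd

# Invented RSVP data in the spirit of the meetup analysis:
# one row per meetup, with the day it ran and the affirmative RSVP count.
df = pd.DataFrame({
    "day": ["Wed", "Tue", "Wed", "Thu", "Tue", "Wed"],
    "rsvps": [30, 12, 25, 18, 20, 15],
})

# Group by day, sum the RSVPs, then order the rows by that total.
by_day = (
    df.groupby("day", as_index=False)["rsvps"]
      .sum()
      .sort_values("rsvps", ascending=False)
      .reset_index(drop=True)
)
print(by_day)
```

In dplyr-style R the same pipeline would be a `group_by` followed by `summarise` and `arrange`; the point in both languages is that ordering is just one more step chained after the aggregation.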
Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (June 27 to July 4). This week's topics include lying with data, Hadoop shell commands for managing HDFS, payloads in Solr, the origins of the term "big data", and interactive data visualization.
Recently we had M.C. Srivas, CTO and co-founder of MapR Technologies, as a speaker at our Munich Hadoop User Group. He gave a nice talk about Apache Drill, a project developing a tool that provides fast, interactive SQL on Hadoop and other data sources.
Hadoop 1.x has some problems, for example the lack of high availability (HA) and poor handling of too many small files.
Hadoop 2.x YARN addresses the HA problem, but only one of the masters can be active and serve clients at a time, while the other masters stand by. I think this is a waste of the standby masters.
Scalability has become one of those core-concepts-slash-buzzwords of Big Data. It’s all about scaling out, web scale, and so on. In principle, the idea is to be able to take one piece of code and then throw any number of computers at it to make it fast.
Lucene & Solr 4.9 were released a couple of weeks ago and introduced a new result document transformer called ChildDocTransformerFactory.
In an ever-changing business world, people nowadays often find themselves stuck on various computing problems involving complex business rules.
There are plenty of books telling data scientists (whatever they are) and others how to visualise data, how to tell stories, and how to persuade.
I’ve been a bit frustrated, whenever I discuss payloads in Solr, by the lack of an example that gives all the pieces in a single place. So I decided to create one for Solr 4.0+ (actually 4.8.1 at the time of this writing, but this should apply to the whole 4.x code line).
It may have come about innocuously enough, in an article written in 1989 about organizations mining data for marketing purposes.
Recently, the "interactive report" has become a hot topic in data visualization. I believe it is becoming the next-generation UI paradigm for KPI reports.
So the biggest revolution in database and analytics technology – namely the distributed batch processing technique known as MapReduce (and the associated Hadoop-centric ecosystem that has built up around it) – is a legacy technology for one Silicon Valley player.
Pig is a high-level scripting language used with Apache Hadoop. Pig enables data analysts to write complex data transformations without knowing Java.
The top 10 basic Hadoop HDFS operations, carried out through shell commands, that are useful for managing files on HDFS clusters.
Teradata announced Teradata Aster R this week, which seeks to lift memory and processing limitations in open source R analytics.
Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (June 13 to June 20). This week's topics include Python, search relevancy, patent data and the nature of invention, scalable machine learning, and anomaly detection.
This week the Apache Lucene and Solr PMC announced version 4.9 of the Apache Lucene library and the Apache Solr search server, the next release in the 4.x line of both projects.
Packt Publishing has asked me to review their new book, Clojure for Machine Learning (April 2014) by Akhil Wali. Interested in both Clojure and machine learning, I have taken up the challenge and want to share my impressions from the first chapters.
At its developer conference on Wednesday, however, Google followed a burgeoning trend in dumping MapReduce in favor of what they're calling Google Cloud Dataflow.
In a white paper made publicly available by Harvard University, researchers broach the topic of Google Flu Trends – commonly hailed as an innovative and thorough application of Big Data – and some of its shortcomings.
A series of sensors will be installed on light poles in a downtown Chicago district later this summer. According to the Chicago Tribune, the data-collection sensors will measure air quality, light intensity, sound volume, heat, precipitation, and wind, and will also detect the wireless devices of passing crowds.
I read something about a “Ladder of Powers Rule”, also called “Tukey and Mosteller’s Bulging Rule”. To be honest, I had never heard of this rule before. But this won’t be the first time I learn something while working on my notes for a course!
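The idea behind the ladder of powers is to try power transformations y → y^λ (with log standing in at λ = 0) and see which one best straightens a curved relationship. Here is a minimal numpy sketch of that idea; the data, the `ladder` helper, and the choice of correlation as a rough straightness measure are my own illustration, not the rule as stated in the post.

```python
import numpy as np

# A convex relationship that "bulges": y grows as the square of x.
x = np.arange(1, 21, dtype=float)
y = x ** 2

def ladder(values, lam):
    """Power transform from Tukey's ladder (log at lam = 0)."""
    return np.log(values) if lam == 0 else values ** lam

# Correlation with x as a crude measure of how straight each
# transformed relationship is; lam = 0.5 should win here, since
# sqrt(x**2) is exactly linear in x.
for lam in (1.0, 0.5, 0.0):
    r = np.corrcoef(x, ladder(y, lam))[0, 1]
    print(f"lambda = {lam:4.1f}   r = {r:.4f}")
```

The bulging rule itself is a mnemonic for which direction to move on the ladder (up or down in λ, for x or for y) depending on which way the scatterplot bulges; the code above only demonstrates the "try a power and check straightness" step.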
Programmers usually resort to a database to implement massive data computations in Java. However, a database is unavailable or inconvenient in some application scenarios, in which case Java's native capabilities are the key to achieving the goal.
In this edition of "Somewhere Else" from Freakonometrics: how to be Bayesian in Python, how the New York Times is using machine learning to analyze its readership, an experiment on paid-search effectiveness, and more.