Hadoop is fast, but in-memory is faster. To speed up data access across the board, you need to speed up HDFS file access, which benefits Hive, HBase, and everything else built on top of it. You can upgrade your hardware with ultra-fast EMC Isilon storage or high-end servers from HPE, Dell, or IBM. The easiest option, and nearly free if you have spare RAM, is the excellent open-source project Apache Ignite. I will also look at Apache Geode, Redis, SnappyData, and other in-memory accelerators in future How-To articles.
To get started with Ignite, follow the instructions on the project site. I installed mine on the freely available Hortonworks HDP 2.4 Sandbox. Make sure you choose the "In-Memory Hadoop Accelerator" build, as this is the correct distribution for use with Hadoop:
wget https://dist.apache.org/repos/dist/release/ignite/1.7.0/apache-ignite-hadoop-1.7.0-bin.zip
unzip apache-ignite-hadoop-1.7.0-bin.zip
Create an /etc/default/hadoop configuration file and make sure the Java, Ignite, and Hadoop environment variables are set correctly for your environment.
[root@sandbox apache-ignite-hadoop-1.7.0-bin]# cat /etc/default/hadoop
export JAVA_HOME=/usr/lib/jvm/java-1.7.0
export IGNITE_HOME=/opt/demo/ignite/apache-ignite-hadoop-1.7.0-bin
export HDP=/usr/hdp/current
export HADOOP_HOME=$HDP/hadoop-client/
export HADOOP_COMMON_HOME=$HDP/hadoop-client/
export HADOOP_HDFS_HOME=$HDP/hadoop-hdfs-client/
export HADOOP_MAPRED_HOME=$HDP/hadoop-mapreduce-client/
To start the accelerator, you simply need to run:
cd /opt/demo/ignite/apache-ignite-hadoop-1.7.0-bin
bin/ignite.sh
You will also need to set some YARN and HDFS configuration in Ambari using the included instructions, which work as described. You will then need to restart the affected services from Ambari. After that, all of your calls will be faster!
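To give a sense of what that configuration looks like, here is a sketch of the client-side Hadoop settings the Ignite 1.x Hadoop Accelerator documentation describes: core-site.xml registers the IGFS file system implementations, and mapred-site.xml points MapReduce at Ignite's in-memory job tracker. The hostname and port below are the documented defaults for a single-node setup like the Sandbox; adjust them for your environment, and treat the exact values as assumptions to verify against the instructions that ship with your Ignite version.

```xml
<!-- core-site.xml: register the IGFS file system (igfs:// URIs) -->
<property>
  <name>fs.igfs.impl</name>
  <value>org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem</value>
</property>
<property>
  <name>fs.AbstractFileSystem.igfs.impl</name>
  <value>org.apache.ignite.hadoop.fs.v2.IgniteHadoopFileSystem</value>
</property>

<!-- mapred-site.xml: run MapReduce jobs on Ignite's in-memory engine -->
<property>
  <name>mapreduce.framework.name</name>
  <value>ignite</value>
</property>
<property>
  <name>mapreduce.jobtracker.address</name>
  <value>localhost:11211</value> <!-- default Ignite endpoint; example host -->
</property>
```

With these in place, jobs and file operations that address igfs:// paths (or that run with mapreduce.framework.name set to ignite) go through the in-memory layer instead of straight to HDFS.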