Apache Hadoop 2.3.0 Released

This week, Apache Hadoop 2.3.0 was released. It contains many bug fixes and small changes (Apache's release notes list them all), but the folks at the Cloudera blog highlight one big change: in-memory caching for HDFS. Cloudera describes the feature as follows:

HDFS caching lets users explicitly cache certain files or directories in HDFS. DataNodes will then cache the corresponding blocks in off-heap memory through the use of mmap and mlock. Once cached, Hadoop applications can query the locations of cached blocks and place their tasks for memory-locality. Finally, when memory-local, applications can use the new zero-copy read API to read cached data with no additional overhead. Preliminary benchmarks show that optimized applications can achieve read throughput on the order of gigabytes per second.
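To make the mmap-based read path a little more concrete, here is a minimal Python sketch of the underlying OS mechanism only. It is not HDFS code and does not use Hadoop's API. It assumes an ordinary local file stands in for a cached block replica; the file name `blk_0001` and the helpers `map_block` and `read_zero_copy` are hypothetical names for illustration. The sketch maps the file read-only and slices it through a `memoryview`, so reading a range does not copy bytes into a new buffer. HDFS additionally pins the mapped pages with mlock so they cannot be paged out; Python's standard library does not expose mlock, so that step is left out here.

```python
import mmap
import os
import tempfile


def map_block(path):
    """Map a local file (a hypothetical stand-in for a cached block replica) read-only.

    This mirrors only the mmap step; HDFS also mlocks the pages, which
    Python's standard library does not expose, so this sketch skips it.
    """
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the mapping stays valid after f is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def read_zero_copy(mapped, offset, length):
    """Return a read-only view of [offset, offset + length) without copying bytes."""
    if offset < 0 or length < 0 or offset + length > len(mapped):
        raise ValueError("requested range lies outside the mapped block")
    # Slicing a memoryview shares the mapped pages instead of copying them.
    return memoryview(mapped)[offset:offset + length]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        block_path = os.path.join(tmp, "blk_0001")  # hypothetical block file name
        with open(block_path, "wb") as f:
            f.write(b"hello, cached hdfs block")

        mapped = map_block(block_path)
        view = read_zero_copy(mapped, 7, 6)
        print(bytes(view))  # b'cached' (bytes() copies only for printing)
        # Release the view before closing, or mmap.close() raises BufferError.
        view.release()
        mapped.close()
```

Returning a `memoryview` rather than `bytes` is the point of the sketch: materializing a `bytes` slice would duplicate the data, whereas the view reads straight from the mapped pages, which is the same idea behind avoiding extra copies when a block is already resident in memory.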

Arun Murthy at Hortonworks points to another big feature: support for a heterogeneous storage hierarchy in HDFS. As Murthy puts it:

With support for heterogeneous storage classes in HDFS, we now can take advantage of different storage types on the same Hadoop clusters. Hence, we can now make better cost/benefit tradeoffs with different storage media such as commodity disks, enterprise-grade disks, SSDs, Memory etc.

So, be sure to take a look at the release. Hortonworks' announcement post also looks ahead to 2.4.0, in case 2.3.0 just isn't enough.
