Hadoop is, at its core, a distributed file system (HDFS) that lets you store terabytes or even petabytes of data across a cluster of low-end commodity computers. This is similar to how companies like Yahoo and Google store their page feeds. Hive and HBase are layered on top of Hadoop to query and access that data, and Spring Batch is used to tie all of these pieces together. This article covers some of these components.