Grid Engine an Early Supporter of Hadoop Apps
Grid Engine is a distributed resource manager. Hadoop is a Java framework for distributed applications. It includes the Hadoop Distributed File System (HDFS), a distributed, fault-tolerant file system, and MapReduce, an environment for parallelizing and executing applications. Cloudera, which also publishes a Hadoop distribution, might disagree with Templeton's claim that SGE is the first workload manager to support Hadoop apps.
Submitting Hadoop jobs to a Grid Engine grid is a pretty neat trick. SGE is aware of the Hadoop Distributed File System and recognizes Hadoop jobtrackers and tasktrackers, so it can route Hadoop jobs to the nodes where the job data already resides. That is a whole lot better than setting up a dedicated Hadoop cluster and moving the data over to its nodes.
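The routing idea can be illustrated with a small sketch. This is not Grid Engine's actual scheduling code. The function name, node names, and block map below are hypothetical, and the ranking rule (prefer nodes that already hold the most input blocks) is only one plausible way to express "send the job to the data".

```python
from collections import Counter


def rank_nodes_by_locality(block_hosts, free_slots):
    """Order nodes that have free slots by how many input blocks they hold.

    block_hosts: maps a block id to the nodes storing a replica of it
                 (hypothetical stand-in for what HDFS would report).
    free_slots:  maps a node name to its number of free job slots.
    """
    # Count how many of the job's blocks each node stores locally.
    local_blocks = Counter(
        host for hosts in block_hosts.values() for host in hosts
    )
    # Only nodes with at least one free slot can accept the job.
    candidates = [node for node, slots in free_slots.items() if slots > 0]
    # Most local blocks first; ties are broken by node name for determinism.
    return sorted(candidates, key=lambda node: (-local_blocks[node], node))


# Illustrative data only: three blocks, two replicas each.
block_hosts = {
    "blk_1": ["node-a", "node-b"],
    "blk_2": ["node-b", "node-c"],
    "blk_3": ["node-c", "node-d"],
}
free_slots = {"node-a": 2, "node-b": 0, "node-c": 1, "node-d": 4, "node-e": 8}

ranked = rank_nodes_by_locality(block_hosts, free_slots)
print(ranked)
```

Here node-b holds two blocks but has no free slots, so it is skipped, and node-e has plenty of slots but no local data, so it ranks last.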
Grid Engine Diagram
SGE 6.2u5 contains a number of other new features. If grid applications need particular hardware to run well, such as multiple cores, high clock speeds, large caches, or lots of memory, the job scheduler can allocate jobs to specific types of processors and server configurations. For example, Grid Engine can run a cache-heavy application on four cores spread across four server sockets instead of four cores sharing a single socket. Administrators can specify what hardware resources they need with the new core binding feature.
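The difference between the two placements in that example can be shown with a toy model. The helper below is hypothetical and does not call any Grid Engine interface. It assumes a machine described only by a socket count and a per-socket core count, and it returns (socket, core) pairs.

```python
def pick_cores(sockets, cores_per_socket, amount, spread):
    """Return a list of (socket, core) pairs for a job needing `amount` cores.

    spread=True  places one core on each of `amount` different sockets, so
                 the job's threads do not share a socket's cache.
    spread=False packs all cores onto the first socket.
    """
    if spread:
        if amount > sockets:
            raise ValueError("not enough sockets to give each core its own")
        return [(socket, 0) for socket in range(amount)]
    if amount > cores_per_socket:
        raise ValueError("not enough cores on a single socket")
    return [(0, core) for core in range(amount)]


# A hypothetical four-socket server with four cores per socket.
spread_out = pick_cores(sockets=4, cores_per_socket=4, amount=4, spread=True)
packed = pick_cores(sockets=4, cores_per_socket=4, amount=4, spread=False)
print(spread_out)  # one core on each socket
print(packed)      # four cores on socket 0
```

A cache-heavy job would favor the spread placement. A job whose threads share data heavily might prefer the packed one.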
"Slotwise preemption" is another useful new feature that adds more sophisticated resource allocating rules. Instead of simple rules like 'job queue A is subordinate to B', you can now limit how many jobs are running across specific queues or indicate which queues are more important when there is a resource conflict. The SGE 6.2u5 setup is also easier to integrate with Amazon's EC2 and power down unused server nodes in a grid.
Your free download of Grid Engine is available here.