YARN, aka MapReduce 2, is a recent advancement of the MapReduce framework. The name is short for ‘Yet Another Resource Negotiator’. The proposed paradigm looks promising, especially in terms of ‘resource management’. YARN is not an entirely new framework; it builds on the existing MapReduce framework and improves the way resources are managed.

As I read through the blogs related to YARN, this is what I understand about it. The work of the JobTracker is split across two different daemons, the ‘Resource Manager’ and the ‘Application Master’. The Resource Manager is responsible for scheduling, that is, for allocating cluster resources to applications. It does not bear the overhead of monitoring resource consumption on each node; it just acts as an orchestrator that allocates resources. At a high level, what it does is JUST scheduling and nothing more. Each worker node runs a ‘Node Manager’, which takes the place of the TaskTracker and bears the tough job of tracking the resources used on that node. The Node Manager keeps updating the Resource Manager about the resource usage on its node. Each task is executed in a logical structure called a container. The Application Master, of which there is one per application rather than one per node, is responsible for requesting containers with a particular resource specification from the Resource Manager; the request is fulfilled through a Node Manager, which launches the container.
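To make the division of labour concrete, here is a toy sketch in Python of how I picture the three daemons interacting. It is not the Hadoop API: every class and method name below (`heartbeat`, `allocate`, `launch`, `request_container`, `demo`) is made up for illustration, memory is the only resource modelled, and the first-fit placement rule is just an assumption to keep the example short.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Container:
    """A logical slice of one node's resources in which a single task runs."""
    node: str
    memory_mb: int


class NodeManager:
    """Per-node daemon: tracks the resources used on its own node."""

    def __init__(self, name: str, total_mb: int) -> None:
        self.name = name
        self.total_mb = total_mb
        self.used_mb = 0

    def free_mb(self) -> int:
        return self.total_mb - self.used_mb

    def launch(self, memory_mb: int) -> Container:
        # The Node Manager is the one that actually starts the container.
        if memory_mb > self.free_mb():
            raise ValueError("not enough free memory on " + self.name)
        self.used_mb += memory_mb
        return Container(self.name, memory_mb)


class ResourceManager:
    """Cluster-wide daemon: JUST scheduling, based on what nodes report."""

    def __init__(self) -> None:
        self.free_mb: Dict[str, int] = {}  # node name -> last reported free memory

    def heartbeat(self, node_name: str, free_mb: int) -> None:
        # Node Managers keep the Resource Manager up to date.
        self.free_mb[node_name] = free_mb

    def allocate(self, memory_mb: int) -> Optional[str]:
        # Hypothetical first-fit policy: pick the first node with enough room.
        for node_name, free in self.free_mb.items():
            if free >= memory_mb:
                self.free_mb[node_name] = free - memory_mb
                return node_name
        return None


class ApplicationMaster:
    """Per-application daemon: asks for containers on behalf of its tasks."""

    def __init__(self, rm: ResourceManager, nms: Dict[str, NodeManager]) -> None:
        self.rm = rm
        self.nms = nms

    def request_container(self, memory_mb: int) -> Optional[Container]:
        node_name = self.rm.allocate(memory_mb)
        if node_name is None:
            return None  # the cluster has no node with enough free memory
        nm = self.nms[node_name]
        container = nm.launch(memory_mb)
        self.rm.heartbeat(nm.name, nm.free_mb())  # node reports its new usage
        return container


def demo() -> List[Optional[Container]]:
    rm = ResourceManager()
    nms = {nm.name: nm for nm in (NodeManager("node1", 2048), NodeManager("node2", 1024))}
    for nm in nms.values():
        rm.heartbeat(nm.name, nm.free_mb())
    am = ApplicationMaster(rm, nms)
    # Ask for four 1 GB containers on a cluster that only has room for three.
    return [am.request_container(1024) for _ in range(4)]
```

Calling `demo()` places two containers on `node1`, one on `node2`, and returns `None` for the fourth request because no node has 1 GB left. The point of the sketch is only that the scheduler never inspects a node directly; it works purely from what the Node Managers report.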

Architecture of YARN

We have to wait and see how YARN performs when compared to the original MapReduce framework.

This new approach of splitting the work of the JobTracker across two daemons greatly benefits the research community. Previously, MapReduce had no daemon that explicitly tracked the resource usage of a node. Now, the Node Manager could come in handy for designing better scheduling algorithms, given that we have abundant information on resource usage patterns.
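As one example of the kind of policy that per-node usage reports make possible, here is a small Python sketch of "least loaded node first" placement. The function name `pick_least_loaded` and the shape of the report (used and total memory per node) are my own assumptions for illustration, not anything YARN prescribes.

```python
from typing import Dict, Optional, Tuple


def pick_least_loaded(reports: Dict[str, Tuple[int, int]], memory_mb: int) -> Optional[str]:
    """Pick the node with the lowest memory utilisation that can still fit the request.

    reports maps a node name to (used_mb, total_mb), as a Node Manager might report it.
    """
    candidates = [
        (used / total, name)
        for name, (used, total) in reports.items()
        if total > 0 and total - used >= memory_mb
    ]
    if not candidates:
        return None  # no node has enough free memory
    # min() compares utilisation first, then node name as a tie-breaker.
    return min(candidates)[1]
```

With reports of `{"node1": (3072, 4096), "node2": (1024, 4096)}`, a 1024 MB request goes to `node2`, since it is only a quarter full. Richer policies could weigh CPU, disk, or usage history in the same way once that information is being collected.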

