In this post, we will learn about the YARN architecture. YARN (Yet Another Resource Negotiator) is a key component of Hadoop 2.x.
The underlying file system continues to be HDFS. YARN is basically a framework for developing and/or executing distributed processing applications, for example MapReduce, Spark, and Apache Giraph.
Let us look at one of the scenarios to understand the YARN architecture better.
Suppose we have two client requests. One wants to run a MapReduce job, while the other wants to execute a shell script.
The MapReduce job is shown in blue, while the shell script job is shown in green.
The Resource Manager has two main components: the Application Manager and the Scheduler. The scheduler is responsible for allocating resources to the various running applications. It is a pure scheduler in the sense that it performs no monitoring or tracking of application status.
The scheduler also offers no guarantee of restarting tasks that fail because of hardware or application failures. It performs its scheduling function based on the resource requirements of the applications, using the abstract notion of a resource container, which incorporates elements such as memory, CPU, disk, and network.
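To make the idea of a pure scheduler concrete, here is a minimal sketch in Python. It is a toy model, not YARN's actual scheduler: the names Resource and ToyScheduler are made up for this illustration, and a container is reduced to just memory and virtual cores. Note that it only hands out capacity; it does not monitor or restart anything.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Resource:
    """Hypothetical container size: only memory and CPU are modelled here."""
    memory_mb: int
    vcores: int

    def fits_in(self, other: "Resource") -> bool:
        return self.memory_mb <= other.memory_mb and self.vcores <= other.vcores


class ToyScheduler:
    """Allocates containers from free node capacity; no monitoring, no restarts."""

    def __init__(self, node_capacity: Dict[str, Resource]):
        self.free = dict(node_capacity)  # node name -> remaining capacity

    def allocate(self, request: Resource) -> Optional[str]:
        # First-fit: pick the first node that can still hold the container.
        for node, free in self.free.items():
            if request.fits_in(free):
                self.free[node] = Resource(
                    free.memory_mb - request.memory_mb,
                    free.vcores - request.vcores,
                )
                return node
        return None  # no node has enough free capacity


scheduler = ToyScheduler({"node1": Resource(4096, 4), "node2": Resource(8192, 8)})
first = scheduler.allocate(Resource(2048, 1))   # fits on node1
second = scheduler.allocate(Resource(4096, 2))  # node1 is too full, goes to node2
third = scheduler.allocate(Resource(16384, 1))  # larger than any node
```

First-fit is used here only because it is short; real schedulers apply queue and capacity policies that this sketch leaves out.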
The Application Manager is responsible for accepting job submissions, negotiating the first container for executing the application-specific application master, and providing the service for restarting the application master container on failure.
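The three duties of the Application Manager can be sketched roughly as follows. This is a simplified model under stated assumptions: ToyApplicationsManager, the app_NNNN id format, and the max_am_attempts limit are all hypothetical names chosen for this example, not YARN's API.

```python
class ToyApplicationsManager:
    """Hypothetical model: accept jobs, start the app master, restart it on failure."""

    def __init__(self, max_am_attempts=2):
        self.max_am_attempts = max_am_attempts  # assumed retry limit
        self.am_attempts = {}                   # app id -> app master launches so far
        self._next_id = 1

    def submit(self, job_name):
        # Accept the submission and use the first container for the app master.
        app_id = f"app_{self._next_id:04d}"
        self._next_id += 1
        self.am_attempts[app_id] = 1
        return app_id

    def on_app_master_failure(self, app_id):
        # Restart the app master container unless the attempt limit is reached.
        if self.am_attempts[app_id] >= self.max_am_attempts:
            return False
        self.am_attempts[app_id] += 1
        return True


manager = ToyApplicationsManager(max_am_attempts=2)
mapreduce_app = manager.submit("mapreduce-job")
shell_app = manager.submit("shell-script-job")
restarted = manager.on_app_master_failure(mapreduce_app)        # second attempt allowed
restarted_again = manager.on_app_master_failure(mapreduce_app)  # limit reached
```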
The Node Manager is the per-machine framework agent responsible for containers. It monitors their resource usage (CPU, memory, network, etc.) and reports it to the resource manager/scheduler.
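As a rough illustration of this reporting role, the sketch below models a node manager that tracks its containers and builds a usage report. The class name ToyNodeManager, the heartbeat method, and the report fields are assumptions made for this example; only memory and virtual cores are tracked.

```python
class ToyNodeManager:
    """Hypothetical per-machine agent: tracks containers and reports their usage."""

    def __init__(self, node_id, total_memory_mb, total_vcores):
        self.node_id = node_id
        self.total_memory_mb = total_memory_mb
        self.total_vcores = total_vcores
        self.containers = {}  # container id -> (memory_mb, vcores) in use

    def start_container(self, container_id, memory_mb, vcores):
        self.containers[container_id] = (memory_mb, vcores)

    def stop_container(self, container_id):
        self.containers.pop(container_id, None)

    def heartbeat(self):
        # Summarise current usage; this is what would be sent to the resource manager.
        used_memory = sum(mem for mem, _ in self.containers.values())
        used_vcores = sum(cores for _, cores in self.containers.values())
        return {
            "node_id": self.node_id,
            "running_containers": sorted(self.containers),
            "used_memory_mb": used_memory,
            "free_memory_mb": self.total_memory_mb - used_memory,
            "used_vcores": used_vcores,
            "free_vcores": self.total_vcores - used_vcores,
        }


node_manager = ToyNodeManager("node1", total_memory_mb=8192, total_vcores=8)
node_manager.start_container("c1", memory_mb=2048, vcores=1)
node_manager.start_container("c2", memory_mb=1024, vcores=2)
report = node_manager.heartbeat()
```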
The Application Master is responsible for negotiating appropriate resource containers from the scheduler, tracking their status, and monitoring their progress.
In the diagram, the green job and the blue job each have their own application master. Each application master handles the containers of its own job.
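The one-application-master-per-job idea can be sketched as below. This is a toy model: ToyApplicationMaster and request_container are hypothetical names, and request_container is a stand-in for the scheduler that always grants a container, which a real scheduler would not do.

```python
import itertools

_container_ids = itertools.count(1)


def request_container(app_name):
    """Stand-in for asking the scheduler for one container (always granted here)."""
    return f"container_{next(_container_ids):03d}"


class ToyApplicationMaster:
    """Hypothetical per-job master: negotiates containers and tracks their status."""

    def __init__(self, app_name, request_container):
        self.app_name = app_name
        self._request_container = request_container
        self.status = {}  # container id -> "RUNNING" or "COMPLETE"

    def request_containers(self, count):
        for _ in range(count):
            container_id = self._request_container(self.app_name)
            if container_id is not None:
                self.status[container_id] = "RUNNING"

    def mark_complete(self, container_id):
        self.status[container_id] = "COMPLETE"

    def progress(self):
        # Fraction of this job's containers that have finished.
        if not self.status:
            return 0.0
        done = sum(1 for state in self.status.values() if state == "COMPLETE")
        return done / len(self.status)


# One application master per job, as in the blue/green example.
blue_am = ToyApplicationMaster("mapreduce-job", request_container)
green_am = ToyApplicationMaster("shell-script-job", request_container)
blue_am.request_containers(2)
green_am.request_containers(1)
blue_am.mark_complete("container_001")
```

Each master only sees the containers it asked for, so progress for the blue job is independent of the green job.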
Another view of the YARN architecture shows the resource manager handling the job queue, resource allocation, and job scheduling.
It allocates resources against the list of available resources. A slave node hosts the app master, which handles the task queue and the job lifecycle logic.
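This second view can be sketched as a job queue that is drained against a list of available nodes. The function schedule_jobs and the job and node names are hypothetical, and FIFO order is an assumption made to keep the example short.

```python
from collections import deque


def schedule_jobs(job_queue, available_nodes):
    """Take jobs in FIFO order and place each job's app master on a free node."""
    placements = {}
    free_nodes = list(available_nodes)  # copy so the caller's list is untouched
    while job_queue and free_nodes:
        job = job_queue.popleft()
        placements[job] = free_nodes.pop(0)
    return placements


job_queue = deque(["mapreduce-job", "shell-script-job", "third-job"])
placements = schedule_jobs(job_queue, ["node1", "node2"])
waiting = list(job_queue)  # jobs left in the queue until a node frees up
```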