DEVELOPMENT OF A MODEL OF AN IMPROVED SCALABLE RESOURCE MANAGEMENT SYSTEM FOR HADOOP - YET ANOTHER RESOURCE NEGOTIATOR (YARN)

SOURCE:

Faculty: Physical Sciences
Department: Computer Science

CONTRIBUTORS:

Moses, Timothy
Inyiama, H. C.

ABSTRACT:

With the growing popularity of cloud computing and advances in ICT, the continuous increase in the volume of data and in the computational capacity needed to process it has generated an overwhelming flow of data, now referred to as big data. These computational demands have exceeded the capabilities of conventional processing tools such as Relational Database Management Systems. Big data has ushered in an era of data exploration and utilization, with the MapReduce computational paradigm as its major enabler. Hadoop MapReduce, a data-driven programming model, has gained popularity in large-scale data processing due to its simplicity, scalability, fault tolerance, ease of programming and flexibility. Although great effort in implementing this framework has made Hadoop scale to tens of thousands of commodity cluster processors, the centralized architecture of its Resource Manager adversely affects Hadoop's scalability toward tomorrow's extreme-scale datacenters. With over 4000 nodes in a cluster, resource requests from all the nodes to a single Resource Manager lead to a scalability bottleneck. This work is therefore concerned with decentralizing the responsibilities of the Resource Manager to address Hadoop's scalability issues, improve response, processing and turnaround times, and eliminate the single point of failure in the framework. The existing framework was analyzed, and its deficiencies served as the basis for improvements in the new system. The new framework decoupled the responsibilities of the Resource Manager by introducing another layer in which a daemon called the Rack_Unit Resource Manager (RU_RM) carries out the allocation of resources to compute nodes within its local rack. This ensured that there was no single point of failure and allowed low latency for large files on compute nodes within the same local rack. The new framework also ensured that two of the three replicas of each data block are stored on different compute nodes within the same local rack.
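The rack-local placement rule described above (two of a block's three replicas on distinct nodes in the writer's local rack, the third off-rack) can be sketched roughly as follows. This is an illustrative sketch, not code from the thesis; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the rack-aware placement rule: of every 3 replicas
// of a block, 2 are kept on distinct compute nodes inside the writer's local
// rack and 1 goes to a remote rack for durability.
public class RackAwarePlacement {

    // Pick 3 target nodes for one block: 2 local-rack nodes, 1 remote node.
    public static List<String> placeReplicas(List<String> localRackNodes,
                                             List<String> remoteNodes) {
        if (localRackNodes.size() < 2 || remoteNodes.isEmpty()) {
            throw new IllegalArgumentException("not enough candidate nodes");
        }
        List<String> targets = new ArrayList<>();
        targets.add(localRackNodes.get(0)); // 1st replica: local rack
        targets.add(localRackNodes.get(1)); // 2nd replica: same rack, different node
        targets.add(remoteNodes.get(0));    // 3rd replica: off-rack
        return targets;
    }

    public static void main(String[] args) {
        List<String> local = List.of("rack1-node1", "rack1-node2", "rack1-node3");
        List<String> remote = List.of("rack2-node1");
        // → [rack1-node1, rack1-node2, rack2-node1]
        System.out.println(placeReplicas(local, remote));
    }
}
```

A real placement policy would also weigh node load and disk capacity when choosing among local-rack candidates; the sketch simply takes the first available nodes.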
This rack-local placement sped up the execution time of each job, since bandwidth between compute nodes on the same rack is higher than bandwidth across racks. In addition, the introduction of a novel relaxed-ring architecture at the rack-unit resource manager layer of the framework provided a fault-tolerance mechanism for that layer. The Object Modeling Technique (OMT) methodology was used for this work, capturing three major viewpoints: the static, dynamic and functional behaviour of the system. The application was developed and tested in the Java programming language, with the Hadoop benchmark workload WordCount used for analysis. Two performance evaluation metrics (efficiency and average task-delay ratio) were used for comparison. For file sizes of 30.5 kB and 92 kB, the efficiency of the new model was 3.5% and 4.7% higher, respectively, than that of the existing framework. The new model also had lower average task-delay ratios of 0.056 ns and 0.166 ns for file sizes of 30.5 kB and 92 kB, respectively, compared to the existing framework. Analysis of the new model showed better response times and lower scheduling overhead.
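The core idea of a relaxed ring at the RU_RM layer is that each rack-unit resource manager monitors its successor, and when a member fails the ring "relaxes" by skipping dead members so the surviving managers stay connected. The sketch below illustrates that successor-skipping idea only; the class and identifiers are hypothetical and not taken from the thesis.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a relaxed ring of rack-unit resource managers:
// lookups for a member's successor skip over failed members, so the ring
// remains connected without immediate repair of every broken link.
public class RelaxedRing {
    private final List<String> members = new ArrayList<>();
    private final Set<String> failed = new HashSet<>();

    public void join(String id) { members.add(id); }

    public void markFailed(String id) { failed.add(id); }

    // Live successor of id in ring order, skipping failed members.
    public String successor(String id) {
        int i = members.indexOf(id);
        for (int step = 1; step <= members.size(); step++) {
            String next = members.get((i + step) % members.size());
            if (!failed.contains(next)) return next;
        }
        return null; // no live successor remains
    }

    public static void main(String[] args) {
        RelaxedRing ring = new RelaxedRing();
        ring.join("RU_RM-1");
        ring.join("RU_RM-2");
        ring.join("RU_RM-3");
        ring.markFailed("RU_RM-2");
        // Failed member is skipped: RU_RM-1's successor becomes RU_RM-3.
        System.out.println(ring.successor("RU_RM-1")); // RU_RM-3
    }
}
```

In a fuller design, each RU_RM would heartbeat its successor and trigger reassignment of the failed manager's rack, but the successor-skipping rule above is the essence of the relaxed-ring fault-tolerance behaviour.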