Hadoop Training with MapReduce

Summary

This 2-day hands-on Hadoop training course teaches how to set up Hadoop, the open-source software framework, and use it to build reliable, scalable applications for processing large datasets. Practical case studies are demonstrated in class to show how Hadoop is used to solve real-world problems today. MapReduce training is an essential component of this course.

Duration

2 days

Audience

This course is an introduction to Hadoop administration and development. It is designed for people who plan to use Hadoop to tackle real-world problems involving large datasets.

Prerequisites

  • Basic system administration skills
  • Java programming skills

Outline

Introduction to Hadoop

  • Hadoop History
  • Data Everywhere – How to deal with it?
  • Reasons for Using Hadoop

MapReduce

  • Strategy and Advantages of Using MapReduce
  • Map Function
  • Reduce Function
  • Job Scheduling
  • Input/Output Formats
  • MapReduce Features

Hadoop Distributed Filesystem

  • HDFS Design and Main Concepts
  • Node Types
  • HDFS Operations
  • CLI and Java Interface
  • Managing Apache Processes

Hadoop Cluster Administration

  • Specification
  • Installation
  • Configuration
  • Logging Directives
  • Performance Benchmarking
  • Monitoring

Developing a MapReduce Application

  • Case Studies
  • Configuration and Development Environment Setup
  • Implementing and Running a MapReduce Application Locally on a Test Dataset

HBase

  • HBase Concepts
  • HBase and RDBMS

Hive

  • Hive History and Purpose
  • Data Store
  • Web Interface