Summary

This 2-day hands-on Hadoop training course teaches how to set up and build reliable, scalable application systems using Hadoop open-source software. These applications are geared specifically towards processing large datasets. Practical case studies will be demonstrated in class to show how Hadoop is used in the real world today to solve different problems. MapReduce training is an essential component of this course.
Duration
2 days
Audience
This course is an introduction to Hadoop administration and development. It is designed for people who plan to use Hadoop to tackle real-world problems involving large datasets.
Prerequisites
- Basic system administration skills
- Java programming skills
Outline
Introduction to Hadoop
- Hadoop History
- Data Everywhere – How to deal with it?
- Reasons for Using Hadoop
MapReduce
- Strategy and Advantages of Using MapReduce
- Map Function
- Reduce Function
- Job Scheduling
- Input/Output Formats
- MapReduce Features
Hadoop Distributed Filesystem
- HDFS Design and Main Concepts
- Node Types
- HDFS Operations
- CLI and Java Interface
- Managing Apache Processes
Hadoop Cluster Administration
- Specification
- Installation
- Configuration
- Logging Directives
- Performance Benchmarking
- Monitoring
Developing a MapReduce Application
- Case Studies
- Configuration and Development Environment Setup
- Implementing and Running a MapReduce Application Locally on a Test Dataset
HBase
- HBase Concepts
- HBase and RDBMS
Hive
- Hive History and Reason to Exist
- Data Store
- Web Interface


