Hadoop Administration Training Course
Public Classroom
| Class Date(s) & Location(s) | Price | Early Price |
|---|---|---|
| Hadoop Administration - (PST) Online via GoToMeeting | $1,795.00 | |
| Hadoop Administration - (PST) Online via GoToMeeting | $1,795.00 | $1,620.00 before (Code: EARLY_HADOOPADMIN_AUG2213) |
| Hadoop Administration - (PST) Online via GoToMeeting | $1,795.00 | $1,620.00 before (Code: EARLY_HADOOPADMIN_OCT2413) |
| Hadoop Administration - (PST) Online via GoToMeeting | $1,795.00 | $1,620.00 before (Code: EARLY_HADOOPADMIN_DEC1213) |
Summary

This 2-day, hands-on Hadoop for System Administrators class is designed for technical operations personnel who install and maintain production Hadoop clusters in the real world. We will cover Hadoop architecture and its components, the installation process, and the monitoring and troubleshooting of complex Hadoop issues. The class includes practical hands-on exercises and encourages open discussion of how enterprises dealing with large data sets are using Hadoop.
Duration
2 days
Course Objectives
By the end of this Hadoop class, students should be able to:
- Understand Hadoop's main components and architecture
- Be comfortable working with the Hadoop Distributed File System (HDFS)
- Understand the MapReduce abstraction and how it works
- Plan a Hadoop cluster
- Deploy and administer a Hadoop cluster
- Optimize a Hadoop cluster for the best performance based on specific job requirements
- Monitor a Hadoop cluster and execute routine administration procedures
- Deal with Hadoop component failures and recoveries
- Get familiar with related Hadoop projects: HBase, Hive and Pig
- Know best practices for using Hadoop in the enterprise world
Audience
This course is designed for system administrators and support engineers who will maintain and troubleshoot Hadoop clusters in production or development environments.
Pre-requisites
This course is designed for people with at least a basic level of Linux system administration experience. Prior knowledge of Hadoop is not required.
Outline
Introduction to Hadoop
- The scale of data processing today
- What Hadoop is and why it is important
- Hadoop comparison with traditional systems
- Hadoop history
- Hadoop main components and architecture
Hadoop Distributed File System (HDFS)
- HDFS overview and design
- HDFS architecture
- HDFS file storage
- Component failures and recoveries
- Block placement
- Balancing the Hadoop cluster
Planning your Hadoop cluster
- Planning a Hadoop cluster and its capacity
- Hadoop software and hardware configuration
- HDFS Block replication and rack awareness
- Network topology for Hadoop cluster
Hadoop Deployment
- Different Hadoop deployment types
- Hadoop distribution options
- Hadoop competitors
- Hadoop installation procedure
- Distributed cluster architecture
- Lab: Hadoop Installation
Working with HDFS
- Ways of accessing data in HDFS
- Common HDFS operations and commands
- Different HDFS commands
- Internals of a file read in HDFS
- Data copying with ‘distcp’
- Lab: Working with HDFS
MapReduce Abstraction
- What MapReduce is and why it is popular
- The big picture of MapReduce
- MapReduce process and terminology
- MapReduce component failures and recoveries
- Working with MapReduce
Hadoop Cluster Configuration
- Hadoop configuration overview and important configuration files
- Configuration parameters and values
- HDFS parameters
- MapReduce parameters
- Hadoop environment setup
- ‘Include’ and ‘Exclude’ configuration files
- Lab: MapReduce Performance Tuning
Hadoop Administration and Maintenance
- Namenode/Datanode directory structures and files
- Filesystem image and Edit log
- The Checkpoint Procedure
- Namenode failure and recovery procedure
- Safe Mode
- Metadata and Data backup
- Potential problems and solutions / What to look for…
- Adding and removing nodes
- Lab: MapReduce Filesystem Recovery
Hadoop Monitoring and Troubleshooting
- Best practices of monitoring a Hadoop cluster
- Using logs and stack traces for monitoring and troubleshooting
- Using open-source tools to monitor a Hadoop cluster
Job Scheduling
- How to schedule multiple Hadoop jobs on the same cluster
- The default Hadoop FIFO scheduler
- Fair Scheduler and its configuration
Introduction to Hive, HBase and Pig
- Hive as a data warehouse infrastructure
- HBase as the “Hadoop Database”
- Using Pig as a scripting language for Hadoop
Hadoop Case Studies
- How different organizations use Hadoop clusters in their infrastructure
Instructor
Vladislav Tcheprasov
Vlad is a software engineer-scientist with over 10 years of experience in software architecture and object-oriented development, focusing on sophisticated, performance-sensitive data analysis problems. He is experienced at working with large amounts of data using Big Data solutions such as Hadoop and Hive, and at applying ML techniques. His specialties are data analysis and processing, applied machine learning, distributed computing, high-performance algorithms, and predictive modeling. At his current job, which focuses on online advertising using behavior analysis, Vlad has designed and developed an analytics and reporting platform based on Hadoop/MapReduce and Hive. Vlad completed an MS in Computer Science at Michigan State University and regularly attends Big Data conferences throughout the country.




