Hadoop Overview for Managers Training Course

Public Classroom

Summary

The Hadoop Overview for Managers training course is designed for technical personnel and managers who are evaluating Hadoop as a way to solve their data scalability problems. We will cover Hadoop basics and discuss best practices for using Hadoop in enterprises that deal with large data sets. We will look at the data problems you currently face and at potential use cases for Hadoop in your infrastructure. The class covers the Hadoop architecture and its main components: the Hadoop Distributed File System (HDFS) and MapReduce. We will present case studies of how other enterprises use Hadoop and look at what it takes to get Hadoop up and running in your environment.

Duration

1 day

Course Objectives

By the end of this Hadoop course, participants should be able to:

  • Understand Hadoop's main components and architecture
  • Understand the Hadoop Distributed File System (HDFS)
  • Understand the MapReduce abstraction and how it works
  • Understand how to plan your Hadoop cluster
  • Understand what it takes to deploy and administer a Hadoop cluster
  • Understand the benefits of using Hadoop and its impact on end users
  • Know the best practices for using Hadoop in the enterprise
  • Decide whether Hadoop is a suitable solution to your data problems and whether it will help you scale

Audience

This course is designed for engineers and managers who are new to Hadoop, to help them understand what Hadoop is and how it is used today to solve complex data problems.

Prerequisites

There are no prerequisites for this training. Prior knowledge of Hadoop is not required.

Outline

Introduction to Hadoop

  • The amount of data being processed today
  • What Hadoop is and why it is important
  • How Hadoop compares with traditional systems
  • Hadoop history
  • Hadoop's main components and architecture

Hadoop Distributed File System (HDFS)

  • HDFS overview and design
  • HDFS architecture
  • HDFS file storage
  • Component failures and recoveries
  • Block placement
  • Balancing the Hadoop cluster

Planning your Hadoop cluster

  • Planning a Hadoop cluster and its capacity
  • Hadoop software and hardware configuration
  • HDFS block replication and rack awareness
  • Network topology for a Hadoop cluster
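
As a rough illustration of the capacity-planning topics above, the sketch below estimates raw storage and node count from the amount of data to be stored. It assumes the HDFS default replication factor of 3; the 25% headroom for temporary and intermediate data, the per-node disk size, and the function names are hypothetical values chosen for the example, not recommendations from the course.

```python
import math


def raw_storage_tb(data_tb, replication=3, headroom=0.25):
    """Raw disk needed (TB) once every block is replicated.

    `headroom` is the assumed fraction of disk kept free for temporary
    and intermediate data (a hypothetical planning figure).
    """
    return data_tb * replication / (1 - headroom)


def nodes_needed(data_tb, disk_per_node_tb, replication=3, headroom=0.25):
    """Number of worker nodes needed to hold the raw storage."""
    raw = raw_storage_tb(data_tb, replication, headroom)
    return math.ceil(raw / disk_per_node_tb)


if __name__ == "__main__":
    # Hypothetical cluster: 100 TB of data, 24 TB of disk per node.
    print(raw_storage_tb(100))    # 400.0
    print(nodes_needed(100, 24))  # 17
```

The point of the sketch is that replication multiplies storage requirements, so usable capacity is only a fraction of the raw disk purchased.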

Hadoop Deployment

  • Different Hadoop deployment types
  • Hadoop distribution options
  • Hadoop competitors

MapReduce Abstraction

  • What MapReduce is and why it is popular
  • The big picture of MapReduce
  • MapReduce process and terminology
  • Working with MapReduce
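
To make the MapReduce terminology above concrete, here is a minimal single-machine sketch of the map, shuffle, and reduce steps using the classic word-count example. It is plain Python, not Hadoop's API; the function names are hypothetical, and a real Hadoop job would run the same steps in parallel across the cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big clusters", "data nodes"]))
```

Because each map call looks at one line and each reduce call looks at one key, the work can be split across many machines, which is the property that lets MapReduce scale.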

What it takes to run a Hadoop cluster

  • Potential problems and solutions when running Hadoop / What to look for…
  • Adding and removing nodes
  • MapReduce component failures and recoveries
  • Scheduling Hadoop jobs
  • Best practices for monitoring a Hadoop cluster

Introduction to Hive, HBase and Pig

  • Hive as a data warehouse infrastructure
  • HBase as the “Hadoop Database”
  • Using Pig as a scripting language for Hadoop

Introduction to Datameer

  • Datameer at a glance
  • Datameer capabilities
  • Data import and analytics with Datameer

Hadoop Case studies

  • How different organizations use Hadoop clusters in their infrastructure

How can Hadoop help you?

  • What data problems are you dealing with today?
  • Potential use cases for Hadoop in your organization
  • Is Hadoop the right choice to help you scale?