Big Data Overview Training Course

Public Classroom

Summary

The Big Data Overview training course is designed for technical leads, architects and decision makers who need a thorough understanding of the Big Data solutions available on the market today for solving the enterprise challenges posed by ever-increasing data sets. We will cover the main building blocks and concepts of Big Data, the key business drivers behind its adoption and the successes achieved with it, as well as its challenges and limitations. We will discuss the different Big Data solutions available today, focusing on the Hadoop projects and MapReduce technology.

Duration

1 day

Course Objectives

On completion of this Big Data Overview course, participants should be able to:

  • Understand the limitations of traditional data manipulation systems and the reasons for Big Data solutions
  • Understand what Big Data is and its key building blocks
  • Identify the main business reasons behind using Big Data solutions
  • Understand common challenges with Big Data processing
  • Learn about different Big Data solutions and the key players on the market
  • Understand Hadoop's main components and architecture
  • Understand the purpose of Hadoop Distributed File System (HDFS) and MapReduce
  • Become familiar with various Hadoop sub-projects and extensions
  • Get up to speed with future developments in Big Data technologies

Audience

This training is best suited to engineers, technical leads and managers who need to understand the concepts of Big Data technologies and how they are used today to solve complex data problems.

Pre-requisites

Basic understanding of client/server architecture and different data manipulation technologies is preferred.

Outline

Current data trends

  • Current data volume and velocity
  • Importance of data in running a successful business
  • The value of data and the cost-value dilemma

Overview and limitations of traditional data storage and analysis systems

  • Limitations of older solutions that today's Big Data analytics overcome
  • MapReduce versus RDBMS
  • High Performance Computing (HPC) and Grid Computing challenges

Big Data definition

  • What Big Data is and its main concepts
  • Key building blocks in Big Data solutions
  • What problem it solves
  • Why Big Data Analytics
  • Performance metrics

Business drivers behind the use of Big Data solutions

  • Main business drivers for Big Data solutions
  • Use cases for Big Data analytics in various verticals (search, e-commerce, social networks, etc.)
  • Companies that use Big Data solutions today
  • Financial success derived from using Big Data solutions

Common problems with Big Data processing

  • Task distribution
  • Fault tolerance
  • Disk reads/writes for data-intensive operations
  • Potential problems and solutions when running Hadoop

Available Big Data solutions

  • Distributed databases
    • HBase
    • Cassandra
    • MongoDB
  • Fully-distributed storage and calculation frameworks
    • Hadoop
    • MarkLogic
  • Hybrid frameworks
    • Hadapt
  • Key players in today’s Big Data market

Hadoop overview and its role in Big Data solutions

  • History
  • Current usage and comparison with traditional frameworks
  • Available distributions
    • Apache
    • Cloudera
    • EMC
    • IBM
  • Hadoop extensions
    • Hive
    • HBase
    • Pig
    • Datameer
    • BigSheets

Hadoop main components and architecture

  • HDFS overview and design
  • HDFS architecture
  • What MapReduce is and why it is popular
  • The big picture of MapReduce

Case studies

  • Why and how Hadoop is used at different enterprises
  • Best practices

Future developments

  • Hybrid solutions (e.g. for reporting)
  • Extract, transform, load (ETL)