Hive Training Course

Public Classroom

Summary

Hive is a system for querying and managing structured data, built on top of Hadoop. It uses Map-Reduce for execution and HDFS for storage, and it represents data with rich data types (structs, lists, and maps). Hive lets you query data directly in different encodings (text or binary) and file formats (flat or sequence files), using SQL as a familiar tool for standard analytics. For non-standard applications, Hive is extensible through embedded scripts, and it supports rich metadata for data discovery and optimization. This comprehensive one-day Hive training class gives you the skills you need to start using Hive in your project.

Duration

1 day

Course Objectives

On completing this Hive course, participants should be able to:

  • Understand the main concepts of using Hive
  • Create Hive’s native and external tables
  • Write SQL queries and learn some optimization techniques
  • Debug and resolve issues
  • Write pluggable Map-Reduce scripts
  • Learn important settings and some administration tasks

Audience

This course is designed for software engineers, Hadoop/Hive administrators, and data analysts.

Pre-requisites

Working knowledge of SQL. Some knowledge of scripting languages. Basic understanding of the Linux operating system.

Outline

Why Hive vs. regular Map-Reduce?

  • History
  • Definitions and terminology

Hive’s architecture and functionality

  • Services and interoperability with Hadoop
  • Query processor

Hive’s metadata

  • Creating new tables
  • Partitioned tables
  • Dynamic partitions
  • Tables with different serialization and encoding formats

Writing complex Hive queries

  • Different kinds of joins
  • Embedding custom scripts

Administration of running Hive queries

  • Hadoop permissions and groups
  • Enabling job scheduling and prioritization strategies
  • Setting controls on shared resources
  • Production-quality metadata storage for Hive and its backup
  • Overview of tools for job control flow

Advanced Hive functionality

  • Writing embedded Map/Reduce scripts
  • Considerations of Map vs. Reduce, RAM vs. writes
  • Writing embedded Java UDFs and UDAFs

Case studies and best practices