Hive Training Course
Public Classroom
| Class Date(s) & Location(s) | Price | Early Price |
|---|---|---|
| Hive - (PST) Online via GoToMeeting | $1,195.00 | $1,080.00 (Code: EARLY_HIVE_JUN2813) |
| Hive - (PST) Online via GoToMeeting | $1,195.00 | $1,080.00 (Code: EARLY_HIVE_AUG1613) |
| Hive - (PST) Online via GoToMeeting | $1,195.00 | $1,080.00 (Code: EARLY_HIVE_OCT1813) |
| Hive - (PST) Online via GoToMeeting | $1,195.00 | $1,080.00 (Code: EARLY_HIVE_DEC2013) |
Summary

Hive is a system for querying and managing structured data, built on top of Hadoop. It uses Map-Reduce for execution and HDFS for storage, and it represents data with rich data types (structs, lists, and maps). Hive lets you query data directly in different formats (text/binary) and file formats (Flat/Sequence), using SQL as a familiar programming tool for standard analytics. For nonstandard applications, Hive provides extensibility through embedded scripts, and it supports rich metadata to allow data discovery and optimization. This comprehensive one-day Hive training class gives you the skills you need to start using Hive in your project.
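To give a flavor of the embedded scripts mentioned above, here is a minimal sketch of a script that a Hive query could call. Hive streams rows to such a script on standard input as tab-separated fields, one row per line, and reads tab-separated rows back from standard output. The column names (`user_id`, `url`), the output column `domain`, and the file name `extract_domain.py` are hypothetical and chosen only for illustration; they are not taken from the course material.

```python
import sys


def transform_line(line):
    """Turn one tab-separated input row into one tab-separated output row.

    Assumed (hypothetical) input columns: user_id, url.
    Output columns: user_id, domain.
    """
    user_id, url = line.rstrip("\n").split("\t")
    # Drop the scheme if there is one, then keep everything before the first slash.
    domain = url.split("://", 1)[-1].split("/", 1)[0]
    return "\t".join([user_id, domain])


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Hive sends one row per line; blank lines are skipped.
    for line in stdin:
        if line.strip():
            stdout.write(transform_line(line) + "\n")


if __name__ == "__main__":
    main()
```

Assuming a table named `page_views` with those two columns, a script like this could be registered with `ADD FILE extract_domain.py;` and then used in a query such as `SELECT TRANSFORM(user_id, url) USING 'python extract_domain.py' AS (user_id, domain) FROM page_views;`.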
Duration
1 day
Course Objectives
By the end of this Hive course, participants should be able to:
- Understand the main concepts of using Hive
- Create Hive’s native and external tables
- Write SQL queries and learn some optimization tricks
- Debug and resolve issues
- Write pluggable Map-Reduce scripts
- Learn important settings and some administration tasks
Audience
This course is designed for software engineers, Hadoop/Hive administrators, and data analysts.
Pre-requisites
Working knowledge of SQL, some knowledge of scripting languages, and a basic understanding of the Linux operating system.
Outline
Why Hive vs. regular Map-Reduce?
- History
- Definitions and terminology
Hive’s architecture and functionality
- Services and interoperability with Hadoop
- Query processor
Hive’s metadata
- Creating new tables
- Partitioned tables
- Dynamic partitions
- Tables with different serialization and encoding formats
Writing complex Hive queries
- Different kinds of joins
- Embedding custom scripts
Administration of running Hive queries
- Hadoop permissions and groups
- Enabling job scheduling and prioritization strategies
- Setting controls on shared resources
- Production-quality metadata storage for Hive and its backup
- Overview of tools for job control flow
Advanced Hive functionality
- Writing embedded Map/Reduce scripts
- Considerations of Map vs. Reduce, RAM vs. writes
- Writing embedded Java UDFs and UDAFs
Case studies and best practices
Instructor
Vladislav Tcheprasov
Vlad is a software engineer-scientist with over 10 years of experience in software architecture and object-oriented development, focusing on sophisticated, performance-sensitive data analysis problems. He is experienced at working with large amounts of data using Big Data solutions such as Hadoop and Hive, and at applying ML techniques. His specialties are data analysis and processing, applied machine learning, distributed computing, high-performance algorithms, and predictive modeling. At his current job, which focuses on online advertising using behavior analysis, Vlad has designed and developed an analytics and reporting platform based on Hadoop/MapReduce and Hive. Vlad holds an MS in Computer Science from Michigan State University and regularly attends Big Data conferences throughout the country.




