Big Data - Think It Tech

Big Data training and certification program

For Big Data professionals, the Certified Big Data Foundation Specialist (CBDFS) designation is a widely recognised credential. Holding the CBDFS highlights your experience working in a cloud environment and shows that you have the necessary skills and knowledge. Businesses that employ CBDFS holders will have professionals on staff who can help them exploit the commercial opportunities that cloud computing is generating. The certification is awarded to candidates who pass the exam.

After completing our e-course, you’ll be familiar with the basics of big data. This knowledge can serve as a springboard for an organization’s Big Data journey. The Big Data Foundation E-Course is delivered via our eLearning platform, giving you the flexibility to access it whenever you like, whether at home or at work. It includes an online course and an eBook study guide.

Curriculum

Understanding Big Data and Hadoop

Introduction to Big Data & Big Data Challenges
Limitations & Solutions of Big Data Architecture
Hadoop & its Features
Hadoop Ecosystem
Hadoop 2.x Core Components
Hadoop Storage: HDFS (Hadoop Distributed File System)
Hadoop Processing: MapReduce Framework
Different Hadoop Distributions

Hadoop Architecture and HDFS

Hadoop 2.x Cluster Architecture
Federation and High Availability Architecture
Typical Production Hadoop Cluster
Hadoop Cluster Modes
Common Hadoop Shell Commands
Hadoop 2.x Configuration Files
Single-Node Cluster & Multi-Node Cluster Setup
Basic Hadoop Administration

Hadoop MapReduce Framework

Traditional way vs MapReduce way
Why MapReduce
YARN Components
YARN Architecture
YARN MapReduce Application Execution Flow
YARN Workflow
Anatomy of MapReduce Program
Input Splits, Relation between Input Splits and HDFS Blocks
MapReduce: Combiner & Partitioner
Demo of Health Care Dataset
Demo of Weather Dataset

Advanced Hadoop MapReduce

Counters
Distributed Cache
MRUnit
Reduce Join
Custom Input Format
Sequence Input Format
XML File Parsing using MapReduce

Apache Pig

Introduction to Apache Pig
MapReduce vs Pig
Pig Components & Pig Execution
Pig Data Types & Data Models in Pig
Pig Latin Programs
Shell and Utility Commands
Pig UDF & Pig Streaming
Testing Pig Scripts with PigUnit
Aviation Use Case in Pig
Pig Demo of Healthcare Dataset

Apache Hive

Introduction to Apache Hive
Hive vs Pig
Hive Architecture and Components
Hive Metastore
Limitations of Hive
Comparison with Traditional Database
Hive Data Types and Data Models
Hive Partition
Hive Bucketing
Hive Tables (Managed Tables and External Tables)
Importing Data
Querying Data & Managing Outputs
Hive Script & Hive UDF
Retail Use Case in Hive
Hive Demo on Healthcare Dataset

Advanced Apache Hive and HBase

HiveQL: Joining Tables, Dynamic Partitioning
Custom MapReduce Scripts
Hive Indexes and Views
Hive Query Optimizers
Hive Thrift Server
Hive UDF
HBase vs RDBMS
HBase Components
HBase Architecture
HBase Run Modes
HBase Configuration
HBase Cluster Deployment

Advanced Apache HBase

HBase Data Model
HBase Shell
HBase Client API
Hive Data Loading Techniques
Apache ZooKeeper Introduction
ZooKeeper Data Model
ZooKeeper Service
HBase Bulk Loading
Getting and Inserting Data
HBase Filters

Processing Distributed Data with Apache Spark

What is Spark
Spark Ecosystem
Spark Components
What is Scala
Why Scala
SparkContext
Spark RDD

Oozie and Hadoop Project

Oozie
Oozie Components
Oozie Workflow
Scheduling Jobs with Oozie Scheduler
Demo of Oozie Workflow
Oozie Coordinator
Oozie Commands
Oozie Web Console
Oozie for MapReduce
Combining Flows of MapReduce Jobs
Hive in Oozie
Hadoop Project Demo
Hadoop Talend Integration