Hadoop course

Hadoop is an open-source, Java-based framework for storing and processing extremely large data sets in a distributed computing environment. It is an Apache project sponsored by the Apache Software Foundation.

What is Hadoop?
The Hadoop Distributed File System
How Hadoop Map Reduce Works
Anatomy of a Hadoop Cluster
Setting up Hadoop Cluster
Make a fully distributed Hadoop cluster
Cluster Specification
Network Topology
Cluster Specification and installation
Hadoop Daemons
Master Daemons
Name Node
Job Tracker
Secondary Name Node
Slave Daemons
Data Node
Task Tracker
Examining a Sample MapReduce Program With several examples
Basic API Concepts
The Driver Code
The Mapper
The Reducer
The configure and close Methods
Sequence Files
Record Reader
Record Writer
Role of Reporter
Output Collector
Processing XML files
Counters
Directly Accessing HDFS
ToolRunner
Using The Distributed Cache
Common Map Reduce Algorithms
Sorting, Searching and Indexing
Word Co-Occurrence
Identity Mapper
Identity Reducer
Exploring well known problems using MapReduce applications
HDFS (Hadoop Distributed File System)
Blocks and Splits
Input Splits
HDFS Splits
Methods of accessing HDFS
JAVA Approach
CLI Approach
Cluster architecture and block placement
Data Replication
Hadoop Rack Awareness
High data availability
Data Integrity
Programming Practices
Developing MapReduce Programs in Local Mode
Running without HDFS and MapReduce
Pseudo-distributed Mode
Running all daemons in a single node
Fully distributed mode
Running daemons on dedicated nodes
Testing with MRUnit
Logging
Other Debugging Strategies
Advanced Map Reduce Program
A Recap of the MapReduce Flow
The Secondary Sort
Customized Input Formats and Output Formats
Introduction to YARN
What is YARN?
Why YARN?
Advantages of YARN
YARN Daemons
Resource Manager
Node Manager
Application Master
Classic MapReduce vs YARN
Anatomy of a YARN application run
Scheduling in YARN
Fair Scheduler
Capacity Scheduler
YARN as a platform for multiple applications
Supported YARN applications
Overview of Spark
What is Spark?
Hadoop & Spark
Features of Spark
Spark Ecosystems
Spark Streaming
Spark SQL
Spark MLlib
Spark Architecture
Resilient Distributed Datasets
How to Install Spark
How to Run Spark
How to Interact with Spark
Spark Web Console
Shared Variables
Spark Applications
Word Count Application
HIVE
Hive concepts
Hive architecture
Create a database and access it from a Java client
Buckets
Partition
Joins in Hive
Inner joins
Outer Joins
Hive UDF
Introducing Cloudera Impala
Impala Benefits
How Cloudera Impala Works with CDH
Primary Impala Features
Impala Concepts and Architecture
Components of the Impala Server
The Impala Daemon
The Impala Statestore
The Impala Catalog Service
Overview of the Impala SQL Dialect
How Impala Fits Into the Hadoop Ecosystem
How Impala Works with Hive
Overview of Impala Metadata and the Metastore
How Impala Uses HDFS
FLUME
Flume concepts
Create a sample application to capture logs from Apache using Flume
SQOOP
Getting Sqoop
A Sample Import
Database Imports
Controlling the import
Imports and consistency
Direct-mode imports
Performing an Export
Overview of services in Android
Implementing a Service
Service lifecycle
Bound versus unbound services
PIG
Pig basics
PIG Vs MapReduce and SQL
Pig Vs Hive
Write sample Pig Latin scripts
Modes of running PIG
Running in Grunt shell
PIG UDFs
Pig Macros
Name Node High Availability
Name Node Federation
Fencing
Interview Preparation
Personal Interview
Group Discussion