Hadoop is the core platform for structuring Big Data, and it also solves the problem of formatting that data for analysis. It uses a distributed computing architecture made up of many servers built on commodity hardware. This in turn makes it inexpensive to scale and to support massive data stores.
What you will learn
After successfully completing this course, students should be able to:
- Use the Hadoop and HDFS platform
- Load data into HDFS
- Explain the fundamentals of MapReduce
- Write and debug MapReduce jobs
- Implement common algorithms on Hadoop
- Benchmark and optimize performance
Services and Study Materials
- Study materials on Oracle SQL, PL/SQL, and interview questions will be provided.
- Important interview questions and precise answers will be covered while discussing the corresponding topics in class.
- Regular assignments will be given in class.
- Real-time scenarios will be discussed in class.
- Extra scenarios will be provided for practice.
Big Data Hadoop Course Contents
- About Hadoop, installation
- Architecture: HDFS, MapReduce, terminology
- Sample MapReduce job: WordCount
- Block, split
- Combiner, custom Combiner
- setup() and cleanup() methods
- Partitioner logic, custom Partitioner
- Sequence file
- NLineInputFormat along with RecordReader
- Input/Output formats: composite, multiple
- Cluster setup (demo)
- Joins, Distributed Cache
- Compression and HAR (Hadoop Archives)
- Custom data types, custom InputFormat
- Counters; Fair, Capacity, and FIFO schedulers
Pig
- Installation, architecture
- Data types (scalar, complex)
- Running Pig (interactive, batch)
- Pig operators: Load, Store, Dump, Distinct, Filter, ForEach, Generate, Limit, Union, Join, Order By, Describe
- Group By, AVG, built-in functions (default UDFs), regular expressions, EXPLAIN
- Parallel processing
- Custom UDFs and how to use a custom UDF in your script
Hive
- Installation, Hive services, architecture, comparing Hive to traditional databases
- Relational data analysis: data types (primitive, complex), databases
- Hive schema and data storage, loading data into Hive, views, storing query results
- Text processing: built-in functions, string functions, regular expressions
- Managed vs. external tables
- Optimization: partitioning, bucketing, indexing data
- Extending Hive: custom UDFs, custom SerDes
HBase
- Introduction: history and evolution
- Installation: standalone (distributed), starting
- HBase shell introduction: storing and reading data with the shell
- Data model, physical model, and HBase distributed model
- HBase Java client; read and write paths
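
As a preview of the WordCount job listed in the course contents above, here is a minimal sketch of the idea in plain Python. It does not use the Hadoop API: the function names (`map_phase`, `combine`, `partition`, `reduce_phase`, `word_count`) and the toy hash in `partition` are assumptions made for illustration only. It simply simulates the mapper, combiner, partitioner, and reducer roles on in-memory data, with each input line standing in for one input split.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every whitespace-separated token."""
    for word in line.lower().split():
        yield word, 1


def combine(pairs):
    """Combiner: pre-sum the counts produced by a single mapper."""
    local = defaultdict(int)
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def partition(key, num_reducers):
    """Partitioner: choose a reducer for a key (deterministic toy hash)."""
    return sum(key.encode("utf-8")) % num_reducers


def reduce_phase(word, counts):
    """Reducer: sum all partial counts received for one word."""
    return word, sum(counts)


def word_count(lines, num_reducers=2):
    # One bucket per simulated reducer: word -> list of partial counts.
    buckets = [defaultdict(list) for _ in range(num_reducers)]

    # Map + combine + shuffle: each line stands in for one input split.
    for line in lines:
        for word, count in combine(map_phase(line)):
            buckets[partition(word, num_reducers)][word].append(count)

    # Reduce: each reducer processes its keys in sorted order.
    result = {}
    for bucket in buckets:
        for word in sorted(bucket):
            key, total = reduce_phase(word, bucket[word])
            result[key] = total
    return result


if __name__ == "__main__":
    counts = word_count(["big data big cluster", "data node"])
    for word in sorted(counts):
        print(word, counts[word])
```

In a real Hadoop job these roles are spread across many machines and written against the Hadoop MapReduce classes; the sketch only shows how the phases hand data to one another.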