Big Data Hadoop and Spark Developer
(All course fees are in USD)
Course Description
The Big Data Hadoop certification training is designed to give you in-depth knowledge of the Big Data framework using Hadoop and Spark. In this hands-on Hadoop course, you will execute real-life, industry-based projects using the Integrated Lab.
Offered in Partnership with
Simplilearn
Course Delivery
- Online self-paced learning: 10 hours
- Live online virtual classroom training: 48 hours
Total online blended learning: 58 hours
Benefits
- Total blended learning of 58 hours
- 4 real-life industry projects using Hadoop, Hive, and the Big Data stack
- Training on YARN, MapReduce, Pig, Hive, HBase, and Apache Spark
- Aligned to Cloudera CCA175 certification exam
Skills to be Learned
- Real-time data processing
- Functional programming
- Spark applications
- Parallel processing
- Spark RDD optimization techniques
- Spark SQL
Award upon Successful Completion
Big Data Hadoop and Spark Developer “Certificate of Achievement” from Simplilearn
Awarding Organisation
Simplilearn
Learning Path
In addition to building your knowledge, this online course is aligned with the Cloudera CCA Spark and Hadoop Developer exam (CCA175) and helps you prepare for it.
Learning Outcomes
This Big Data Hadoop and Spark Developer course will enable you to:
- Learn how to navigate the Hadoop ecosystem and understand how to optimize its use
- Ingest data using Sqoop, Flume, and Kafka
- Implement partitioning, bucketing, and indexing in Hive
- Work with RDDs in Apache Spark
- Process real-time streaming data
- Perform DataFrame operations in Spark using SQL queries
- Implement User-Defined Functions (UDF) and User-Defined Aggregate Functions (UDAF) in Spark
Assessments
Course-end Quiz
Industry Projects
Project 1: Analyzing historical insurance claims
Use Hadoop features to predict patterns and share actionable insights for a car insurance company.
Project 2: Analyzing intraday price changes
Use Hive features for data engineering and analysis of New York Stock Exchange data.
Project 3: Analyzing employee sentiment
Perform sentiment analysis on employee review data gathered from Google, Netflix, and Facebook.
Project 4: Analyzing product performance
Perform product and customer segmentation to increase Amazon's sales.
Course Completion Criteria
- Completion of at least 85% of online self-paced learning
- Attendance of one live virtual classroom
- A score of at least 75% in course-end assessment
- Successful evaluation in at least one project
Who Should Enrol
- Analytics professionals
- Senior IT professionals
- Testing and mainframe professionals
- Data management professionals
- Business intelligence professionals
- Project managers
- Graduates looking to begin a career in big data analytics
Prerequisites
It is recommended that you have knowledge of:
- Core Java
- SQL
Course Overview
Lesson 01 – Course Introduction
Lesson 02 – Introduction to Big Data and Hadoop
Lesson 03 – Hadoop Architecture, Distributed Storage (HDFS) and YARN
Lesson 04 – Data Ingestion into Big Data Systems and ETL
Lesson 05 – Distributed Processing – MapReduce Framework and Pig
Lesson 06 – Apache Hive
Lesson 07 – NoSQL Databases – HBase
Lesson 08 – Basics of Functional Programming and Scala
Lesson 09 – Apache Spark Next-Generation Big Data Framework
Lesson 10 – Spark Core Processing RDD
Lesson 11 – Spark SQL – Processing DataFrames
Lesson 12 – Spark MLlib – Modelling Big Data with Spark
Lesson 13 – Stream Processing Frameworks and Spark Streaming
Lesson 14 – Spark GraphX
Accessible Period of Course
1 year from date of enrolment
Customer Reviews
Solomon Larbi Opoku
Senior Desktop Support Technician
Content looks comprehensive and meets industry and market demand. The combination of theory and practical training is amazing.
Navin Ranjan
Assistant Consultant
Faculty is very good and explains all the things very clearly. Big data is totally new to me so I am not able to understand a few things but after listening to recordings I get most of the things.
Ludovick Jacob
Manager of Enterprise Database Engineering & Support at USAC
I really like the content of the course and the way trainer relates it with real-life examples.
Puviarasan Sivanantham
Data Engineer at Fanatics, Inc.
Dedication of the trainer towards answering each & every question of the trainees makes us feel great and the online session as real as a classroom session.
Richard Kershner
Software Developer
The trainer was knowledgeable and patient in explaining things. Many things were significantly easier to grasp with a live interactive instructor. I also like that he went out of his way to send additional information and solutions after the class via email.
Aaron Whigham
Business Analyst at CNA Surety
Very knowledgeable trainer, appreciate the time slot as well… Loved everything so far. I am very excited…
Rudolf Schier
Java Software Engineer at DAT Solutions
Great approach for the core understanding of Hadoop. Concepts are repeated from different points of view, responding to audience. At the end of the class you understand it.
Kinshuk Srivastava
Data Scientist at Walmart
The course is very informative and interactive and that is the best part of this training.
Priyanka Garg
Sr. Consultant
Very informative and active sessions. Trainer is easy going and very interactive.
Peter Dao
Senior Technical Analyst at Sutter Health
The content is well designed and the instructor was excellent.
Anil Prakash Singh
Project Manager/Senior Business Analyst @ Tata Consultancy Services
The trainer really went the extra mile to help me work along. Thanks
Dipto Mukherjee
ETL Lead at Syntel
Excellent learning experience. The training was superb! Thanks Simplilearn for arranging such wonderful sessions.
Shubhangi Meshram
Senior Technical Associate at Tech Mahindra
I am impressed with the overall structure of training, like if we miss class we get the recording, for practice we have CloudLabs, discussion forum for subject clarifications, and the trainer is always there to answer.
Course Features
- Students: 0
- Max students: 1000
- Duration: 58 hours
- Skill level: All
- Language: English
- Re-take course: 1000