Hadoop: Fundamentals

Category
Programming
Level
Beginner
Lessons
41
Duration
3h 20m 04s
Last Updated: 2018-10-18
Before you begin, we recommend downloading and exploring a few software tools and utilities; the prerequisites chapter covers all of them. For example, the Hadoop sandbox provides us with a working cluster, and utilities like PuTTY let us connect to the cluster to run jobs, perform file system operations, and demonstrate the capabilities of Hadoop. Hadoop runs on Linux, so some familiarity with the Linux command line will also be helpful.

Once you have all the tools you need to get started, you will learn about the history of Hadoop: how it began as an attempt to build a better open-source search engine, and how it grew into the powerful data storage and processing engine it is today.

We’ll explore how Hadoop might fit within a large-scale enterprise, evaluating the strengths and weaknesses of its implementation. We’ll also take a tour of the Hadoop Sandbox using the Ambari graphical user interface.

A core component of Hadoop is the Hadoop Distributed File System (HDFS). We’ll talk about how it differs from an ordinary file system and how it supports Hadoop’s distributed architecture. We’ll take a look at the various nodes of HDFS and their respective roles. We’ll end with a tour of HDFS from within Linux.
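The HDFS tour uses the standard `hdfs dfs` command-line client. As a preview, a few common operations might look like the following sketch; the paths and file names here are illustrative placeholders, not values from the course:

```shell
# Create a directory in HDFS (path is hypothetical)
hdfs dfs -mkdir -p /user/sandbox/input

# Copy a file from the local Linux file system into HDFS
hdfs dfs -put sales.csv /user/sandbox/input/

# List the directory and print the file's contents
hdfs dfs -ls /user/sandbox/input
hdfs dfs -cat /user/sandbox/input/sales.csv
```

These commands mirror their Linux counterparts (`mkdir`, `ls`, `cat`), but they operate on files distributed and replicated across the cluster rather than on the local disk.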

We’ll then learn about ETL and MapReduce. ETL is what connects Hadoop to the outside world. Sqoop is an ETL tool from the Hadoop ecosystem for exchanging data between Hadoop and an external database server. We’ll go over how to use Sqoop to pull data from a PostgreSQL database. We’ll demonstrate how to build and run a basic application in the Java language, and then move on to a very important component of Hadoop: MapReduce.
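To give a sense of what the Sqoop lessons cover, an import from PostgreSQL into HDFS might look like the sketch below. The connection string, credentials, table name, and target directory are hypothetical placeholders, not values from the course:

```shell
# Import the "orders" table from a PostgreSQL database into HDFS.
# Host, database, user, table, and paths below are all hypothetical.
sqoop import \
  --connect jdbc:postgresql://dbhost:5432/salesdb \
  --username hadoop_user -P \
  --table orders \
  --target-dir /user/sandbox/orders \
  --num-mappers 1
```

Under the hood, Sqoop translates a command like this into a MapReduce job that reads rows over JDBC and writes them to HDFS in parallel, which is why Sqoop leads naturally into the MapReduce material.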
Hadoop: Fundamentals - Chapter 01 - Hadoop Architecture
Topic A: Prerequisites - Part 1
3 lessons · 20m 29s
Topic B: Introduction - Part 1
3 lessons · 19m 15s
Topic C: History - Part 1
3 lessons · 8m 25s
Topic D: Architecture - Part 1
3 lessons · 7m 01s
Topic E: Ecosystems - Part 1
3 lessons · 16m 13s
Hadoop: Fundamentals - Chapter 02 - ETL and MapReduce
Kevin McCarty
I’m a computer professional with over 30 years of experience in the industry as a programmer, project manager, database administrator, architect, and data scientist. I’m a Microsoft Certified Trainer with more than 25 individual certifications in programming and database technologies, and I serve as the chapter leader of the Boise SQL Server Users Group. A former army officer and Eagle Scout, I also hold a doctorate in computer science and have a lifelong love of learning.
Kevin McCarty, Instructor and Curriculum Developer