BigData Investigation 2 – My Travel Guide: The Hadoop Book

There are plenty of online courses that introduce Hadoop, but as an old hand I prefer a book. I browsed my preferred online bookstore and ordered “Hadoop: The Definitive Guide” by Tom White.

I chose this book for several reasons. First, the book provides plenty of code examples, which can be downloaded from GitHub. Second, Appendix A includes detailed instructions on how to install Hadoop on a single machine. Both are very important to me, because I want to try many of the examples on a live system. Third, the Hadoop Distributed Filesystem (HDFS) is covered in detail in one of the first chapters of the book. This is an additional plus, given my interest in the storage aspects of BigData.

The book is structured in five parts. Part I introduces the fundamental components: MapReduce, HDFS, YARN and Hadoop I/O. This was a nice surprise. I had heard about Pig, Hive, Flume, Spark, HBase, Oozie, ZooKeeper and a lot of other stuff with fancy names, so I am relieved that only four components are required to get started with Hadoop. Part II discusses MapReduce in depth, and Part III completes the basics by describing the installation and administration of Hadoop clusters.

Parts IV and V cover advanced topics. Part IV goes into the details of the additional components mentioned above (e.g. Pig, Hive, Spark) in ten chapters, each dedicated to a different component. Part V presents some interesting case studies on health care and life science. Finally, the appendices provide supplemental information.

I have already taken a quick glance at Part I. The MapReduce chapter includes plenty of examples that I want to try on a live system, so I urgently need to set up my own Hadoop cluster.

In the next post I will explain how to set up a single-node Hadoop cluster.

Changes:
2016/09/02 – added link – “how to setup a single-node Hadoop cluster” => BigData Investigation 3 – Installing the Cloudera QuickStart VM on VirtualBox

Comments 1

Proquotient

July 13, 2017 at 7:34 am

The book you have referred to seems to be a very good choice and we will definitely recommend this to others. It is very important for beginners to start with the basics and also have code examples to work on.
