Review: Hadoop Beginner's Guide
Hadoop Beginner's Guide is, as the title suggests, a new introduction to the Hadoop ecosystem. It shows how to get up and running with the core components of Hadoop (MapReduce and HDFS), some higher-level tools like Hive, and integration tools like Sqoop and Flume, and it also provides some good starting information on operational issues with Hadoop. This is not an exhaustive reference like Hadoop: The Definitive Guide, and for a beginner, that's probably a good thing. (In my day, we only had The Definitive Guide, and we liked it!)
Most of the topics are covered in a "dive right in" format. After a brief introduction to the topic, the author provides a list of commands or a block of code and invites you to run it. This is followed by a "What just happened?" section that explains the details of the operation or code. Personally, I don't care for that too much because the explanation is sometimes separated from the code by multiple pages, which was a real hassle when reading the book as a PDF. But maybe that's just me.
As I mentioned, the book includes a couple of chapters on operations, which I found to be a nice addition to a beginner's book. Some of these operational details were explained by hands-on experiments like shutting down processes or nodes, in which case "What just happened?" is more like "What just broke?" The operational scenarios are by no means exhaustive (that's what you learn from production), but they provide the reader with some "real life" experience gained in a low-risk environment. And, they introduce a powerful method to learn more operational details: set up an experiment and find out what happens. Learning to learn is the most valuable thing you can gain from any book, class, or seminar.
Another nice feature of this book that I haven't seen in others is that the author includes examples using Amazon EC2 and Elastic MapReduce (EMR). There are examples of both MapReduce and Hive jobs on EMR. He doesn't repeat every example on both "raw" Hadoop and EMR, because once you know the basics of EMR, the same principles apply to both.
I do have some complaints about the book, but many of them are nit-picking or matters of personal style. That said, I think the biggest thing this book would benefit from is some very detailed "technical editing." By that I mean there are technical details that got corrupted during the book production process. For example, the hadoop command is often rendered as Hadoop in examples, which will not run as typed on a case-sensitive system such as Linux. There are plenty of similar formatting and typographic errors. Of course, an experienced Hadoop user wouldn't be tripped up by these, but this is a "beginner's guide," and such details can cause tremendous pain and suffering for newbies.
To wrap things up, Hadoop Beginner's Guide is a pretty good introduction to the Hadoop ecosystem. I'd recommend it to anyone just starting out with Hadoop before moving on to something more reference-oriented like The Definitive Guide.