Hadoop is a buzzword that’s been thrown around all over the place: as Apache’s software framework for managing vast distributed datasets, it’s become increasingly popular in a world where the petabyte has gone from a theoretical maximum to a likely one. Hadoop’s popularity, however, doesn’t stem from its intuitive nature; to help you understand and use Hadoop effectively, here’s a list of books on the subject that are informative, clear, and helpful!
Hadoop: The Definitive Guide, by Tom White
O’Reilly tends to be very reliable on the technical front, and this book from Tom White is no exception: it’s an informative, helpful, and clear guide to what Hadoop is, how it works, and how and why it should be used in contrast to other parallel processing methods. It has chapters on Hadoop’s origin, examples of how to use its APIs, and instructions for setting up and maintaining Hadoop with cloud services like Amazon’s EC2; by the end of the book, you’ll have a very solid idea of what Hadoop is and how to use it, along with clear instructions and examples for setting it up in different scenarios and on different platforms.
One of the great strengths of this book is that it’s amazingly detailed: of all the books on this list, it’s the one that goes furthest into the actual theory and concepts behind Hadoop and how it works. For readers looking for just a practical guide to using Hadoop, this may be a bit overwhelming and out of scope; for those who really want the background, however, White’s book is exhaustive and will leave you with a very clear understanding of how Hadoop and the technology behind it actually function.
The only downside of the book (and one the author himself points out) is that the APIs he mentions are going to be out of date simply because of how quickly Hadoop is changing; that said, the book’s true value is not in its API delineation but in its very clear instruction on Hadoop’s nature and how, when, and why to use it with different types of distributed datasets, both local and remote. It’s a great book that deserves a spot on the bookshelf of anyone who wants to get into using and deploying Hadoop in their projects!
Pro Hadoop, by Jason Venner
“Pro Hadoop” is a very good introduction to the world of Hadoop; like Tom White’s book above, it is dedicated to providing a solid explanation and outline of what Hadoop is, how to use it, and when to use it. Venner’s book covers much of the same ground as the O’Reilly book above; it is a very good guide to getting up to speed with Hadoop, including case scenarios and code snippets.
While not as in-depth as “Hadoop: The Definitive Guide”, it’s a bit friendlier and would be helpful to anyone who’s not quite as comfortable wading through more complex technical jargon and explanations. Venner’s aim is to give you enough theoretical knowledge of Hadoop to use and understand it, but not to make you an expert in all the nuts and bolts of its inner workings and structure; unlike White, he doesn’t try to explain everything there is to know about Hadoop’s mechanics, instead opting for what you need to know.
Where this book really shines, however, is not in its technical description or explanation: it is very good at presenting practical, useful advice for deploying a Hadoop cluster. It relies less on conceptual knowledge and more on what works; it is a very hands-on, configuration-oriented guide that will get you up and running with your Hadoop cluster and help you avoid the common pitfalls and mistakes that many novice Hadoop users make. It’s a useful, practical guide that would do well to live on the shelves of anyone looking to use Hadoop in their production environment!
Hadoop in Action, by Chuck Lam
“Hadoop in Action” is absolutely the least theory-heavy book on this list: it is entirely aimed at getting you up to speed with Hadoop in the way you need to use it in your environment. Lam’s writing style is clear, to the point, and utterly pragmatic: if the information doesn’t absolutely need to be there, he leaves it out, replacing it with handy practical information or useful code snippets to get your Hadoop install up and running as quickly as possible. There’s plenty of information here for both Python and Java adherents, along with numerous configuration examples and suggestions.
Where Lam’s book is weak, obviously, is in the theory aspect of Hadoop: it’s there, but only the bare minimum needed to get you going. Those of you in it for theory should check out the books higher up on this list; for programmers looking to just deploy Hadoop and write functions for it, however, “Hadoop in Action” deserves a spot on your programming shelf!
The world of Hadoop is still evolving, and so is the selection of books on the subject. The three above clearly stand out from the small pack. All three attempt to simplify a complex subject by eliminating fluff and presenting the material using real-life Hadoop examples. They definitely deserve a spot on OUR tech shelf. What about YOURS?