Big data has become a major part of IT infrastructure over the past couple of years, but it’s new enough that few people are fully versed in its intricacies. To help with that, here are some of our favorite recent big data books, any of which can help you become your office’s Hadoop Hero (or other alliterative title!):
1. Hadoop in Practice, by Alex Holmes
Hadoop in Practice makes my Big Data list because it’s not just a manual that explains the ins and outs of Hadoop: it’s a guide for someone out in the trenches. The book is less a technical reference than a collection of techniques, problems, and solutions, something that might benefit someone who’s at their wit’s end trying to deal with LZO compression or serialization.
The book’s only weakness, if it can be called that, is that you get exactly what it says on the tin. It isn’t for a beginner in any of the subjects covered: you’re expected to know some programming and a good deal of Hadoop, including how to get it working. It isn’t a learning book in the traditional sense but a volume of how-tos and problem solving, which can be immensely handy when you need to learn something and get it done as quickly as possible! Definitely worth having for anyone currently immersed in a Big Data environment with Hadoop.
Amazon: Hadoop in Practice
2. MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoop and Other Systems, by Donald Miner and Adam Shook
On the flip side of our first book, Miner and Shook’s MapReduce Design Patterns is a denser text. It’s designed to help you understand how to model MapReduce design patterns, which is handy considering that such material can be tough to find outside of a classroom or scattered technical blogs. Miner and Shook go into more general ideas and theory as they explain the patterns: each pattern is put in context, and they point out a good number of pitfalls and mistakes to avoid as you model your data architecture.
The book is designed to be a bit more generic as far as languages go, so you won’t get as much immediate practical benefit out of it as you might from some others. Where the book excels, however, is in getting you ready to design big data models: it’s a future investment rather than an immediate solution. Definitely worth a buy for anyone looking to improve their knowledge of big data frameworks, even if the framework isn’t necessarily Hadoop!
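To give a flavor of what a MapReduce design pattern looks like, here is a minimal in-memory sketch of a numerical summarization (minimum, maximum, and count per key). It is not taken from the book and does not use Hadoop’s API: the function names and the `(user, response_time_ms)` record shape are assumptions made purely for illustration.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Run the mapper over every record and collect the emitted (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle(pairs):
    """Group values by key, mimicking what a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped, reducer):
    """Call the reducer once per key with all of that key's values."""
    return {key: reducer(key, values) for key, values in grouped.items()}


def summarize_mapper(record):
    """Hypothetical record shape: (user, response_time_ms)."""
    user, ms = record
    # Emit a partial summary (min, max, count) for a single observation.
    yield user, (ms, ms, 1)


def summarize_reducer(key, values):
    """Merge partial summaries; the merge is associative, so it could also act as a combiner."""
    lowest = min(v[0] for v in values)
    highest = max(v[1] for v in values)
    count = sum(v[2] for v in values)
    return (lowest, highest, count)


def run_job(records):
    """Wire the three phases together for the summarization job."""
    pairs = map_phase(records, summarize_mapper)
    return reduce_phase(shuffle(pairs), summarize_reducer)


if __name__ == "__main__":
    sample = [("alice", 120), ("bob", 45), ("alice", 300)]
    print(run_job(sample))
```

Because the mapper already emits values in the same shape the reducer returns, the same merge logic can be applied to partial results before the final reduce, which is the kind of design consideration a pattern catalog helps you reason about.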
Amazon: MapReduce Design Patterns
3. Hadoop: The Definitive Guide, by Tom White
This is still one of the best books on Hadoop in print. Tom White’s guide is comprehensive in the extreme: it goes all the way from what Hadoop is to in-depth examinations of Hadoop’s core features and functions. It even covers a great deal of design philosophy, something that’s often neglected or shoved into a separate book entirely. White shows you not only how something works but also why, which is an invaluable teaching method.
We have more in-depth coverage of the book here (http://www.learncomputer.com/hadoop-the-definitive-guide/), but it needs to be on our top 5 list as well: Tom White’s excellent tome is invaluable for anyone looking to get into Hadoop and learn how to apply it to their own big data problems in the office!
Amazon: Hadoop: The Definitive Guide
4. HBase: The Definitive Guide, by Lars George
HBase is often mentioned in the same breath as Hadoop, being the database that very neatly complements Hadoop’s distributed filesystem. HBase is non-relational, and NoSQL has been on the lips of just about every IT executive in recent years thanks to its scalability and cost-effectiveness. Lars George does a great job of detailing HBase while also teaching you how to integrate it with MapReduce for massive big data deployment scenarios.
Like Hadoop: The Definitive Guide, HBase: The Definitive Guide pulls no punches: it’s essential reading for anyone looking to deploy a Hadoop / HBase setup in a production environment. The only weakness the book has is that it’s a bit dry, but in fairness it’s tough to present such dense subject material in a lighter fashion. Whether you’re simply in the market for a non-relational database or are looking to implement something as soon as possible, Lars George’s book will do you a great deal of good!
Amazon: HBase: The Definitive Guide
5. Big Data: Principles and best practices of scalable realtime data systems, by Nathan Marz and James Warren
Marz and Warren’s book is quite interesting, not least because Marz was an early engineer at BackType, the social analytics startup later acquired by Twitter. In “Big Data”, Marz and Warren take a hard look at the practical principles behind designing and implementing scalable real-time data systems. In particular, they teach a design method that they call the “Lambda Architecture”, a first-principles approach to the scalability problem that offers interesting insights into the way Big Data should be tackled.
Of all the books on this list, Big Data is perhaps the most theoretical, but it could also be the most useful. Marz and Warren don’t get into too many specifics when it comes to database, filesystem, or language, which may not be what some readers are looking for, but they do get into the principles you should consider when choosing your tools and implementing your system. That is an invaluable lesson for anyone to learn before starting a big data project.
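As a rough illustration of the idea behind the Lambda Architecture, the sketch below keeps an append-only master dataset, a batch view recomputed from scratch, and a small realtime view that covers events the batch view has not absorbed yet; queries merge the two. This is not code from the book: the `LambdaPageCounter` class, its method names, and the page-view use case are assumptions, and the batch run is treated as instantaneous to keep the example short.

```python
from collections import Counter


class LambdaPageCounter:
    """Toy page-view counter with batch and realtime views (hypothetical example)."""

    def __init__(self):
        self.master = []                # append-only log of raw events
        self.batch_view = Counter()     # recomputed from the full master dataset
        self.realtime_view = Counter()  # events seen since the last batch run

    def ingest(self, page):
        """Record a raw event and update the low-latency realtime view."""
        self.master.append(page)
        self.realtime_view[page] += 1

    def run_batch(self):
        """Recompute the batch view from scratch, then discard the realtime view.

        Simplifying assumption: the batch run is instantaneous, so every event
        in the realtime view is already covered by the new batch view.
        """
        self.batch_view = Counter(self.master)
        self.realtime_view = Counter()

    def query(self, page):
        """Answer a query by merging the batch view with the realtime view."""
        return self.batch_view[page] + self.realtime_view[page]


if __name__ == "__main__":
    counter = LambdaPageCounter()
    counter.ingest("home")
    counter.ingest("home")
    counter.run_batch()
    counter.ingest("home")
    print(counter.query("home"))
```

Recomputing the batch view from the raw events means a bug in the view logic can be fixed and the view rebuilt, while the realtime view only has to be correct for the short window between batch runs.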
Amazon: Big Data: Principles and best practices of scalable realtime data systems
Big Data expertise is becoming something that many companies require: as data sets grow to terabytes or even petabytes in size, sophisticated frameworks are needed to handle and process them effectively. Don’t be left out of the loop when it comes to this rapidly growing part of the IT field. Grab some of these books and educate yourself today!