Posts Tagged ‘hadoop’

Is Hadoop Going Away? Not Likely

June 15th, 2013

Hadoop is a fairly new technology, but some are already predicting its downfall and slide into disuse: recent rumblings on the Internet claim that Hadoop and HDFS will be going away in the near future, even though they've only just become a staple of enterprise development and deployment. The reason Internet pundits give for Hadoop's demise is the rising trend of real-time data processing, something that Hadoop doesn't do well at all, and because of this more and more enterprises will be turning away from Hadoop and… View full post »

Best Big Data Books: Our Top 5 Choices

February 13th, 2013

Big data has become a huge part of infrastructure in the past couple of years, but it's new enough that not many people are fully versed in its intricacies. To help out in that regard, here are some of our favorite recently published big data books that can help you become your office's Hadoop Hero (or other alliterative pun!): 1. Hadoop in Practice, by Alex Holmes. Hadoop in Practice makes my list because it's not just a Hadoop manual that explains the ins and outs of Hadoop – it's more of a guide for someone out… View full post »

Finding The Optimal Minimum Split Count For Your Hadoop Job

December 11th, 2012

Figuring out ways to optimize Hadoop isn't always easy, and one part of the job that's often overlooked is the split size / split count of a Hadoop job. Most people leave it at the preset defaults, but are those defaults right for you? Let's find out! When you're looking at minimum split count, you want to consider a number of things: one of the most important is the map task capacity of the cluster for the particular job at hand. Let's say, for example, that a particular cluster has a map task capacity of… View full post »
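As a rough sketch of the knob this post is describing (this example is mine, not the article's; it uses the newer Hadoop 2.x-style mapreduce API, and the job name and 128 MB / 256 MB values are purely illustrative), here's how a minimum and maximum split size might be set on a job:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeDemo {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "split-size-demo"); // hypothetical job name

        // Input path comes from the command line; mapper/reducer/output
        // setup is omitted to keep the sketch focused on split sizing.
        FileInputFormat.addInputPath(job, new Path(args[0]));

        // Raising the minimum split size yields fewer, larger splits and
        // therefore fewer map tasks; these byte values are illustrative only.
        FileInputFormat.setMinInputSplitSize(job, 128L * 1024 * 1024); // 128 MB
        FileInputFormat.setMaxInputSplitSize(job, 256L * 1024 * 1024); // 256 MB
    }
}
```

The trade-off the post goes on to weigh is exactly this: fewer, bigger splits mean fewer map tasks, which only pays off if it lines up with the cluster's map task capacity.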

Hadoop: The Definitive Guide Book Review

December 8th, 2012

By Tom White. Hadoop is a great idea for a framework, and it's been one of a few game-changers in the open source world in the past few years. It's designed to distribute the processing of many large datasets across a machine cluster so that the datasets can be processed in parallel. The fact that it's open source and free is another bonus: there's no cost to try out the software and see if it fits your needs, and it's enabled many companies to sift through large datasets that they otherwise would have had to buy expensive proprietary software for. As might be expected,… View full post »

Best Hadoop Resources on the Web

December 22nd, 2011

Hadoop is the new word on the market, and everyone wants to leverage it in their enterprise or network. There's so much information out there about it, however, that finding good resources becomes a challenge. Check out these Hadoop online resources for great sites that will help you learn more about Hadoop and how to implement it in your applications and networking environment! 1. Yahoo Hadoop Tutorial. Website: http://developer.yahoo.com/hadoop/tutorial/. The Yahoo Hadoop tutorial is an awesome place for anyone beginning their Hadoop adventure. It's a full Hadoop tutorial to get you up and running, including Hadoop itself, a virtual machine of a… View full post »

Hadoop: Where, Why, and is It Right for YOU?

October 12th, 2011

Hadoop has become something of a buzzword in recent months; everyone and their boss is recommending it for everything, books are being written about it left and right, and there's precious little one can say about Hadoop without running into someone else with a different opinion on that very subject. Hadoop is regarded as the new way of doing things, and many corporations and enterprise IT departments are researching just how to fit Hadoop into their infrastructure. With all this buzz about Hadoop, you may be tempted to run off and implement it in your own organization…. View full post »

Hadoop with Hive

September 19th, 2011 (Guest Blogger)

Nowadays, there are lots of Hadoops emerging. By “lots of Hadoops”, I mean companies releasing their own versions of Hadoop (e.g. Cloudera) by building a layer over the original Apache Hadoop distribution; we can also call these “customized” versions of Apache Hadoop. When we think about the core, though, it remains the same across the different Hadoop flavors. The Apache Software Foundation (ASF) focuses on improving Hadoop by bringing many smaller sub-projects under it to facilitate open source tool development around Hadoop, and Hive happens to be one of Hadoop’s more prominent child projects. Hive is a data warehouse infrastructure, initially developed… View full post »
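To give a flavor of what talking to Hive from Java can look like (this sketch is not from the guest post; it assumes a HiveServer2 instance listening on localhost:10000 with the standard Hive JDBC driver on the classpath, and the page_views table is hypothetical):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {

    public static void main(String[] args) throws Exception {
        // Assumes the HiveServer2 JDBC driver (hive-jdbc) is available.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://localhost:10000/default", "", "");
             Statement stmt = conn.createStatement();
             // 'page_views' is a hypothetical table used only for illustration.
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM page_views")) {
            while (rs.next()) {
                System.out.println("row count: " + rs.getLong(1));
            }
        }
    }
}
```

The appeal is that the SQL-like query is compiled down to MapReduce jobs for you, which is exactly what makes Hive attractive to teams that don't want to hand-write Java jobs.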

Hadoop Hardware Choices

August 11th, 2011

Hadoop, for those of you not in the know, is a scalable framework for running data-intensive distributed applications. If you’re reading this article, however, you probably already know that: you’re intrigued by Hadoop’s performance potential and its proclaimed ability to run on commodity hardware. In this article, we’ll talk a bit about the hardware required to run Hadoop and the configurations that will get you the most bang for your buck! First things first, however: don’t think you can just run out there, grab hundreds of bargain-bin PCs, and call the job done. Hadoop’s big draw is… View full post »

Hadoop Installation Tutorial

August 5th, 2011

Just for posterity: Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license. It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google’s MapReduce and Google File System (GFS) papers. Since you arrived at this page, I’ll assume that you have some idea of what Hadoop is and what it is used for. This tutorial will walk you through an installation of Hadoop on your workstation so you can begin exploring some of its powerful features. Hadoop has traditionally been a royal pain to set up and configure properly…. View full post »

Running Hadoop: Your First Single-Node Cluster

July 22nd, 2011

Hadoop can be a very powerful resource if used correctly; it’s a system designed to work with vast amounts of data while taking full advantage of the hardware it’s running on, low-end or not. It can be a bit difficult to set up, however, and as a result many people don’t take advantage of the system. Let’s take a look at how to set up your first single-node cluster to give you a sense of how Hadoop can help your business! This tutorial uses the Apache distribution, but you can just as easily use Cloudera’s. NOTE: This tutorial assumes you’re… View full post »

Intro to Hadoop

June 10th, 2011

Hadoop. A seemingly nonsensical word that keeps getting thrown around whenever you’re in a meeting. What does it mean? What does it do? Let’s read on and find out! What is Hadoop? Hadoop is a project by the Apache Foundation for handling large data processing jobs. It was originally conceived by Doug Cutting, the creator of Apache Lucene (who based the name on his son’s stuffed toy elephant, incidentally). He was inspired to do so after hearing about Google’s MapReduce and GFS projects, which were Google’s way of handling very large amounts of data at once; Hadoop is an… View full post »
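To make the MapReduce idea a little more concrete, here's a minimal word-count mapper sketch using Hadoop's Java API (my own illustration, not part of the original post):

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits (word, 1) for every word in every input line; Hadoop then groups
// the pairs by word and hands each group to a reducer that sums the counts.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);
        }
    }
}
```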

Best Hadoop Books: My Top 3 Choices

April 23rd, 2011

Hadoop is a buzzword that’s been thrown around all over the place: as Apache’s software framework for managing vast distributed datasets, it’s become increasingly popular in a world where the petabyte has gone from a theoretical maximum to a likely one. Hadoop’s popularity, however, doesn’t stem from its intuitive nature; to help you understand and use Hadoop effectively, here’s a list of books on the subject that are informative, clear, and helpful! Hadoop: The Definitive Guide, by Tom White. O’Reilly tends to be very reliable on the technical front, and this book from Tom White is no exception: it’s an informative,… View full post »

Why Hadoop?

April 26th, 2010

Hadoop is an open-source software platform by the Apache Foundation for building clusters of servers for use in distributed computing. Server clustering is really nothing new or revolutionary, but Hadoop is designed specifically for mass-scale computing involving thousands of servers. Based on a paper originally written by Google about their MapReduce system, Hadoop leverages concepts from functional programming to solve large computing problems. Hadoop is an ideal solution for working with large volumes of data in a variety of applications, from scientific computing to searching through web pages. Leveraging the Power of Functional Programming: Functional programming is a style… View full post »
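As a tiny, self-contained illustration of the map-and-reduce style of functional programming alluded to here (this example is mine and uses plain Java streams rather than Hadoop itself), the same word-counting pattern looks like this in miniature:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MapReduceInMiniature {

    public static void main(String[] args) {
        // A few stand-in "documents"; a real Hadoop job would read these from HDFS.
        List<String> documents = Arrays.asList(
                "hadoop scales out",
                "hadoop uses map and reduce",
                "map then reduce");

        // "Map" phase: split each document into words.
        // "Reduce" phase: group equal words and count them.
        Map<String, Long> wordCounts = documents.stream()
                .flatMap(doc -> Arrays.stream(doc.split("\\s+")))
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));

        wordCounts.forEach((word, count) -> System.out.println(word + " -> " + count));
    }
}
```

Hadoop applies the same two-step idea, but spreads the map and reduce work across thousands of machines instead of one process.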