Archive for the ‘open-source’ category

Is Hadoop Going Away? Not Likely

June 15th, 2013

Hadoop is a fairly new technology, but some are already predicting its downfall and slide into disuse: recent rumblings on the Internet claim that Hadoop and HDFS will be going away in the near future, despite the fact that they’ve only just become a staple of enterprise development and deployment. The reason these Internet pundits give for Hadoop’s demise is the rising trend of real-time data processing, something that Hadoop doesn’t do well at all; because of this, the argument goes, more and more enterprises will be turning away from Hadoop and… View full post »

Top 5 Emerging Big Data Open-Source Technologies 2013

May 25th, 2013

Big Data is in and of itself a new phenomenon, and keeping track of all the emerging technologies in the field can be dizzying. With that in mind, here are a few technologies to keep an eye on as they mature and evolve throughout 2013! 1. Cloudera Impala While Impala isn’t the newest on this list, it’s definitely one to watch out for in 2013. Cloudera’s implementation of real-time data processing in Hadoop is inspired: Hadoop’s disk-based storage format isn’t normally known for real-time ad-hoc query capability, but Impala uses some great open-source technologies to implement it. It uses the same metadata… View full post »

The Open-source Development Culture and Why Linux Doesn’t Work Like the Others

April 23rd, 2013

To developers moving from a relatively closed system, such as a Windows or OS X environment, to the open Linux architecture, the learning curve involves far more than adding a new language or architecture to your repertoire. It usually means fundamentally changing how things are done from an engineering perspective. Coding in a restricted or closed environment, where access to the source code for related apps and the core platform itself is limited, often means making guesses as to how things will interact and spending a lot of time (and lines of code) working around those interactions or restrictions. … View full post »

6 Open Source Big Data Technologies to Keep Tabs On

April 12th, 2013

It’s safe to say that ‘big data’ is the big buzzword du jour, and that is unlikely to change. Many of the major players in the tech industry are leveraging big data to spectacular and varied effect, and though it may be used a little too much as a buzzword, it is one of the most important developments in the tech industry. What’s more, the contemporaneous rise of open source software means that many of the most exciting big data technologies are open source, with strong communities developing around many big data tools. In this article, we’ll run through a… View full post »

Finding The Optimal Minimum Split Count For Your Hadoop Job

December 11th, 2012

Figuring out ways to optimize Hadoop isn’t always easy, and one part of the job that’s often overlooked is the split size / split count of a Hadoop job. Most people leave it at the preset defaults, but are those defaults right for you? Let’s find out! When you’re looking at minimum split count, there are a number of things to consider: one of the most important is the map task capacity of the cluster for the particular job at hand. Let’s say, for example, that a particular cluster has a map task capacity of… View full post »
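The excerpt cuts off before the arithmetic, but the knob it refers to is the minimum split size. As a rough illustration (my own sketch, not code from the post, assuming the Hadoop 2.x mapreduce API), raising the minimum split size shrinks the number of splits, and therefore map tasks, toward your cluster’s map task capacity:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitTuning {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "split-tuning-demo");

        // FileInputFormat computes each split as roughly
        // max(minSplitSize, min(maxSplitSize, hdfsBlockSize)),
        // so raising the minimum yields fewer, larger splits and
        // therefore fewer map tasks. 256 MB here is illustrative.
        FileInputFormat.setMinInputSplitSize(job, 256L * 1024 * 1024);

        // Ballpark check: map tasks are roughly totalInputBytes / splitSize;
        // compare that number against the cluster's map task capacity.
    }
}
```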

Hadoop: The Definitive Guide Book Review

December 8th, 2012

By Tom White. Hadoop is a great idea for a framework, and it’s been one of a few game-changers in the open source world in the past few years. It’s designed to distribute the processing of large datasets across a cluster of machines so that a dataset can be processed in parallel. The fact that it’s open-source and free is another bonus: there’s no cost to try out the software and see if it fits your needs, and it’s enabled many companies to sift through large datasets that they otherwise would have had to buy expensive proprietary software to handle. As might be expected,… View full post »

HP Releases WebOS to Open Source

February 2nd, 2012

As recently announced, HP has begun the process of releasing webOS to the open-source community. This is great news for developers who have been wanting to create software for these devices, and probably the best decision HP could have made, since it was no longer going to develop the platform itself. This is also an opportunity for Linux and other open-source mobile platforms to join forces in a meaningful way. webOS’s open-source license is thought to be similar to the Apache Foundation’s. HP has also consulted with Red Hat to possibly create a license… View full post »

Best Hadoop Resources on the Web

December 22nd, 2011

Hadoop is the new word on the market, and everyone wants to leverage it in their enterprise or network. There’s so much information out there about it, however, that finding good resources becomes a challenge. Check out these Hadoop online resources for great sites that will help you learn more about Hadoop and how to implement it in your applications and networking environment! 1. Yahoo Hadoop Tutorial Website: http://developer.yahoo.com/hadoop/tutorial/ The Yahoo Hadoop tutorial is an awesome place for anyone beginning their Hadoop adventure. It’s a full Hadoop tutorial to get you up and running, including Hadoop itself, a virtual machine of a… View full post »

Hadoop: Where, Why, and is It Right for YOU?

October 12th, 2011 (1 comment)

Hadoop has become something of a buzzword in recent months; everyone and their boss is recommending it for everything, books are being written about it left and right, and there is precious little one can say about Hadoop without running into someone with a different opinion on that very subject. Hadoop is regarded as the new way of doing things, and many corporations and enterprise IT departments are researching just how to fit Hadoop into their infrastructure. With all this buzz about Hadoop, you may be tempted to run off and implement it in your own organization…. View full post »

Hadoop with Hive

September 19th, 2011 (Guest Blogger) (10 comments)

Nowadays, there are lots of Hadoops emerging. Indeed, by “lots of Hadoops”, I mean companies releasing their own versions of Hadoop (e.g. Cloudera) by building a layer over the original Apache Hadoop distribution. We could also call these “customized” versions of Apache Hadoop, but the core remains the same across the different Hadoop flavors. The Apache Software Foundation (ASF) focuses on improving Hadoop by bringing many smaller sub-projects under it to facilitate open-source tool development around Hadoop. Hive happens to be one of Hadoop’s more prominent child projects. Hive is a data warehouse infrastructure, initially developed… View full post »
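The excerpt stops before any Hive specifics, so here is a hedged sketch of my own (not from the post) showing Hive’s basic appeal: you hand it SQL-like queries and it runs them over data in HDFS. This assumes a HiveServer2 endpoint on localhost:10000, the Hive JDBC driver on the classpath, and a hypothetical page_views table:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQuery {
    public static void main(String[] args) throws Exception {
        // Register the Hive JDBC driver (assumed to be on the classpath).
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://localhost:10000/default", "", "");
             Statement stmt = conn.createStatement();
             // HiveQL looks like SQL; under the hood it compiles into
             // jobs that scan data stored in HDFS.
             ResultSet rs = stmt.executeQuery(
                 "SELECT page, COUNT(*) FROM page_views GROUP BY page")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
        }
    }
}
```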

Hadoop Hardware Choices

August 11th, 2011

Hadoop, for those of you not in the know, is a scalable framework for running data-intensive distributed applications. If you’re reading this article, however, you probably already know that: you’re intrigued by Hadoop’s performance potential and its proclaimed ability to run on commodity hardware. In this article, we’ll talk a bit about the hardware required to run Hadoop, and what the best configuration would be to get the most bang for your buck! First things first, however: don’t think you can just run out there, grab hundreds of bargain-bin PCs, and call your job done. Hadoop’s big draw is… View full post »

Hadoop Installation Tutorial

August 5th, 2011 (2 comments)

Just for posterity: Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license. It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google’s MapReduce and Google File System (GFS) papers. Since you arrived at this page, I’ll assume that you have some idea of what Hadoop is and what it is used for. This tutorial will walk you through an installation of Hadoop on your workstation so you can begin exploring some of its powerful features. Hadoop has traditionally been a royal pain to set up and configure properly…. View full post »
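The installation steps themselves live in the full post; as a quick sanity check once everything is in place, a small client like this (my own sketch, assuming your core-site.xml and hdfs-site.xml are on the classpath) can confirm HDFS is reachable:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsSanityCheck {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from the config files on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Connected to: " + fs.getUri());

        // List the filesystem root as a smoke test.
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
    }
}
```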

Using memcached to Cache Content

July 26th, 2011

Web servers in the modern era are expected to process and serve a great deal of content of all different types; the amount and variety of content now, in fact, is far greater than it was even ten years ago. In response to the demands placed on web servers and database servers, caching systems have come into existence to speed up web and database performance. memcached is a distributed memory caching system originally developed for LiveJournal’s web and database services. It is now a general-purpose, open caching server for… View full post »
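As a concrete illustration of what memcached is for, here is a hedged sketch of the classic cache-aside pattern (my own example, using the third-party spymemcached Java client; the key scheme and the loadFromDatabase() helper are hypothetical placeholders):

```java
import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

public class CacheAsideDemo {
    public static void main(String[] args) throws Exception {
        // Connect to a memcached daemon on the default port.
        MemcachedClient cache =
            new MemcachedClient(new InetSocketAddress("localhost", 11211));

        String key = "page:/articles/42";
        String html = (String) cache.get(key);
        if (html == null) {
            // Cache miss: do the expensive work once, then cache it.
            html = loadFromDatabase(key);
            cache.set(key, 300, html); // expire after 300 seconds
        }
        System.out.println(html);
        cache.shutdown();
    }

    // Stand-in for a real database or rendering call.
    private static String loadFromDatabase(String key) {
        return "<html>rendered content for " + key + "</html>";
    }
}
```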

Running Hadoop: Your First Single-Node Cluster

July 22nd, 2011

Hadoop can be a very powerful resource if used correctly; it’s a system designed to work with vast amounts of data while taking full advantage of the hardware it runs on, low-end or not. It can be a bit difficult to set up, however; as a result, many people don’t take advantage of the system. Let’s take a look at how to set up your first single-node cluster to give you a sense of how Hadoop can help your business! This tutorial uses the Apache distribution, but you can just as easily use Cloudera’s. NOTE: This tutorial assumes you’re… View full post »
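As a companion to the tutorial (my own sketch, not from the post), the canonical first job to run on a fresh single-node cluster is word count; this assumes the Hadoop 2.x mapreduce API:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Emits (word, 1) for every whitespace-separated token in a line.
    public static class TokenMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    ctx.write(word, ONE);
                }
            }
        }
    }

    // Sums the counts for each word.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                Context ctx) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Once compiled into a jar, it runs with something like: hadoop jar wordcount.jar WordCount /input /output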

Intro to Hadoop

June 10th, 2011

Hadoop. A seemingly nonsensical word that keeps getting thrown around whenever you’re in a meeting. What does it mean? What does it do? Let’s read on and find out! What is Hadoop? Hadoop is an Apache Foundation project for handling large data-processing jobs. It was originally conceived by Doug Cutting, the creator of Apache Lucene (who, incidentally, based the name on his son’s stuffed pet elephant). He was inspired to do so after hearing about Google’s MapReduce and GFS projects, which were Google’s way of handling very large amounts of data at once; Hadoop is an… View full post »