Top 5 Emerging Big Data Open-Source Technologies 2013

May 25th, 2013

Big Data is in and of itself a new phenomenon, and keeping track of all the emergent technologies in the field can be dizzying. With that in mind, here are a few technologies to keep an eye on as they mature and evolve throughout 2013!

1. Cloudera Impala

While Impala isn’t the newest project on this list, it’s definitely one to watch in 2013. Cloudera’s approach to real-time data processing in Hadoop is inspired: Hadoop’s disk-based, batch-oriented processing isn’t normally known for real-time ad-hoc query capability, but Impala uses some great open-source technologies to implement it. It uses the same metadata and SQL syntax as Apache Hive, but it generates significantly less CPU load and makes better use of hardware resources than Hive generally does. Depending on configuration, Impala can be many times faster than Hive for the same type of query. It’s not a replacement for data warehousing, but it serves as a complement and can be used side-by-side with MapReduce and Hive.

2. Trevni

Trevni serves as a complement to Impala, and it’s extremely promising: it’s one of the newer projects out there, but it’s already causing a buzz in the big data arena. Trevni is a columnar binary storage format for Cloudera Impala, and it has quite lofty ambitions: Cloudera’s Impala team hopes that once finished and properly implemented, Trevni could achieve speeds equal to those outlined in Google’s Dremel paper while actually exceeding the SQL functionality Dremel offers. Trevni’s pairing of rich SQL functionality with Dremel-like speeds definitely makes it a contender, along with Impala, for a tech to watch in 2013!
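To see why a columnar layout suits analytic queries, here is a minimal Python sketch. It is a toy illustration only, not Trevni’s actual on-disk format: the sample records and the `to_columns` helper are made up for this example.

```python
# Toy contrast between a row layout and a column layout.
# Illustrative only: this is not Trevni's real file format.
rows = [
    {"user": "ann", "country": "US", "spend": 12.5},
    {"user": "bob", "country": "DE", "spend": 7.0},
    {"user": "cat", "country": "US", "spend": 3.5},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every full record is visited just to total one field.
total_from_rows = sum(record["spend"] for record in rows)

# Column layout: only the "spend" column is read.
total_from_columns = sum(columns["spend"])
```

Both totals are the same, but the columnar version never touches the `user` or `country` values. On disk, that means a query over one field can skip the bytes belonging to every other field.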

3. Spark

True to its name, Spark is a cluster computing solution that sets out to make data analytics as fast as possible, both to write and to run. Spark provides primitives for performing your cluster computing in memory: your job can load the required data into memory and query it repeatedly, much faster than disk-based systems like Hadoop MapReduce. Part of Spark’s appeal also lies in its clean APIs in Scala, Python, and Java; you can also use it to interactively and rapidly query big datasets from the Python and Scala shells should you need to. Spark’s flexibility and speed definitely make it an open-source technology to watch in 2013!
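The load-once, query-many-times idea can be sketched in plain Python. This is not Spark’s API: `InMemoryDataset` and `write_sample_log` are hypothetical helpers written for this example, and everything runs on a single machine rather than a cluster.

```python
import os
import tempfile


class InMemoryDataset:
    """Hypothetical helper: reads a file from disk once, then answers queries from memory."""

    def __init__(self, path):
        self.path = path
        self.disk_reads = 0   # how many times the file was actually read
        self._lines = None    # filled on first use

    def _load(self):
        # Only the first query pays the cost of reading from disk.
        if self._lines is None:
            with open(self.path) as handle:
                self._lines = [line.rstrip("\n") for line in handle]
            self.disk_reads += 1
        return self._lines

    def count_matching(self, word):
        """Count the lines that contain the given word."""
        return sum(1 for line in self._load() if word in line)


def write_sample_log():
    """Write a small made-up log file and return its path."""
    handle = tempfile.NamedTemporaryFile("w", suffix=".log", delete=False)
    handle.write("ERROR disk full\nINFO started\nERROR timeout\nWARN slow\n")
    handle.close()
    return handle.name


path = write_sample_log()
dataset = InMemoryDataset(path)
error_count = dataset.count_matching("ERROR")  # first query reads the file
warn_count = dataset.count_matching("WARN")    # second query is served from memory
os.remove(path)
```

After both queries, `dataset.disk_reads` is still 1. A disk-based batch job would re-read its input for each pass, which is the overhead the in-memory approach avoids.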

4. Apache HCatalog

HCatalog is something that Big Data administrators and developers have pined for: a complete table and storage management service for Hadoop-created data. It’s possibly one of the least talked-about technologies on this list, no doubt due to its infancy; nevertheless, it’s most definitely worth watching. As a metadata management layer that keeps to its open-source philosophy and works across all of your data, it’s been hailed as a godsend by some beleaguered Big Data proponents. Metadata management is a specific problem that nonetheless needs a strong solution, and HCatalog is shaping up to be that solution for data stored in HDFS.

5. Data-Driven Documents

Data-Driven Documents, or D3 for short, is a JavaScript visualization library that’s been receiving some acclaim as of late. It helps you visualize data and bring it to life using SVG, HTML, and CSS. It’s interesting because it doesn’t bind you to a proprietary framework or try to solve everything: instead, it simply gives you quite powerful visualization components as well as DOM manipulation with a data-driven approach. Essentially, it solves the core of the problem: the efficient manipulation of documents based on the data. Definitely worth looking out for as it evolves in 2013!
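D3 itself is a JavaScript library, so the following is not its API. It is a small Python sketch of the underlying idea, a document whose elements are derived directly from the data, and the `bar_chart_svg` function and its parameters are invented for this example.

```python
def bar_chart_svg(values, bar_width=20, gap=5, height=100):
    """Return an SVG string with one <rect> per value, scaled to the tallest bar.

    Assumes a non-empty list of positive numbers.
    """
    peak = max(values)
    rects = []
    for index, value in enumerate(values):
        bar_height = round(height * value / peak)
        x = index * (bar_width + gap)
        y = height - bar_height  # SVG's y axis grows downward
        rects.append(
            f'<rect x="{x}" y="{y}" width="{bar_width}" height="{bar_height}"/>'
        )
    width = len(values) * (bar_width + gap)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        + "".join(rects)
        + "</svg>"
    )


svg = bar_chart_svg([4, 8, 2])
```

Changing the input list changes the document: three values produce three rectangles, and each bar’s position and height come straight from its value.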

Conclusion

2013 is an exciting time for big data, especially considering the shift to real-time data processing in combination with the considerable data warehousing abilities of current Hadoop deployments. Make sure to look at these open-source technologies in 2013. Who knows, they may even be useful for your enterprise deployments in the next year!
