32x Faster Hadoop and MapReduce With Indexing
The simplicity of Hadoop and MapReduce, and especially their lack of indices, significantly limits performance. I described how MapReduce 2.0 and alternatives that bypass MapReduce will change how Hadoop is applied and speed it up in the next year or two. Another approach is to introduce indices to data stored on the Hadoop Distributed File System (HDFS). At its inception, ...

Hadoop 2.0: Beyond MapReduce with YARN, Drill, Tez
Now that the hype is dying down, Hadoop 1.0 is increasingly criticized as slow and limited in its applications. Marketing departments, riding the Big Data wave, wildly exaggerated Hadoop’s abilities. Surprisingly, Hadoop 2.0 is about to prove them somewhat right with two major developments.

Democratize Big Data With Hadoop and Hive
You have started to process data with cloud computing platforms like Elastic MapReduce (EMR) from Amazon Web Services (AWS). Now that you use it regularly, other stakeholders are getting curious. You increasingly find yourself firing up an EMR cluster to quickly answer a question or try something out. It may be time to change the way ...

Get Started: Big Data Crunching in the Cloud
Even if your business case is constantly evolving, you will still want to leverage big data, but being tied to a single infrastructure will limit your capital and your options. For many startups and teams, big data is not a starting point but a destination. It becomes a conversation point as a result of the ...