Virtualizing Hadoop with NAS
A recent question in the Hortonworks Community mentioned someone using Hadoop in a virtualized environment with EMC’s Isilon NAS (Network Attached Storage). While this may be a valid use case for some, anyone who looks at Hadoop as more than a small number-crunching cluster will have to reflect on this approach. Here are some ... read more →

Full Metal Hadoop as a Service with Altiscale
Hadoop, known to be powerful and challenging to manage, is increasingly available as a service in numerous varieties. Initially, do-it-yourself distributions like Cloudera, MapR, and Hortonworks made up a great part of the market. In recent years, following the success of Amazon Web Services Elastic MapReduce (EMR), Hadoop and data services like Qubole have become popular. Last year, quietly, another entrant in the field ... read more →

GraphChi: How a Mac Mini outperformed a 1,636 node Hadoop cluster
Last year GraphChi, a spin-off of GraphLab, a distributed graph-based high-performance computation framework, did something remarkable. GraphChi outperformed a 1,636 node Hadoop cluster processing a Twitter graph (dataset from 2010) with 1.5 billion edges – using a single Mac Mini. The task was triangle counting, and the Hadoop cluster required over 7 hours while ... read more →

Hadoop 2.0: Beyond MapReduce with YARN, Drill, Tez
Hadoop 1.0 is increasingly challenged as slow and limited in its applications, now that the hype is dying down. Marketing departments, riding the Big Data wave, wildly exaggerated Hadoop’s abilities. Hadoop 2.0, surprisingly, is about to prove them somewhat right with two major developments. read more →

Hadoop cluster cost of Amazon EC2 vs EMR
What is the price of a small Elastic MapReduce (EMR) cluster vs an EC2 Hadoop cluster? This article explores the price tag of switching from AWS EMR to a small, permanent EC2 Cloudera cluster. Cloud computing with Hadoop – maybe using AWS EMR or EC2 – makes experiments with temporary clusters and big data crunching easy and ... read more →

Democratize Big Data With Hadoop and Hive
You have started to process data with cloud computing platforms like Amazon Web Services (AWS) Elastic MapReduce (EMR). Now that you use it regularly, other stakeholders are getting curious. You increasingly find yourself firing up an EMR cluster to quickly answer a question or try something out. It may be time to change the way ... read more →

Get Started: Big Data Crunching in the Cloud
Even if your business case is constantly evolving, you will still want to leverage big data, but being tied to a single infrastructure will limit your capital and your options. Big data is not a starting point but a destination for many startups and teams. It becomes a conversation point as a result of the ... read more →

Cloud Computing And China: Two Sides Of A Digital Bangladesh
What can the current Digital Bangladesh policy, an attempt to improve connectivity, information access and literacy, achieve for Bangladesh? While the policy is well-intentioned, in Bangladesh, like anywhere else in the world, a government can merely provide an environment for economic development. Individuals and businesses subsequently have to take advantage of emerging opportunities, some explicit and others ... read more →

Hint: Hive 0.8.1.1 on AWS – avoid a regression bug

read more →