Feed on
Posts
Comments

We recently had a situation where we needed to copy a lot of HBase data while migrating from our old datacenter to our new one. The old cluster was running Cloudera’s CDH2 with HBase 0.20.6 and the new one is running CDH3b3. Usually I would use Hadoop’s distcp utility for such a job. As it [...]

Few months ago, at Hadoop World 2010, the metrics team gave a talk on Flume + Hive integration and how we plan to integrate it with other projects. As we were nearing production date, the BuildBot/TinderBox team came with an interesting, albeit pragmatic requirement. “Flume + Hive really solves our needs, but we would ideally [...]

Exponential growth, one of the few problems every organization loves, is usually alleviated by scaling out using clustered computing (Hadoop), CDN, EC2 and myriad of other solutions. While a lot of cycles are spent in making sure each scaled out machine contains requisite libraries, latest code deployments, matching configs, and the whole nine yards, very [...]

Pentaho announced this morning that they were going to be adding some features to Pentaho Data Integration (Kettle) and to their BI suite to make it easy for people to use Kettle to retrieve, manipulate, and store data in Hadoop, and to integrate Hadoop communication into the reporting and analysis layer. They posted a nice [...]

We are marching along in our integration of HBase with the Socorro Crash Stats project, but I wanted to take a minute away from that to talk about a separate project the Metrics team has also been involved with. Mozilla Labs Test Pilot is a project to experiment and analyze data from real world Firefox [...]

I was presented with the challenge of answering the question – how many Firefox users have one add-on or more installed on their Firefox. Currently, addons.mozilla.org (AMO) has statistics on the download counts of add-ons but the actual usage of add-ons has been unanswered. The Add-ons manager inside of Firefox will check each add-on for [...]