Pentaho announced this morning that they are adding features to Pentaho Data Integration (Kettle) and to their BI suite to make it easy to use Kettle to retrieve, manipulate, and store data in Hadoop, and to integrate Hadoop communication into the reporting and analysis layer. They posted a nice [...]

Update: Bugzilla SQR

I have had the chance to improve the Bugzilla SQR in many ways. I have improved the overall run time of the ETL (both in Kettle and in a Python script), fixed a few bugs (including a major one that was causing problems with the Open Bug Count), added new dimensions, and constructed a few [...]

Shell script analytics

Recently, I was asked if I could provide a breakdown of Firefox users on the Macintosh platform by whether they were using the Intel or PPC chipset. For anyone who only cares about seeing that data and not the “how” behind it, look no further than this link:
Firefox on Macintosh processor breakdown trends in Many Eyes

For anyone else, what follows is a detailed post about the volume of some of the data we parse, and some helpful AWK scripts that I use to parse it at times.
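
As a rough illustration of the kind of breakdown described above, here is a minimal Python sketch (not the AWK scripts from the post). It assumes the input is a list of raw user-agent strings and that Firefox on the Macintosh reports its chipset as either "Intel Mac OS X" or "PPC Mac OS X" in the user agent. The function name `chipset_breakdown` is hypothetical.

```python
import re
from collections import Counter

# Assumption: Mac Firefox user agents contain "Intel Mac OS X" or "PPC Mac OS X".
CHIPSET = re.compile(r"\b(Intel|PPC) Mac OS X")


def chipset_breakdown(user_agents):
    """Count Macintosh Firefox user agents by chipset (hypothetical helper)."""
    counts = Counter()
    for ua in user_agents:
        # Skip anything that is not Firefox running on a Macintosh.
        if "Macintosh" not in ua or "Firefox" not in ua:
            continue
        match = CHIPSET.search(ua)
        # Mac user agents without a recognizable chipset are tallied separately.
        counts[match.group(1) if match else "unknown"] += 1
    return counts
```

Calling `chipset_breakdown` on the user-agent field of each log line gives a per-chipset tally that can then be grouped by day or month to produce a trend.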

There is a lot to be said on the above topic, but for the moment, I just wanted to drop a quick note about some ad-hoc work I did today: I ran an analysis on a year and a half of FTP log files, filtering for some specific requests, and filtering out but summarizing uninteresting [...]

Mozilla’s Bugzilla database contains approximately 480,000 bugs and approximately 5,000,000 entries in the bugs_activity table, which makes it too large for the initial development that I am doing. I want to construct a smaller sample Bugzilla database that I can use to develop and run tests for my project more efficiently. To [...]

I am working on a project this summer that analyzes Bugzilla. The project was started by Nick Goodman, who named it Software Quality Reports (SQR). Software Quality Reports gives product managers, project managers, development managers, and software engineers more information on things like bug burn down rate by product, issues [...]