Sep 17
Cleaning up the Graph Server Database
The Graph Server, your one-stop shop for all things performance, has gotten a little unwieldy around the database. The underlying tables recently passed a milestone: a billion rows. The raw disk footprint is 108GB as of today (on MyISAM, no less), making it, by an order of magnitude, the largest database in the Mozilla MySQL infrastructure.
This database currently stores two types of data: the raw per-run data and the aggregate values. For example, when one of the test machines runs through a test, it might run the same test 20 times in a row. Each run is timed, and the result is stored in the database. An aggregate value is then calculated from those runs and recorded as well. We generally use the aggregate data to graph things over time, to answer questions or to see how certain code changes have affected the speed, memory usage, and other characteristics of Firefox.
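To make the per-run versus aggregate distinction concrete, here is a minimal sketch in Python. The post does not say which statistic the Graph Server uses for its aggregate, so the median below is purely an assumption for illustration, and the function name and timings are made up.

```python
import statistics


def aggregate_runs(run_times_ms):
    """Collapse the timings from one test's repeated runs into a single value.

    The median is an assumption made for illustration; the post does not say
    which statistic the Graph Server actually computes.
    """
    if not run_times_ms:
        raise ValueError("need at least one run")
    return statistics.median(run_times_ms)


# 20 hypothetical per-run timings (milliseconds) from one test cycle.
per_run = [412, 398, 405, 401, 399, 420, 403, 397, 400, 402,
           404, 399, 398, 401, 406, 400, 399, 403, 402, 401]

# The 20 values above are the "per-run" rows; this one number is the
# "aggregate" row that the graphs are drawn from.
aggregate = aggregate_runs(per_run)
```

Twenty rows go in for every one aggregate that comes out, which is why the per-run data dominates the row count.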
Sometimes it is useful to look at individual runs, but by and large the data that is really useful is the aggregates. As such, we’ve decided to purge older per-run data. Starting tomorrow, we will begin to remove individual run data points that are older than 60 days. This allows us to keep the in-depth information from the past two months if we need to look into how an aggregate data point was calculated, but for things older than that, we will need to rely on the aggregate value alone.
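For the curious, a purge along these lines could look roughly like the sketch below. It is not the actual job we will run: the table name `run_values`, its columns, and the batch size are all hypothetical, and it uses Python's built-in SQLite module as a stand-in for MySQL so that it runs anywhere. The idea is simply to delete per-run rows older than the 60-day cutoff in small batches and leave the aggregates alone.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 60  # retention window stated in the post


def purge_old_runs(conn, now, batch_size=1000):
    """Delete per-run rows older than the retention window, in small batches.

    Returns the total number of rows removed. Aggregates are assumed to live
    in a separate table and are never touched here.
    """
    cutoff = int((now - timedelta(days=RETENTION_DAYS)).timestamp())
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM run_values WHERE id IN ("
            "  SELECT id FROM run_values WHERE recorded_at < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total


# Demo against an in-memory database with a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE run_values ("
    "id INTEGER PRIMARY KEY, recorded_at INTEGER, value REAL)"
)
now = datetime(2020, 1, 1, tzinfo=timezone.utc)  # arbitrary reference date
for age_days in (10, 59, 61, 90):
    recorded = int((now - timedelta(days=age_days)).timestamp())
    conn.execute(
        "INSERT INTO run_values (recorded_at, value) VALUES (?, ?)",
        (recorded, 400.0),
    )
conn.commit()

removed = purge_old_runs(conn, now, batch_size=1)
remaining = conn.execute("SELECT COUNT(*) FROM run_values").fetchone()[0]
```

Working in batches is a design choice rather than a requirement. MyISAM locks the whole table for writes, so many short deletes should interfere less with incoming test results than one enormous one. On MySQL itself, `DELETE` accepts a `LIMIT` clause directly, so the subquery used here for SQLite would not be needed.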
And of course, we have backups of the data being removed. If anything goes awry, or if anyone determines that we need this data to be available, we can restore it.
If you have any particular use case or something that we should know about before removing this data, please let me or a member of the Release Engineering team know. Thank you!
1 comment so far
Mark,
Infobright has a columnar datastore that builds on top of MySQL. They just released a community edition (http://www.infobright.org/), which makes me think it would be worth taking another look at it. Maybe we could get together and chat about big DB management when I’m there at the beginning of Oct.
Also, the Graph Server may be the biggest MySQL or row-based datastore, but the metrics cluster still has it beat on volume of data stored (although at a much lower disk storage cost).