<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Mozilla Web Development &#187; Socorro</title>
	<atom:link href="http://blog.mozilla.com/webdev/category/soccoro/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.mozilla.com/webdev</link>
	<description>Everybody Likes Ninjas</description>
	<lastBuildDate>Sat, 21 Nov 2009 05:50:58 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Socorro Moves to New Hardware</title>
		<link>http://blog.mozilla.com/webdev/2009/05/15/socorro-moves-to-new-hardware/</link>
		<comments>http://blog.mozilla.com/webdev/2009/05/15/socorro-moves-to-new-hardware/#comments</comments>
		<pubDate>Fri, 15 May 2009 20:49:58 +0000</pubDate>
		<dc:creator>K Lars Lohn</dc:creator>
				<category><![CDATA[Socorro]]></category>

		<guid isPermaLink="false">http://blog.mozilla.com/webdev/?p=414</guid>
		<description><![CDATA[What has two quad core 3GHz 64bit CPUs, sixteen gigs of RAM and makes the Socorro users happy?  That would be the new hardware that the Socorro system moved to during a six hour operation on Thursday night.  The new hardware was recommended by the folks from the aptly named PostgreSQL Experts, Inc [...]]]></description>
			<content:encoded><![CDATA[<p>What has two quad-core 3GHz 64-bit CPUs, sixteen gigs of RAM and makes the Socorro users happy?  That would be the new hardware that the Socorro system moved to during a six-hour operation on Thursday night.  The new hardware was recommended by the folks from the aptly named <a href="http://pgexperts.com">PostgreSQL Experts, Inc</a> after an intense week of consultation and analysis in March of this year.  After auditing our existing system of hardware and software, it was apparent that we were woefully underpowered for what we were trying to do.  While simply tuning PostgreSQL helped in the interim, a more powerful platform was clearly in order.</p>
<p>Before we deployed the new hardware, we had to take several steps to tame our voracious use of disk space.  In the previous week, we removed the archived dumps from the database.  They were rarely accessed but took up the lion&#8217;s share of our disk space.  By migrating them to file system storage, we shrank a three hundred gig database migration onto new hardware to one of only sixty gig.</p>
<p>While there may be a need for tuning over the next week, Socorro users should have a much accelerated experience using the Socorro Web site.</p>
<p>Many thanks to <em>aravind</em> for shepherding this project through IT, <em>chizu</em> in IT for his db cloning/replication scripting/tweaking and <em>jberkus</em> from PostgreSQL Experts for his superior navigation skills and a steady hand at the PostgreSQL tiller.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mozilla.com/webdev/2009/05/15/socorro-moves-to-new-hardware/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Socorro Dumps Wave Good-bye to the Relational Database</title>
		<link>http://blog.mozilla.com/webdev/2009/04/20/socorro-dumps-wave-good-bye-to-the-relational-database/</link>
		<comments>http://blog.mozilla.com/webdev/2009/04/20/socorro-dumps-wave-good-bye-to-the-relational-database/#comments</comments>
		<pubDate>Mon, 20 Apr 2009 23:54:02 +0000</pubDate>
		<dc:creator>K Lars Lohn</dc:creator>
				<category><![CDATA[Socorro]]></category>

		<guid isPermaLink="false">http://blog.mozilla.com/webdev/?p=365</guid>
		<description><![CDATA[Let&#8217;s say we&#8217;ve got some twenty-five million chunks of data ranging in size from one K to several meg.  Let&#8217;s also say that we only rarely ever need to access this data, but when we do, we need it fast.  Would your first choice be to save this data in a relational database?
That&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p>Let&#8217;s say we&#8217;ve got some twenty-five million chunks of data ranging in size from one K to several meg.  Let&#8217;s also say that we only rarely ever need to access this data, but when we do, we need it fast.  Would your first choice be to save this data in a relational database?</p>
<p>That&#8217;s the situation that we&#8217;ve got in Socorro right now.  Each time we catch a crash coming in from the field, we process it and save a &#8220;cooked&#8221; version of the dump in the database.  We also save some details about the crash in other tables so that we can generate some aggregate statistics. </p>
<p>It&#8217;s that cooked dump that&#8217;s causing some concern.  The only time that we ever access that data is when someone requests that specific crash using the Socorro UI.  Considering that these cooked crashes take up nearly three quarters of the storage needs of our database, there&#8217;s not a lot of value there for the effort.  They inflate the hardware requirements for our database, make backups take too long and complicate any future database replication plans that we might consider.</p>
<p>We&#8217;re about to migrate our instance of Socorro to shiny new 64-bit hardware.  Moving these great drifts of cooked dumps would take hours and could mean more than a day of downtime for production.  We don&#8217;t want that.</p>
<p>It&#8217;s time for a great migration.  All those dumps are going to leave the database.  We&#8217;re spooling them out into a file system storage scheme.  At the same time, we&#8217;re reformatting them into JSON.  In the next version of Socorro, when a user requests their dump by UUID, it will be served by Apache directly from a file system as a compressed JSON file.  The client will decompress it and, through JavaScript magic, give the same display that we&#8217;ve got now.</p>
<p>There are some future benefits to moving this data into a file system format.  Think about all of this data sitting there in a Hadoop-friendly format waiting for a future data mining project.  We&#8217;ve nothing specific planned, but we&#8217;ve got the first step done.</p>
<p>We&#8217;re hoping to get the data migration done within the week.  New versions of the processing programs will have to be deployed as well as the changes to the Web application.  Once that&#8217;s done, we can proceed to the deployment of our fancy new hardware.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mozilla.com/webdev/2009/04/20/socorro-dumps-wave-good-bye-to-the-relational-database/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Crash Reporter Homepage Reskin</title>
		<link>http://blog.mozilla.com/webdev/2009/03/02/crash-reporter-homepage-reskin/</link>
		<comments>http://blog.mozilla.com/webdev/2009/03/02/crash-reporter-homepage-reskin/#comments</comments>
		<pubDate>Mon, 02 Mar 2009 20:02:50 +0000</pubDate>
		<dc:creator>ozten</dc:creator>
				<category><![CDATA[Socorro]]></category>

		<guid isPermaLink="false">http://blog.mozilla.com/webdev/?p=309</guid>
		<description><![CDATA[The crash reporter has been given a new look, and the homepage has a new Dashboard.

Our UX Engineer Neil Lee has applied some simplifications to the query form. This redesign was focused on the homepage and global navigation.
Another new feature is that MTBF and Top Crashers By Signature can be exported in CSV format. In [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://crash-stats.mozilla.com/">crash reporter</a> has been given a new look, and the homepage has a new <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=465660">Dashboard</a>.</p>
<p><img src="http://blog.mozilla.com/webdev/files/2009/03/crash_reporter_homepage.png" alt="Screenshot of Crash Reporter homepage" width="500" height="269" /></p>
<p>Our UX Engineer Neil Lee has applied some simplifications to the query form. This redesign was focused on the homepage and global navigation.</p>
<p>Another new feature is that MTBF and Top Crashers By Signature can be <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=478305">exported in CSV format</a>. In the future, as we want to slice and dice different reports, it should be trivial to add this feature to other reports.</p>
<p>In addition I&#8217;ve fixed a handful of issues:</p>
<ul>
<li><a href="https://bugzilla.mozilla.org/show_bug.cgi?id=428110">428110</a> &#8211; Quick and dirty changes to speed up crash analysis</li>
<li><a href="https://bugzilla.mozilla.org/show_bug.cgi?id=478043">478043</a> &#8211; Make &#8216;is exactly&#8217; the default choice</li>
<li><a href="https://bugzilla.mozilla.org/show_bug.cgi?id=479256">479256</a> &#8211; Clarify labels to be Date Processed</li>
<li><a href="https://bugzilla.mozilla.org/show_bug.cgi?id=470524">470524</a> &#8211; Crash signatures not indented</li>
<li><a href="https://bugzilla.mozilla.org/show_bug.cgi?id=479460">479460</a> &#8211; Bad Unicode in User Comments</li>
<li><a href="https://bugzilla.mozilla.org/show_bug.cgi?id=479447">479447</a> &#8211; report/list with no results has JS error</li>
</ul>
<p>We would love your feedback. Check out some <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=480617">recently filed bugs</a> or send us some feedback and <a href="https://bugzilla.mozilla.org/enter_bug.cgi?product=Webtools&amp;component=Socorro">file a new Socorro bug.</a></p>
<p>We&#8217;ve had some known issues around MTBF and Top Crashes by Signature in the last month and are working on fixing these issues. The upside is that <a href="http://crash-stats.mozilla.com/mtbf/of/SeaMonkey/development">SeaMonkey is now in MTBF</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mozilla.com/webdev/2009/03/02/crash-reporter-homepage-reskin/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Socorro Partitioning Rolled Back</title>
		<link>http://blog.mozilla.com/webdev/2009/02/02/socorro-partitioning-rolled-back/</link>
		<comments>http://blog.mozilla.com/webdev/2009/02/02/socorro-partitioning-rolled-back/#comments</comments>
		<pubDate>Mon, 02 Feb 2009 18:29:15 +0000</pubDate>
		<dc:creator>morgamic</dc:creator>
				<category><![CDATA[Socorro]]></category>

		<guid isPermaLink="false">http://blog.mozilla.com/webdev/?p=179</guid>
		<description><![CDATA[This Thursday and Friday we attempted to push updates to re-partition our crash report database and optimize the reporting tool to take advantage of it.  This was the deployment of bug 432450 and a fix for bug 444749, among others.
Our first attempt suffered from a network timeout, which required an eleven hour restore and re-run.  [...]]]></description>
			<content:encoded><![CDATA[<p>This Thursday and Friday we attempted to push <a href="http://blog.mozilla.com/webdev/2009/01/20/socorro-database-partitioning-is-coming/">updates to re-partition our crash report database</a> and optimize the reporting tool to take advantage of it.  This was the deployment of <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=432450">bug 432450</a> and a fix for <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=444749">bug 444749</a>, among others.</p>
<p>Our first attempt suffered from a network timeout, which required an eleven hour restore and re-run.  The re-run on Friday used a socket connection but would have required an additional 1-3 days of downtime, which was well outside our originally announced window.  Consequently, the database was rolled back to its contents as of 6:55PM PDT, January 29.  Reports have since resumed processing.</p>
<p>We plan on doing the following:</p>
<ul>
<li>Set up a complete replica of production to test this process end-to-end.  Our dry runs were done on a staging database that was roughly 1/5 the size.  We anticipated O(n) scaling, but in practice on the production server, we got performance more in line with O(n^2).  So we did not expect the full extent of timeouts or how much downtime would be needed.  To avoid this in future updates, we are setting up a stage database from a recent dump (once we gather the hardware for it).</li>
<li>Push a now+ partitioning script.  The work done in <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=432450">bug 432450</a>, on top of a complex migration script for old data, has logic for handling new partitions automatically which benefits new reports.  Since we don&#8217;t want to keep adding to our old database schema, we will push these updates so that new reports are properly partitioned.  Pros &#8211; in a week or two, things will be speedy and we aren&#8217;t going to struggle with timeouts.  Cons &#8211; we aren&#8217;t migrating the last 4 weeks.  We will not see a performance increase when querying data older than the date of the repartitioning.</li>
</ul>
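<p>To see why the dry runs were misleading, here is a rough sketch of the arithmetic. The 1/5 size ratio comes from the list above; the two hour staging duration and the function name are made up purely for illustration and are not measurements from the actual migration.</p>

```python
def projected_hours(staging_hours, size_ratio, exponent):
    """Scale a staging-run duration to production, assuming O(n^exponent) growth."""
    return staging_hours * size_ratio ** exponent

# Hypothetical figure: assume the staging dry run took 2 hours.
staging_hours = 2.0
# From the post: production is roughly 5x the size of the staging database.
size_ratio = 5

linear = projected_hours(staging_hours, size_ratio, 1)     # what O(n) predicts
quadratic = projected_hours(staging_hours, size_ratio, 2)  # what O(n^2) predicts
print(linear, quadratic)
```

<p>With a 5x larger data set, quadratic growth costs 25x the staging time rather than 5x, which is the kind of gap that turns an announced window into days of downtime.</p>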
<p>We would like to push the partitioning script (without migration of old data) on Thursday.  We will announce the exact time as soon as we know it.</p>
<p>Long term, we are already in the process of seeking additional resources to help examine our database configuration and systems architecture.  We will have more updates on that process in the future.</p>
<p>Our team wants this work deployed as much as everyone else.  Thanks to everyone for their patience as we work through these issues.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mozilla.com/webdev/2009/02/02/socorro-partitioning-rolled-back/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Socorro Database Partitioning is Coming</title>
		<link>http://blog.mozilla.com/webdev/2009/01/20/socorro-database-partitioning-is-coming/</link>
		<comments>http://blog.mozilla.com/webdev/2009/01/20/socorro-database-partitioning-is-coming/#comments</comments>
		<pubDate>Tue, 20 Jan 2009 21:20:20 +0000</pubDate>
		<dc:creator>K Lars Lohn</dc:creator>
				<category><![CDATA[Socorro]]></category>

		<guid isPermaLink="false">http://blog.mozilla.com/webdev/?p=154</guid>
		<description><![CDATA[How big can a table in a database get?  Well, the answer varies by database, but for most modern databases, the answer is &#8220;really huge&#8221;.  That&#8217;s what we&#8217;ve got going in Socorro: some honking big tables.  Queries can get slow on big tables.  Sure, you can add indexing to prevent having to scan [...]]]></description>
			<content:encoded><![CDATA[<p>How big can a table in a database get?  Well, the answer varies by database, but for most modern databases, the answer is &#8220;really huge&#8221;.  That&#8217;s what we&#8217;ve got going in Socorro: some honking big tables.  Queries can get slow on big tables.  Sure, you can add indexing to prevent having to scan the whole thing for the most common queries, but you can&#8217;t index every column without slowing performance elsewhere and using up ever more disk space.  Indexes can get really huge, too.</p>
<ul>
<li>tables divided into sub tables</li>
<li>queries optimized to use smaller subtables</li>
<li>ad hoc queries sped up</li>
<li>summary queries ignore irrelevant data</li>
<li>smaller indexes</li>
<li>simplified data retirement</li>
<li><em>fresh lemon scent</em>*</li>
</ul>
<p>Partitioning, a feature offered by many RDBMSs, is a trick to help manage titanic tables.  You take a table and break it up into smaller tables, each containing a part of the whole.   Say we have a data set called &#8216;reports&#8217;.  Rather than storing them all in one table, we store them in several smaller tables.  The data in each smaller table share a common trait.  For example, reports from Week 1, Week 2, and Week 3 would each have their own table.  The master table &#8216;reports&#8217; physically has no data in it at all.  The database knows that when we reference &#8216;reports&#8217; it means the union of all the smaller tables.  If we do a query on the &#8216;reports&#8217; table and we ask for reports from January, the database is clever enough to just look to the weekly sub-tables for January instead of looking at all the sub-tables.</p>
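<p>As a minimal sketch of the idea, the snippet below maps a date to a weekly sub-table and lists only the sub-tables a date-range query would need to touch. The <code>reports_YYYYMMDD</code> naming and the Monday week boundary are assumptions made for illustration, not Socorro&#8217;s actual scheme, and in practice the database does this pruning itself rather than application code.</p>

```python
from datetime import date, timedelta

def partition_name(day):
    """Name the weekly sub-table for a date; weeks are assumed to start on Monday."""
    monday = day - timedelta(days=day.weekday())
    return "reports_%s" % monday.strftime("%Y%m%d")

def partitions_for_range(start, end):
    """List only the weekly sub-tables that can hold rows dated start..end."""
    names = []
    monday = start - timedelta(days=start.weekday())
    while monday <= end:
        names.append(partition_name(monday))
        monday += timedelta(days=7)
    return names

# A single report lands in exactly one weekly sub-table.
print(partition_name(date(2009, 1, 20)))
# A query for January only needs the weeks that overlap January.
print(partitions_for_range(date(2009, 1, 1), date(2009, 1, 31)))
```

<p>A January query touches five weekly sub-tables here, no matter how many months of older data sit in the other partitions.</p>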
<p><img src="http://people.mozilla.com/~lars/socorro/images/db.partitioning.png" alt="the reports table divided into sub tables by week" /></p>
<p>We&#8217;re currently converting the Socorro database into a partitioned database (okay, technically it was already partitioned, but its partitioning was degenerate and didn&#8217;t work properly).  We&#8217;re testing a Python script that is going to take the &#8216;reports&#8217; table, along with the associated &#8216;frames&#8217; and &#8216;dumps&#8217; tables, and start breaking them into little one week chunks. Unfortunately, because of the massive size of the tables, we cannot afford to have two copies of the data in the database at the same time.  The chunking of the data will be destructive.  After a week of data is copied into a new partition, that corresponding week of data will be deleted from the original table.</p>
<p>The database, the Socorro Web App, breakpad crash processing and aggregate analysis will be down during the migration process.  However, <strong>data collection will not be down: we&#8217;re not going to lose new crash data</strong>.</p>
<p>If we were to chunk the entire dataset, the migration process is estimated to take more than twenty hours.  As a compromise, we&#8217;re going to chunk only the most recent four weeks of data and leave the rest as a single oversize partition.  This will significantly reduce the time that migration takes and therefore reduce the downtime.  We can get away with this because most of the aggregate reports only look at the most recent few weeks of data anyway.</p>
<p>Another advantage to partitioning is in the retirement of old data.  In the future, we&#8217;re probably only going to keep at most one hundred twenty days of history.  Any more than that and our storage needs would require its own building.  To get rid of the old data, all we need to do is delete the oldest partitions.  That action is fast because it doesn&#8217;t even require looking at indexes or scanning tables.</p>
<p>Partitioning is going to allow Socorro to scale much more smoothly.  At the same time, it will make our aggregate reporting much more efficient.</p>
<p>This repartitioning process will happen within the next week.  We will announce the scheduled downtime in advance.  And be assured, because of the file system changes that we made during our last big Socorro update, data collection will <strong>not</strong> be down while we&#8217;re repartitioning.</p>
<p>* <em>also available in fresh wintergreen and sparkling pumpkin</em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mozilla.com/webdev/2009/01/20/socorro-database-partitioning-is-coming/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Top Crashers By Url and MTBF</title>
		<link>http://blog.mozilla.com/webdev/2008/12/30/top-crashers-by-url-and-mtbf/</link>
		<comments>http://blog.mozilla.com/webdev/2008/12/30/top-crashers-by-url-and-mtbf/#comments</comments>
		<pubDate>Tue, 30 Dec 2008 23:06:03 +0000</pubDate>
		<dc:creator>ozten</dc:creator>
				<category><![CDATA[Socorro]]></category>

		<guid isPermaLink="false">http://blog.mozilla.com/webdev/?p=133</guid>
		<description><![CDATA[ 





Working with ss and chofman, we&#8217;ve created 2 new types of reports: a Top Crashers by Url and a Mean Time Before Failure (MTBF).
 
Given the current state of performance of the non-report parts of Socorro&#8217;s webapp, most of the thought and time have gone into the backend piece of these reports. You can read about the ReportDatabaseDesign on the project&#8217;s wiki.
Top [...]]]></description>
			<content:encoded><![CDATA[<p> </p>
<div class="mceTemp">
<dl id="attachment_138" class="wp-caption alignleft" style="width: 160px;">
<dt class="wp-caption-dt"><a href="http://blog.mozilla.com/webdev/files/2008/12/crashreporter-512.png"><img class="size-thumbnail wp-image-138 " title="Crash Reporter Icon" src="http://blog.mozilla.com/webdev/files/2008/12/crashreporter-512-150x150.png" alt="Sofa's Crash Reporter Icon via Alex Faaborg" width="150" height="150" /></a></dt>
</dl>
</div>
<p>Working with ss and chofman, we&#8217;ve created 2 new types of reports: a Top Crashers by Url and a Mean Time Before Failure (MTBF).</p>
<p>Given the current state of performance of the non-report parts of Socorro&#8217;s webapp, most of the thought and time have gone into the backend piece of these reports. You can read about the <a title="More info on Report Database Design on wiki" href="http://code.google.com/p/socorro/wiki/ReportDatabaseDesign">ReportDatabaseDesign</a> on the project&#8217;s wiki.</p>
<h3>Top Crashers by URL</h3>
<blockquote><p>On which websites do our browser builds crash the most? Which curses do our users hurl at us when this happens?</p></blockquote>
<p>This report uses the optional url field of a crash report to answer this question. It has two modes: <strong>byurl</strong> and <strong>bydomain</strong>. You can read more about the details on <a title="Details of Top Crashers By Url report" href="http://code.google.com/p/socorro/wiki/TopCrashersByUrl">TopCrashersByUrl</a>. Crashes that have a comment include the comment and a link to the actual crash report. Don&#8217;t worry: personal details have been removed, and we don&#8217;t tie a specific user to a specific url.</p>
<p>We will be putting links into Socorro to these new reports, with the work neilio is doing, but for now here are various links.</p>
<p>We&#8217;ve enabled top crashers by URL for Firefox <a href="http://crash-stats.mozilla.com/topcrasher/byurl/Firefox/3.0.5">3.0.5</a>, <a href="http://crash-stats.mozilla.com/topcrasher/byurl/Firefox/3.1b2">3.1b2</a>, <a href="http://crash-stats.mozilla.com/topcrasher/byurl/Firefox/3.1b3pre">3.1b3pre</a>, and <a href="http://crash-stats.mozilla.com/topcrasher/byurl/Firefox/3.0.6pre">3.0.6pre</a>. Each of these links to a &#8220;by domain&#8221; breakdown, so 3.0.6pre has a link to this <a href="http://crash-stats.mozilla.com/topcrasher/bydomain/Firefox/3.0.6pre">by domains</a> view.</p>
<h3>MTBF</h3>
<blockquote><p>Is this new release more crashy than previous releases?</p></blockquote>
<p>Squeaking in before New Year&#8217;s Eve&#8217;s MFBT comes the MTBF report. It is a graph of the average number of seconds a release runs before crashing. Details are at <a href="http://code.google.com/p/socorro/wiki/MeanTimeBeforeFailure">MeanTimeBeforeFailure</a> on the wiki.</p>
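<p>The core calculation is just an average. Here is a minimal sketch, assuming each crash report carries the number of seconds the build ran before it crashed; the uptime figures and the function name are invented for illustration and are not real Socorro data or code.</p>

```python
def mean_time_before_failure(uptimes_seconds):
    """Average seconds a build ran before crashing; None when there are no crashes."""
    if not uptimes_seconds:
        return None
    return sum(uptimes_seconds) / float(len(uptimes_seconds))

# Hypothetical uptimes (in seconds) taken from crash reports of two releases.
uptimes = {
    "Firefox 3.0.5": [3600, 5400, 1800],
    "Firefox 3.1b2": [900, 2700],
}

for release, values in sorted(uptimes.items()):
    print(release, mean_time_before_failure(values))
```

<p>A lower number for a newer release would suggest that it is &#8220;more crashy&#8221; than its predecessor, which is the question the graph is meant to answer.</p>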
<p>We&#8217;re running MTBF reports for 14 releases:</p>
<p>Firefox <a href="http://crash-stats.mozilla.com/mtbf/of/Firefox/major">major</a>, <a href="http://crash-stats.mozilla.com/mtbf/of/Firefox/milestone">milestone</a>, and <a href="http://crash-stats.mozilla.com/mtbf/of/Firefox/development">development</a> releases.</p>
<p>Thunderbird <a href="http://crash-stats.mozilla.com/mtbf/of/Thunderbird/milestone">milestone</a>, and <a href="http://crash-stats.mozilla.com/mtbf/of/Thunderbird/development">development</a> releases. (No Milestone releases in Socorro yet)</p>
<p>Coming Soon: <strong>SeaMonkey</strong></p>
<p>These reports are for a release in general as well as stats for Mac and Win, allowing for drilling down into OS. Several frontend enhancements to this report are coming.</p>
<p>Products and versions in these reports include:</p>
<ul>
<li>Firefox 3.0.4</li>
<li>Firefox 3.0.5  </li>
<li>Firefox 3.1a2</li>
<li>Firefox 3.1b1</li>
<li>Firefox 3.1b2 </li>
<li>Firefox 3.0.4pre</li>
<li>Firefox 3.0.5pre</li>
<li>Firefox 3.0.6pre</li>
<li>Firefox 3.1b3pre</li>
<li>Firefox 3.1b2pre</li>
<li>Thunderbird 3.0a3</li>
<li>Thunderbird 3.0b1</li>
<li>Thunderbird 3.0b1pre</li>
<li>Thunderbird 3.0b2pre</li>
</ul>
<p>I&#8217;ve gotten a good dose of feedback on tweaks to make and bugs to fix, but hopefully you&#8217;ll find these new reports useful. Tomcat has already mentioned augmenting his list of urls to populate his test automation for 3.1 (using spider to test most popular urls) with the urls in these reports.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mozilla.com/webdev/2008/12/30/top-crashers-by-url-and-mtbf/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Socorro wireframes</title>
		<link>http://blog.mozilla.com/webdev/2008/12/04/socorro-wireframes/</link>
		<comments>http://blog.mozilla.com/webdev/2008/12/04/socorro-wireframes/#comments</comments>
		<pubDate>Thu, 04 Dec 2008 17:52:13 +0000</pubDate>
		<dc:creator>nlee</dc:creator>
				<category><![CDATA[Socorro]]></category>
		<category><![CDATA[Design]]></category>

		<guid isPermaLink="false">http://blog.mozilla.com/webdev/?p=122</guid>
		<description><![CDATA[As part of our ongoing work on the Mozilla crash reporting system (codenamed &#8220;Socorro&#8221;) a redesign of the entire interface is in the works, and I have some preliminary wireframes to share for feedback and discussion.
My personal goals with this redesign are to make working with crash data more efficient, and to make each screen [...]]]></description>
			<content:encoded><![CDATA[<p>As part of our ongoing work on the Mozilla crash reporting system (codenamed &#8220;Socorro&#8221;) a redesign of the entire interface is in the works, and I have some preliminary wireframes to share for feedback and discussion.</p>
<p>My personal goals with this redesign are to make working with crash data more efficient, and to make each screen as useful and as intuitive as possible. This project is an interesting challenge, however, as there are not a lot of publicly-available examples of crash reporting systems to use as a baseline.</p>
<h3>&#8220;Home&#8221; page</h3>
<p>Current page: <a href="http://crash-stats.mozilla.com/">http://crash-stats.mozilla.com/</a></p>
<p>Currently when you go to the <a href="http://crash-stats.mozilla.com/">Socorro home page</a> you&#8217;re dumped right into search. This makes a lot of sense, but there could be a lot more useful data right up front. This wireframe tries to incorporate more of a &#8220;dashboard&#8221; approach with top crashers for release versions and other pertinent information.</p>
<div style="text-align:center"><a href="http://blog.mozilla.com/webdev/files/2008/12/search-basic-full.png"><img src="http://blog.mozilla.com/webdev/files/2008/12/search-basic1.png" alt="search-basic.png" border="0" width="450" /></a>
<p><strong>Home page, default configuration</strong></p></div>
<div style="text-align:center"><a href="http://blog.mozilla.com/webdev/files/2008/12/search-full-full.png"><img src="http://blog.mozilla.com/webdev/files/2008/12/search-full1.png" alt="search-full.png" border="0" width="450" /></a>
<p><strong>Home page with advanced filters visible</strong></p>
</div>
<p>Some key changes in this wireframe:</p>
<ol>
<li>Search is now called &#8220;Filter&#8221; as that&#8217;s really what you&#8217;re doing.</li>
<li>The filter options have been cleaned up a bit, and more specific filters are now grouped under <em>Advanced</em> and hidden by default.</li>
<li>The Advanced Filters toggle will remember if it was opened or closed, so if you regularly access any of these options they will be easily accessible.</li>
<li>There is a new &#8220;top crashers&#8221; widget that shows the top 3-5 reported crashes for current release versions.</li>
<li>The boxes underneath the filter / top crashers widget are for other chunks of data such as top crashers for development versions, mean time before failure, top URLs that cause crashes, etc.</li>
<li>The <strong>versions</strong> filter auto-fills with just the versions available for the selected product, to help keep the number of options down.</li>
<li>The &#8220;Mozilla Developers&#8221; button at the top right-hand corner opens a jump menu that lists all of the Mozilla developer web sites / tools. I think it&#8217;s kind of silly the various Mozilla developer sites aren&#8217;t linked together and this navigational tool addresses that deficiency.</li>
</ol>
<h3>Top Crashers</h3>
<p>Current page: <a href="http://crash-stats.mozilla.com/topcrasher">http://crash-stats.mozilla.com/topcrasher</a></p>
<div style="text-align:center"><a href="http://blog.mozilla.com/webdev/files/2008/12/topcrashers-full.png"><img src="http://blog.mozilla.com/webdev/files/2008/12/topcrashers.png" alt="topcrashers.png" border="0" width="450" /></a></div>
<p>At the moment the existing page is nothing more than a jumping point to link you to the various product versions. The redesigned wireframe tries to float up crash report information for more commonly used product versions while whittling the number of versions down to a more sensible number.</p>
<h3>Individual Crash Signatures</h3>
<p>Example: <a href="http://crash-stats.mozilla.com/report/list?product=Firefox&amp;version=Firefox%3A3.1b2pre&amp;query_search=signature&amp;query_type=contains&amp;query=&amp;date=&amp;range_value=1&amp;range_unit=weeks&amp;do_query=1&amp;signature=mozcrt19.dll%400x1838a">Crash reports in mozcrt19.dll@0x1838a</a></p>
<div style="text-align:center"><a href="http://blog.mozilla.com/webdev/files/2008/12/individual-signature-full.png"><img src="http://blog.mozilla.com/webdev/files/2008/12/individual-signature2.png" alt="individual-signature.png" border="0" width="450" /></a></div>
<p>The redesign does away with the tabbed interface currently in use and brings everything onto one page for quicker access. The top left-hand box displays either the number of crashes by operating system (for release versions) or the number of crashes by build (for development versions).</p>
<h3>Give us your comments, your feedback, your huddled criticism</h3>
<p>As I mentioned, these are quite preliminary, and I&#8217;m very interested to hear your comments and whether you feel these wireframes are headed in the right direction.</p>
<p>Redesigning a tool such as Socorro is rather challenging as there aren&#8217;t many existing examples of good design in this area (or many publicly available examples at all for that matter). I also had to keep in mind that whatever is created needs both to meet Mozilla&#8217;s specific requirements and to be generic enough to be adaptable by others, which makes this redesign even trickier.</p>
<p>I spent quite a bit of time looking at similar or parallel systems such as network and web hosting dashboards to get some ideas for how this information could be displayed and these wireframes incorporate some of my discoveries.</p>
<p>Beefs, bouquets, comments, suggestions?</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mozilla.com/webdev/2008/12/04/socorro-wireframes/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Three Weeks with the New Socorro File System</title>
		<link>http://blog.mozilla.com/webdev/2008/12/01/three-weeks-with-the-new-socorro-file-system/</link>
		<comments>http://blog.mozilla.com/webdev/2008/12/01/three-weeks-with-the-new-socorro-file-system/#comments</comments>
		<pubDate>Mon, 01 Dec 2008 17:12:11 +0000</pubDate>
		<dc:creator>K Lars Lohn</dc:creator>
				<category><![CDATA[Breakpad]]></category>
		<category><![CDATA[Socorro]]></category>

		<guid isPermaLink="false">http://blog.mozilla.com/webdev/?p=107</guid>
		<description><![CDATA[Three weeks ago today, we deployed the new Socorro file system into production.  It was the first in a series of engineered improvements to the Socorro codebase.  By “engineered”, I mean that it was the first major improvement to the code that wasn&#8217;t done during an emergency with a gun to our [...]]]></description>
			<content:encoded><![CDATA[<p>Three weeks ago today, we deployed the new Socorro file system into production.  It was the first in a series of engineered improvements to the Socorro codebase.  By “engineered”, I mean that it was the first major improvement to the code that wasn&#8217;t done during an emergency with a gun to our heads.  For the previous half year, we&#8217;ve been reactive instead of proactive.</p>
<p>The new file system has performed quite well.  The most outward expression of this improvement is the speed at which priority jobs are processed.  </p>
<p>A priority job is any submitted crash for which someone has requested a report.  There can be a backlog of submitted crashes and it might take from several minutes to several hours for the processing programs to get around to a particular job.  If someone requests a particular crash, we&#8217;ve got a way for that job to jump the queue for immediate processing.  Prior to the new file system, the biggest hurdle to processing a job quickly was simply finding it.  There was no index to assist in finding a job quickly.</p>
<p>The new file system changed that.  All entries are indexed as they&#8217;re inserted.  To see how it&#8217;s done, see my previous blog posting.  This gives us very fast access to any crash dump which translates to response times of thirty to ninety seconds for priority job requests.  Try it.  Considering the volume of crashes we get, it&#8217;s amazing that we can zero in and process a crash so quickly.</p>
<p>The last two weeks haven&#8217;t all been champagne and fireworks.  We had a scare about forty-eight hours after deployment.  The automatic indexing scheme uses a radix algorithm to spread crash dumps evenly through a branching file system structure.  During design, we chose to make this structure four levels deep.  Each level splits the directory tree 256 ways.  That translates into 256^4 possible directories or about 4.3 billion.  Once a directory was created, we never retired it, thinking that it would be faster to reuse old directories than to bother destroying and creating them all the time.  At the rate that we received new files, we calculated that it would take years to clog up the file system.  We banked on the assumption that we had at least 4.3 billion inodes available in the file system.</p>
<p>It was a bad assumption.  It turns out that we&#8217;re using some sort of black box storage system with variable sized inodes.  We didn&#8217;t have 4.3G inodes available, we had only 64M.  Back into reactive coding as performance art, we took twenty-four hours to brainstorm, code, and deploy a solution.  Changing the number of levels from four to three was an obvious way to reduce our footprint: 256^3 is only 16M.  The number of levels of our radix directory structure is now a configuration option.  The trick was making four days of data stored with four levels compatible with new data being collected with fewer levels.  I managed that by encoding the number of levels into the uuid of each crash.</p>
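<p>The numbers above can be checked with a short Python sketch.  It also shows one way a radix path of a given depth could be derived from a crash uuid.  Everything named here is an assumption for illustration: the helper names, the sample uuid, the reading of &#8220;64M&#8221; as 64 * 2^20, and the idea that each level is named by two consecutive hex digits of the uuid (two hex digits give 256 values).  It is not the actual Socorro code.</p>

```python
# Hypothetical sketch; names and the two-hex-digits-per-level layout are assumptions.
INODES_AVAILABLE = 64 * 2**20  # "64M" from the text, read here as 64 * 2^20


def directory_count(depth, fanout=256):
    """Number of leaf directories possible with `depth` levels of `fanout`-way branching."""
    return fanout ** depth


def radix_path(uuid, depth):
    """Build a relative directory path from the leading hex digits of a uuid.

    Assumption: each directory level is named by two consecutive hex digits.
    """
    digits = uuid.replace("-", "")
    return "/".join(digits[2 * i:2 * i + 2] for i in range(depth))


for depth in (4, 3):
    fits = INODES_AVAILABLE >= directory_count(depth)
    print(depth, directory_count(depth), "fits" if fits else "exceeds the inode budget")

# Made-up uuid, three levels deep.
print(radix_path("0bba929f-8721-460c-dead-a43c23081201", 3))  # 0b/ba/92
```

<p>With four levels the leaf directories alone outnumber the available inodes; with three they fit with room to spare for the crash files themselves.</p>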
<p>Next time you see a crash uuid, take a look at the digits.  The seventh digit from the right end will tell you how deep your crash is stored in the file system.  If it&#8217;s &#8216;0&#8217; then you&#8217;re stored four levels deep.  Any other digit is to be taken literally: &#8216;2&#8217; – two levels, &#8216;3&#8217; – three levels.  This crazy scheme lets the depth be switchable at run time.  If directories are getting too crowded, we can raise the depth.  Or if we start running out of inodes, we can lower the depth.</p>
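<p>Reading the depth back out of a uuid could look roughly like the following Python sketch.  The function name <code>storage_depth</code> and the sample uuids are made up for illustration, and rejecting a non-decimal character is my own assumption, since the rule above only covers decimal digits.</p>

```python
# Hypothetical sketch of the depth rule described above; not the actual Socorro code.
def storage_depth(uuid):
    """Return how many directory levels deep a crash is stored.

    Rule from the text: the seventh digit from the right end is the depth,
    except that '0' stands for the original four levels.
    """
    digit = uuid[-7]
    if not digit.isdigit():
        # Assumption: the text does not say what a hex letter here would mean.
        raise ValueError("depth digit is not decimal: %r" % digit)
    return 4 if digit == "0" else int(digit)


# Made-up uuids that differ only in the seventh digit from the right.
print(storage_depth("0bba929f-8721-460c-dead-a43c20081201"))  # 4
print(storage_depth("0bba929f-8721-460c-dead-a43c23081201"))  # 3
```

<p>Because the depth travels with the uuid, crashes stored under the old four-level layout and crashes stored under a new depth can be looked up by the same code path.</p>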
<p>Great thanks to Frank Griswold for the coding and to Aravind for not throwing knives at me.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mozilla.com/webdev/2008/12/01/three-weeks-with-the-new-socorro-file-system/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Socorro Project Update</title>
		<link>http://blog.mozilla.com/webdev/2008/11/20/socorro-project-update/</link>
		<comments>http://blog.mozilla.com/webdev/2008/11/20/socorro-project-update/#comments</comments>
		<pubDate>Fri, 21 Nov 2008 05:33:59 +0000</pubDate>
		<dc:creator>morgamic</dc:creator>
				<category><![CDATA[Socorro]]></category>

		<guid isPermaLink="false">http://blog.mozilla.com/webdev/?p=99</guid>
		<description><![CDATA[There has been a lot going on for the Mozilla crash reporting system.  Here are some quick links for project activity:

Closed bugs
Active bugs
Code check-in RSS
Socorro documentation

Upcoming changes:

Socorro site redesign &#8211; Neil and Austin are working with Sam and Chris to work through crash analysis to understand how the system is used to track down [...]]]></description>
			<content:encoded><![CDATA[<p>There has been a lot going on for the Mozilla crash reporting system.  Here are some quick links for project activity:</p>
<ul>
<li><a href="https://bugzilla.mozilla.org/buglist.cgi?query_format=advanced&#038;short_desc_type=allwordssubstr&#038;short_desc=&#038;product=Webtools&#038;component=Socorro&#038;long_desc_type=allwordssubstr&#038;long_desc=&#038;bug_file_loc_type=allwordssubstr&#038;bug_file_loc=&#038;status_whiteboard_type=allwordssubstr&#038;status_whiteboard=&#038;keywords_type=allwords&#038;keywords=&#038;bug_status=RESOLVED&#038;resolution=FIXED&#038;emailassigned_to1=1&#038;emailtype1=substring&#038;email1=&#038;emailassigned_to2=1&#038;emailreporter2=1&#038;emailqa_contact2=1&#038;emailtype2=substring&#038;email2=&#038;bugidtype=include&#038;bug_id=&#038;votes=&#038;chfieldfrom=2008-11-01&#038;chfieldto=Now&#038;chfieldvalue=&#038;cmdtype=doit&#038;order=Reuse+same+sort+as+last+time&#038;known_name=Socorro-0.6&#038;query_based_on=Socorro-0.6&#038;field0-0-0=noop&#038;type0-0-0=noop&#038;value0-0-0=">Closed bugs</a></li>
<li><a href="https://bugzilla.mozilla.org/buglist.cgi?query_format=advanced&#038;short_desc_type=allwordssubstr&#038;short_desc=&#038;product=Webtools&#038;component=Socorro&#038;long_desc_type=allwordssubstr&#038;long_desc=&#038;bug_file_loc_type=allwordssubstr&#038;bug_file_loc=&#038;status_whiteboard_type=allwordssubstr&#038;status_whiteboard=&#038;keywords_type=allwords&#038;keywords=&#038;resolution=---&#038;emailassigned_to1=1&#038;emailtype1=substring&#038;email1=&#038;emailassigned_to2=1&#038;emailreporter2=1&#038;emailqa_contact2=1&#038;emailtype2=substring&#038;email2=&#038;bugidtype=include&#038;bug_id=&#038;votes=&#038;chfieldfrom=2008-11-10&#038;chfieldto=Now&#038;chfieldvalue=&#038;cmdtype=doit&#038;order=Reuse+same+sort+as+last+time&#038;known_name=Socorro-0.6&#038;query_based_on=Socorro-0.6&#038;field0-0-0=noop&#038;type0-0-0=noop&#038;value0-0-0=">Active bugs</a></li>
<li><a href="http://code.google.com/p/socorro/source/list">Code check-in RSS</a></li>
<li><a href="http://code.google.com/p/socorro/w/list">Socorro documentation</a></li>
</ul>
<p>Upcoming changes:</p>
<ul>
<li><a href="https://bugzilla.mozilla.org/show_bug.cgi?id=411428">Socorro site redesign</a> &#8211; Neil and Austin are working with Sam and Chris to work through crash analysis to understand how the system is used to track down the cause of certain crashes.  Structure, navigation and layout will be redone to best accommodate common uses.</li>
<li><a href="https://bugzilla.mozilla.org/show_bug.cgi?id=432450">Socorro database repartitioning and migration</a> &#8211; Lars is working hard on refactoring our partitioning to better support the kinds of queries we make.  The end result is better long-term maintainability and improved performance.</li>
<li>New reports &#8211; <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=411424">MTBF reports</a>, <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=411358">URL reports</a> &#8211; Austin is working on two new report types to provide better information for crash analysis.</li>
</ul>
<p>Current issues:</p>
<ul>
<li><a href="https://bugzilla.mozilla.org/show_bug.cgi?id=444749">Query timeouts</a> &#8211; this usually happens when a user does a complex query (either no product selected or a search over a period longer than 4 weeks).</li>
<li><a href="https://bugzilla.mozilla.org/show_bug.cgi?id=466103">There is no dialog saying the app is working</a>, and connections time out, leading a user to simply hit refresh to try to get where they want to go.  However, many of the queries taxing the system were non-specific queries over 3-month periods.  In some cases these jobs would time out in the reporter but continue for 30 minutes or more in the database.  To prevent this from happening we pushed a <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=466059">revision to the search form</a> to limit query breadth.</li>
<li>Search engines were crawling Socorro so we <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=465403">added a robots.txt file</a> to prevent this from happening.</li>
</ul>
<p>Learn more:</p>
<ul>
<li>Join us in #breakpad @ irc.mozilla.org</li>
<li><a href="http://code.google.com/p/socorro/source/checkout">Check out the code</a> and read the <a href="http://code.google.com/p/socorro/w/list">documentation</a> to set up, install and hack on Socorro.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://blog.mozilla.com/webdev/2008/11/20/socorro-project-update/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Crash Stats updated with Flot</title>
		<link>http://blog.mozilla.com/webdev/2008/10/27/crash-stats-updated-with-flot/</link>
		<comments>http://blog.mozilla.com/webdev/2008/10/27/crash-stats-updated-with-flot/#comments</comments>
		<pubDate>Mon, 27 Oct 2008 16:03:42 +0000</pubDate>
		<dc:creator>ozten</dc:creator>
				<category><![CDATA[Socorro]]></category>

		<guid isPermaLink="false">http://blog.mozilla.com/webdev/?p=86</guid>
		<description><![CDATA[As we head closer and closer to Día de Muertos, do you ask &#8220;where can I mingle with the dead (processes)&#8221;? Why, at Mozilla&#8217;s crash reporter tool, you&#8217;ll find Firefoxen, SeaMonkeys, and Thunderbirds who have crossed over and yet still linger&#8230;
Last week we updated the plotting library we use for graphs to use Flot and [...]]]></description>
			<content:encoded><![CDATA[<p>As we head closer and closer to Día de Muertos, do you ask &#8220;where can I mingle with the dead (processes)&#8221;? Why, at Mozilla&#8217;s crash reporter tool, you&#8217;ll find <a href="http://crash-stats.mozilla.com/?do_query=1&amp;version=Firefox%3A3.0.3&amp;query_search=signature&amp;query_type=contains&amp;query=&amp;date=&amp;rangevalue=1&amp;range_unit=weeks">Firefoxen</a>, <a href="http://crash-stats.mozilla.com/?do_query=1&amp;version=SeaMonkey%3A2.0a1&amp;query_search=signature&amp;query_type=contains&amp;query=&amp;date=&amp;rangevalue=1&amp;range_unit=weeks">SeaMonkeys</a>, and <a href="http://crash-stats.mozilla.com/?do_query=1&amp;version=Thunderbird%3A3.0b1pre&amp;query_search=signature&amp;query_type=contains&amp;query=&amp;date=&amp;rangevalue=1&amp;range_unit=weeks">Thunderbirds</a> who have crossed over and yet still linger&#8230;</p>
<p>Last week we updated the plotting library we use for graphs to use <a href="http://code.google.com/p/flot/">Flot</a> and we fixed a handful of <a href="http://code.google.com/p/socorro/">Socorro</a> webapp bugs. Flot is a good choice where we have simple graphs to display. Below are two graphs from the crash report listing page: a new crashes-by-OS bar chart for production builds, and an updated crashes-by-build graph for development builds.</p>
<p><img class="alignleft size-full wp-image-93" title="Crashes By OS" src="http://blog.mozilla.com/webdev/files/2008/10/prodcrashgraph.png" alt="" width="127" height="154" /><img class="size-medium wp-image-90 alignright" title="dev build crash frequency graph" src="http://blog.mozilla.com/webdev/files/2008/10/devcrashgraph.png" alt="" width="163" height="155" /><br />
<br style="clear: both" /><br />
Also using Flot, we&#8217;ve revamped the server status page, so that we can see the health of the Socorro system. Socorro is the server-side half to its sibling <a title="Breakpad crash reporting client library" href="https://wiki.mozilla.org/Breakpad">Breakpad</a>; together, they provide an Open Source client and server crash reporting system. Let&#8217;s see if crash-stats is happy&#8230;</p>
<p style="text-align: center;"><a href="http://crash-stats.mozilla.com/status"><img class="size-medium wp-image-91 aligncenter" title="crash-stats server status screenshot" src="http://blog.mozilla.com/webdev/files/2008/10/server-status_1225120613147-300x253.png" alt="" width="300" height="253" /></a></p>
<p>This graph is powered by a new cron job which checks the server status based on several statistics related to processing crash reports. Another cron job we&#8217;ve worked on is topcrashers.</p>
<p>In the coming weeks we&#8217;ll be bringing new life to the crash reporter web app and improving this <a href="http://blog.mozilla.com/webdev/2008/04/21/crash-analysis-now-in-open-source-flavor/">unique and important</a> part of how the Mozilla dev and QA teams work. We will be focusing on making it easier to use by providing new reports, easier navigation, and better visibility into crashes. What do you want to see in <strong>your</strong> crash reporting tool?</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mozilla.com/webdev/2008/10/27/crash-stats-updated-with-flot/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
