AMO 3.2.1

April 21st, 2008 by morgamic

addons.mozilla.org was updated last week. AMO 3.2.1 was a maintenance release (26 bugs fixed) for any major issues with 3.2.

Our next release will be AMO 3.4.1, the first of three dot releases for AMO 3.4, which is our next milestone to be completed before Firefox 3.

Socorro Updates

April 4th, 2008 by morgamic

We’ve pushed some important updates in the last couple of days:

  • refactor of processor code, which is 1/3 of the breakpad server
    architecture
  • update of reporter to allow for instant queuing of requested reports

This means:

  • If you submit a crash, going to that crash page will:
    • Show you a “haven’t queued it yet” page (instead of a 404
      page) that will update in < 10 min
    • Once queued, you’ll see a “report pending” page that will
      redirect to the finished report in < 21 seconds
  • Wait time for reports from testers is reduced to 10 min max,
    sometimes 21 seconds best-case
  • We are working on eliminating the 10 min portion but there are
    reasons why we can’t spam the monitor that is responsible for
    queuing new reports that are on disk — more on that next week (I
    want this to get down to: load, wait 20 seconds, BAM! see your report)

Thanks for everyone’s patience with the crash report backlog during releases — we hope this helps many of you.

Let me know if you have any questions. More to come in the next few weeks! Thanks to Lars, Ted and Aravind for their help with developing, testing and pushing these updates.

AMO 3.2 SVN Stats

March 27th, 2008 by morgamic

Ran statsvn on r7797 through r11622.  View svn stats for AMO 3.2.

All week, all the time:

Activity per day of the week

On the developers page, you will see how many non-mozilla contributors have helped us update translations.  Overall, some rough stats:

  • 3134 changes
  • 288602 lines affected
  • 24 contributors

Again, thanks a ton for your help!

Bringing Sexy Back to AMO

March 26th, 2008 by morgamic

Yep — “sexy” and “AMO” in the same sentence.  Our festively plump 3.2 target milestone added many features and refreshes that you can read about in a great post on Basil’s Bodacious blog.

In the same time period we’ve also:

  • Upgraded CakePHP’s core software from 1.1.18.x to 1.1.19.x
  • Added support for weighted database slaves
  • Migrated AMO to PHP5 off of the now end-of-life’d PHP4
  • Replaced Scriptaculous/Prototype with jQuery
  • Improved IE6/IE7 compatibility
  • Improved accessibility features

And really, when it comes to sexy, the real magic starts with our volunteers.  Our editors have worked hard to review new and updated add-ons as we move towards Firefox 3 this year and our localizers translated roughly 200 new strings in AMO templates in a little over three weeks for 24 locales (wow).

Thanks to everyone for pitching in to make this release happen.

Hungry Hungry Add-ons Manager

February 15th, 2008 by morgamic

For about 48 hours the AMO API was thrashing because of the popularity (and hunger!) of the new add-ons manager in Firefox 3 Beta 3.  The dust has settled and the servers are humming happily along, so now is a good time to blog about what happened and how we’ll handle future releases successfully.

Stop.  Take a deep breath.  Alright, here we go.

What happened this week?

Now that the API is functional (most major bugs have been ironed out), we got a rude awakening this week and found out exactly how much traffic the improved Add-ons Manager can generate. It’s a nice problem to have, though, and we’re happy it’s been well received.

Wednesday, around peak time, the API started clobbering our databases:

db03 load average

Shortly after we entered our peak traffic window, we had to turn off the API to keep the normal AMO working.  Diagnosis found that:

  • Load was not utilizing the read-only slave and was focused mainly on the master read/write database (mrdb03).
  • Cache hit rates were down to 60% from the usual 90% for memcached
  • When our databases hit peak CPU, the app cluster would tumble because of the piling requests

How was it fixed?

Wednesday, IT and Webdev spent quite a bit of time getting the API back up.  Starting with the three points above, we:

  • Off-loaded read-only traffic to DB slaves
  • Investigated optimizations for the API
  • Looked at cache rules and cache policies for both memcache and the hardware load balancer

However, Thursday didn’t fare any better for the cluster.  This time the slaves started to melt near peak time — forcing us to once again temporarily disable the API.  Under-utilizing memcache was the main issue.  Cache headers were fine, slave was utilized, app nodes were fine — just too many damn queries flying at our database servers! :)

Load got high, but we disabled the API before it became critical

So on Thursday we continued our look into what was going on. We tried to figure out why our cache hit rate was so low (60% instead of 90%). Digging through AMO, we found that CACHE_PAGES_FOR, which sets the expire time on memcache records when calling Memcache::set(), was set to 60 seconds. We increased this to 7200 to aggressively cache database traffic and collectively headed off for Valentine’s dinner.
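To see why the expire time matters so much, here is a minimal sketch in Python (not AMO's PHP code). The `TTLCache` class, the key name and the one-request-per-second workload are all made up for illustration; they only stand in for memcache records written with an expire time like CACHE_PAGES_FOR.

```python
class TTLCache:
    """Tiny in-memory stand-in for memcache: each record expires after ttl seconds."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expires_at)

    def get(self, key, now):
        entry = self.store.get(key)
        if entry is None or now >= entry[1]:
            return None  # missing or expired
        return entry[0]

    def set(self, key, value, now):
        self.store[key] = (value, now + self.ttl)


def count_db_queries(ttl_seconds, duration_seconds):
    """Request one key once per second; count how often we fall through to the database."""
    cache = TTLCache(ttl_seconds)
    db_queries = 0
    for now in range(duration_seconds):
        if cache.get("popular-addons", now) is None:
            db_queries += 1  # cache miss: the real query would run here
            cache.set("popular-addons", "result", now)
    return db_queries


if __name__ == "__main__":
    two_hours = 7200
    print(count_db_queries(60, two_hours))    # expire time of 60 seconds
    print(count_db_queries(7200, two_hours))  # expire time of 7200 seconds
```

For a single hot key requested every second for two hours, that is 120 database queries with a 60 second expiry versus 1 with a 7200 second expiry. Real traffic has many keys and uneven request rates, so the actual savings will differ.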

The next day, Memcache was our valentine.

mrdb03 survived Friday without a blip

db04 load was higher than the read/write master

The combination of our efforts worked:

  • Overall query traffic was reduced dramatically
  • What traffic did make it past memcache was well distributed onto 2 read-only slaves (db04, db04-2)
  • App code was optimized to reduce overhead and unnecessary database traffic, in part by placing a hard limit on the number of search results returned by the API

How will we scale?

So these growing pains will help us move forward.  Here is our plan of attack for scaling this beast for the Firefox 3 onslaught:

  • Move the API (services.addons.mozilla.org) to a separate docroot with its own read-only slaves and more aggressive caching policies that are separate from the main AMO
  • Optimize client code to reduce the number of requests needed to retrieve data, and employ local caching for redundant content or content that doesn’t change much over time
  • Offload even more traffic onto read-only slaves
  • Upgrade CakePHP to the latest 1.1.x stable branch, which optimizes auto-generated queries quite a bit (thanks to clouserw for researching this)
  • Refactor how we pull localized strings from our database
  • Optimize our search performance on AMO and the API
  • Switch default CakePHP data source to read-only slaves
  • Find ways to use memcache at higher levels (caching larger objects instead of at just query level)
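
As a rough illustration of sending read traffic to the slaves, here is a minimal Python sketch. It is not how CakePHP data sources work; `is_read_query` and `pick_server` are hypothetical helpers, and only the host names come from this post.

```python
import random

MASTER = "mrdb03"            # read/write master named in this post
SLAVES = ["db04", "db04-2"]  # read-only slaves named in this post


def is_read_query(sql):
    """Treat a statement as read-only if it starts with SELECT, ignoring leading whitespace and case."""
    return sql.lstrip().upper().startswith("SELECT")


def pick_server(sql, rng=random):
    """Send reads to a randomly chosen slave and everything else to the master."""
    if is_read_query(sql):
        return rng.choice(SLAVES)
    return MASTER


if __name__ == "__main__":
    print(pick_server("  SELECT name FROM addons"))
    print(pick_server("UPDATE addons SET name = 'x'"))
```

Stripping leading whitespace before checking for SELECT is a deliberate choice here: without it, a read query that starts with a space or newline would be misrouted to the master.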

Once again it was a great team effort to get things running smoothly.  Thanks to IT for helping us troubleshoot this.  We’ll continue to build on this experience to ensure better reliability in future releases.

Looking back at the last three days, the Firefox 3 Beta 3 release was a success in more ways than one.  It showed everyone what the web can do, but it also helped us wrap our heads around the API and how much traffic it generates.  All of this will make for a better Firefox 3.0 release.

AMO 3.2 Preview

February 15th, 2008 by morgamic

AMO has a new look and we need your help to polish it off. Please tell us what you think!

Here are some screenshots:

Rec vs. Exp, Experimental close-up, Dev Stats, Featured add-on, Reviews, App Chooser, Developer CP Nav

Aside from a new look, here are a few highlights in AMO 3.2:

What we’d like to know:

  • Does the reskin help you find what you need quicker?
  • Does the absence of “types” confuse things? (plugins, search plugins, themes, extensions)
  • What should we do to make things better/easier to use for you?

Keep in mind that we are still ironing out some wrinkles. For more information:

Thanks, and looking forward to hearing from everyone.

AMO Update r10238

February 11th, 2008 by morgamic

Yes, we are over 10,000 commits in our subversion repository. This last update for AMO trunk includes the following fixes, among others:

  • Update sk locale from bug 367271
  • Improving install experience for non-browser apps (bug 401272, r=clouserw)
  • adding GUID to categories RSS feed to enable feed readers to distinguish fresh items from old ones (bug 411834)
  • merging new strings from Thunderbird install experience fix (r9576, bug 401272) into all other locales
  • fixing “all versions” RSS feed, bug 392183
  • Fix bug 394590
  • Fix bug 378782
  • Total download counting in maintenance script; bug 409341; r=morgamic
  • Adding pt_PT locale from bug 391197
  • Update pt-BR locale from bug 380221
  • Update zh-CN locale from bug 407472
  • fixing data sanitization for UTF-8 characters: bug 412580, r=laura
  • Adding support for application wildcards in categories; bug 408525; r=clouserw
  • adding test for UTF-8 sanitization (bug 412580)
  • fixing pagination sanitization, bug 412580, r=fligtar
  • Checking in reviewcount column and maint script from bug 408680.
  • Firefox 3 additem notices; bug 406898; r=morgamic
  • fix bug 415085
  • Fixing sanitization of discussion dates on addons detail page (bug 414541)
  • Unflag sr-flagged add-on; bug 371214; r=fwenzel
  • minor change to bin database class; bug 409341; r=morgamic
  • Checking in review count column stuff from 408680. r=fwenzel.
  • fixing memcaching for select queries that start with whitespace (bug 416403, r=morgamic)
  • Update fr locale from bug 366239

I want to thank everyone on the AMO team for their hard work, especially localizers who have worked really hard to port AMO to their native language. 2008 is already turning out to be a great year — let’s keep it up!

Second thoughts on dynamic content

December 20th, 2007 by Wil Clouser

I was looking at one of the AMO v3.2 mockups today. There are strings like “See All Interface Tweaks Add-ons” that we’ve avoided up till now, but this isn’t the first time they’ve been proposed. The problem we’re having is that a string like that is from two different sources - static and dynamic data. “Interface Tweaks” is the name of one of our categories so it’s stored in the database, and the rest of the string is static, so it’s in a .po. The static string would look something like:

See All %s Add-ons

and the dynamic string would look like:

Interface Tweaks

In English, these combine and all is well, but if the second value affects the structure of the first in a dynamic way, we can’t support the phrase on AMO.
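
The combination step can be sketched like this in Python (AMO itself is PHP). The dictionaries stand in for the .po file and the database; the key `see_all` and the category id 12 are invented for the example.

```python
# Static template, as it would be stored in a .po file.
STATIC_STRINGS = {"see_all": "See All %s Add-ons"}

# Dynamic category names, as they would be stored in the database.
CATEGORY_NAMES = {12: "Interface Tweaks"}  # the id 12 is made up


def see_all_label(category_id):
    """Substitute the database value into the static template."""
    return STATIC_STRINGS["see_all"] % CATEGORY_NAMES[category_id]


if __name__ == "__main__":
    print(see_all_label(12))
```

The substitution is blind: the template has no way to change its own wording depending on which category name is dropped in, which is exactly the limitation for languages where it would need to.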

When I wrote the original code, I think I had two things in mind:

  • Categories would be changing more often
  • Localizers wouldn’t have direct access to SVN

Categories have changed a bit in the past (and they’re in mid change right now, actually), but other than the convenience of a near-instant change on the site, it doesn’t seem that beneficial to change them via the web. The second point is a big one though - by giving localizers direct access to SVN, they can update strings whenever they need to without our meddling and getting in the way. That’s a big time saver for everyone.

So, now I’m reconsidering the separation of some of the interface translations (add-on types, applications, and categories) from the rest of the static content. Looking at where we currently stand, it seems like it’s more of a hassle to describe the separation and what it does than it would be to just drop everything in the .po file. Plus we’d get the benefit of strings like “See All Interface Tweaks Add-ons.”

AMO: Developer Replies to Reviews

September 21st, 2007 by Frédéric Wenzel

Recently, I worked on a nice little feature for AMO: Letting developers reply to user reviews.

The idea is, when you get a review as an add-on publisher, you may find a spot or two in it that you feel like replying to. In the previous version of AMO, users started discussions by just adding another review (with a random rating), effectively rendering parts of the rating system useless. Also, the developers were not allowed to rate their own add-ons and could thus not reply to any of the questions.

In AMO version 3, we addressed this issue by having our editor team moderate reviews to ensure “good” reviews and thus useful ratings. For discussions around individual add-ons, we introduced a forum system to provide a more appropriate means of discussing support questions and similar while not diluting the review/rating system. This, however, left reviews as a one-way communication with no way for developers to address questions raised in their add-ons’ reviews.

This issue is now fixed: As of today’s AMO update, developers can now reply once to each of the published reviews of their add-ons.

AMO review reply, example

The reason why we only allow one reply is so that discussions are held in the discussion forums (which is where they should be, as mentioned above), while not forcing developers to leave possible allegations undisputed. Some of you may notice parallels to the rating system eBay uses for sellers.

Before you start replying to a bunch of reviews now: Remember, the reviews of your add-on are a place for your users to give feedback and point out good or bad things (in their opinion) about your add-on. Developer replies will also be moderated by editors, just like regular reviews. So, as always, play nice, even if you disagree with the opinion explained in the user review. If you get a bad review once, don’t fret: Rather than “yelling” at the user (which will most likely be deleted by an editor anyway), you should take it as an encouragement to make your add-on better. After all, the user considered your add-on interesting enough to download and test it and give you feedback on the points they found improvable. This is how great add-ons start!

We hope you enjoy the new feature! If you find any problems with it, feel free to drop by #amo on irc.mozilla.org or to file a bug as usual.

Bouncer Updates

August 16th, 2007 by morgamic

Bouncer had a few updates last night:

  • Database library rewritten to be lightweight and use memcache (see tests)
  • Overall requests per second performance increased 2x
  • Database usage down to about zero because of memcache

To see the difference, I’ll show a couple of database graphs. Note that logging is still turned on, so there are still database connections used for updating product and mirror counts. We plan on disabling this in the future because these stats are backfilled from HTTP logs.

Pretty CPU graph:

Database server CPU usage

Pretty traffic graph:

So CPU usage and traffic were reduced significantly. Here’s why:

  • Bouncer is a special type of app - high amount of reads over a short period of time
  • The majority of these database reads are repeated over and over, especially recent releases (Firefox 2.0.0.6 for example)

Again, database traffic flatlined during testing when logging was disabled so what you’re seeing is basically download count updates and the connections needed to run them.

To get an idea of exactly how much memcache is used, here are stats from memcache as of this morning (single server, after approximately 12 hours of usage):

  • Hits: 29884170
  • Misses: 199515
  • Total Gets: 30083685
  • Hit %: 99.33679999641

This means that of all queries, 99.33% were read from the cache. Not bad.
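
The arithmetic behind that percentage, as a small Python sketch. It treats the first figure in the list as cache hits, which is an assumption based on it adding up with the misses to the total.

```python
def hit_rate(hits, misses):
    """Return (total gets, percentage of gets served from the cache)."""
    total = hits + misses
    return total, 100.0 * hits / total


if __name__ == "__main__":
    total, percent = hit_rate(29884170, 199515)
    print(total)
    print(percent)
```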

This patch took two weeks longer than expected. Challenges we faced:

  • Getting random mirror selection to work when not using “ORDER BY RAND()” as the way to randomize mirror selection — this was replaced by a simple hash and array sort algorithm that uses array values to weight overall probability of selection when combined with a random seed. In general, random selection via SQL is inefficient and you’re better off using another method besides just RAND() — stored procedures, app code, whatever — but when you’re doing “ORDER BY RAND()” you’re in for some pain. Use EXPLAIN — it does not lie.
  • Being stingy with MySQL connections. The database library rewrite was required because we did not want to even connect to the database when memcache had all of our results cached. However, mysql_real_escape_string() was used to escape queries and it requires an open database handle to work. That meant that we had to only escape if we knew we were going to perform a query. So we had to move cleaning of SQL inputs to inside the query callback function. This was done by…
  • Breaking away from the PHP4 mysql habit. PHP 5.x has better support for this via mysqli, but since we’re still on PHP4 for a little while, we mimicked prepared statements without using a database interface, with a method much like what was discussed among WordPress developers. It takes us from a connect() -> clean -> concatenate sql -> query() approach to a prepared sql and args -> query() -> autoclean approach.
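
For the first challenge, here is one way to do weighted random selection in app code instead of “ORDER BY RAND()”. This is a Python sketch using cumulative weights, not the hash and array sort approach Bouncer actually uses; `pick_mirror`, the mirror URLs and the weights are all hypothetical.

```python
import random


def pick_mirror(mirrors, rng=random):
    """Pick one mirror URL with probability proportional to its weight.

    mirrors is a list of (url, weight) pairs with positive weights.
    """
    total = sum(weight for _, weight in mirrors)
    threshold = rng.uniform(0, total)
    running = 0.0
    for url, weight in mirrors:
        running += weight
        if threshold <= running:
            return url
    return mirrors[-1][0]  # guard against floating-point rounding


if __name__ == "__main__":
    mirrors = [
        ("http://mirror-a.example.com/", 1),
        ("http://mirror-b.example.com/", 3),
    ]
    print(pick_mirror(mirrors))
```

The mirror list itself can come straight out of the cache, so picking a mirror costs one pass over a small array and no database work at all.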

Overall, this is another positive experience with memcache as a query cache. We use the same method in AMO and plan on using it in other apps that will have to scale to any reasonable level. It saves money in hardware and turns little apps that can sort-of do the job into supercharged workhorses. Yay.
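
To make the pattern concrete, here is a minimal Python sketch of a query cache that only connects to the database on a cache miss and takes prepared SQL plus arguments. It is not Bouncer's PHP library: `CachedDatabase` is a hypothetical class, a plain dict stands in for memcache, and sqlite3 stands in for MySQL.

```python
import hashlib
import sqlite3


class CachedDatabase:
    """Query wrapper that only opens a connection on a cache miss."""

    def __init__(self, path):
        self.path = path
        self.conn = None   # no connection until it is actually needed
        self.cache = {}    # a dict stands in for memcache
        self.connects = 0  # how many times we really connected

    def _connect(self):
        if self.conn is None:
            self.conn = sqlite3.connect(self.path)
            self.connects += 1
        return self.conn

    def query(self, sql, args=()):
        """Take prepared SQL plus arguments; binding happens after the cache check."""
        raw_key = repr((sql, tuple(args))).encode("utf-8")
        key = hashlib.sha256(raw_key).hexdigest()
        if key in self.cache:
            return self.cache[key]  # served without touching the database
        rows = self._connect().execute(sql, args).fetchall()
        self.cache[key] = rows
        return rows


if __name__ == "__main__":
    db = CachedDatabase(":memory:")
    print(db.query("SELECT ? + 1", (41,)))
    print(db.query("SELECT ? + 1", (41,)))  # second call comes from the cache
    print(db.connects)
```

Because the arguments are only bound inside `query()`, nothing needs an open database handle until a miss occurs, which is the same reason the escaping had to move into the query callback.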