AMO Site Updated, Rolled Back

AMO was updated on Thursday, March 22nd, around 8pm. Overnight, we watched the web infrastructure to ensure that AMO could withstand peak load times, but this morning, near peak time, cluster load levels became too high and we were forced to roll back yet again to avoid affecting other critical applications.

Good news:

  • Our database bottleneck is now nonexistent
  • The app servers are fine during off-peak times
  • During this short period, we received as much feedback as, if not more than, we did during the previously announced two-week beta window

Bad news:

  • App server load is unacceptable
  • Traffic on web nodes has more than doubled as a result of absorbing releases.mozilla.org traffic

Strategy:

  • Profile application to look for new pain points (already done, no obvious culprits)
  • Move public add-ons .xpi traffic back over to releases.mozilla.org
  • Only sandbox files will be served from webheads for policy reasons
  • Remove locale strings from all image URLs to improve cache rate for images
  • Reassess and redeploy at earliest reasonable time
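
To illustrate why removing locale strings from image URLs should improve the cache rate, here is a minimal sketch in Python. It is not AMO's actual code: the URL shapes, the `KNOWN_LOCALES` set, and the function names are all assumptions made up for the example. The idea is simply that the same image requested under several locale prefixes occupies several cache entries, while a locale-free URL occupies one.

```python
# Hypothetical set of locale codes; the real list of supported locales may differ.
KNOWN_LOCALES = {"en-US", "de", "ja", "fr"}


def strip_locale(path):
    """Return the path without its leading locale segment, if it has one."""
    parts = path.split("/", 2)  # e.g. ["", "en-US", "images/preview/1865.png"]
    if len(parts) == 3 and parts[0] == "" and parts[1] in KNOWN_LOCALES:
        return "/" + parts[2]
    return path  # no recognized locale prefix, leave the path alone


def cache_keys(paths):
    """Distinct URLs a cache would have to store to serve these requests."""
    return {strip_locale(p) for p in paths}


# Hypothetical requests for one image under three locales.
requests = [
    "/en-US/images/preview/1865.png",
    "/de/images/preview/1865.png",
    "/ja/images/preview/1865.png",
]

print(len(set(requests)))         # 3 cache entries with the locale in the URL
print(len(cache_keys(requests)))  # 1 cache entry once the locale is removed
```

Using an explicit list of locales, rather than a pattern match on short path segments, avoids accidentally stripping a real directory name that happens to look like a locale code.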

Because life isn’t complete without colorful graphs, here is a graph showing database CPU (load reflected this too) going down dramatically:

remora db cpu usage

But the bad news is that our app nodes were angry:

mrapp07 load graph

This is largely due to a dramatic increase in overall traffic that was moved onto the cluster from releases.mozilla.org:

remora traffic

This traffic graph is what leads us to believe that offloading file transfers back onto releases.mozilla.org will give Apache the breathing room it needs. We will post an update as soon as we can verify that.

Thanks again everyone for supporting us as we work through these issues. As for the negative comments, they serve as good motivation, too.

Categories: AMO

11 responses

  1. The HavoX wrote:

    Ah… I see. It all makes sense now.

    I was wondering what in god’s name happened to the AMO site.

  2. docwhat wrote:

    And the evilness that is the sandbox? Will that be fixed as well?

  3. morgamic wrote:

    Mike Shaver will be posting a separate blog about the sandbox to help clear things up. We didn’t want to mix the two posts up.

  4. docwhat wrote:

    I’m waiting with bells on….

  5. max1million wrote:

    Sandboxed extensions not likely to go public.

    Now, instead of sitting at the end of the most popular list because they have fewer downloads and fewer updates to trigger automatic downloads, perfectly good or new extensions (even already released ones) are hidden away in a section that effectively says “if you find this section, don’t bother, because these haven’t been tested or just aren’t any good.”

    If you rely on average users to even find them, not to mention take the time to test, rate, and nominate them for public before they are actually reviewed and added to the regular public section, you might as well forget them.

    That will only reinforce the existing and most popular extensions, not serve to promote others. I thought promoting other, less popular extensions was one of the goals.

  6. Meatball wrote:

    It would also be nice if Mike Shaver could blog about the process by which extensions are “chosen” for recommendation. Assuming of course there is a process.

  7. Geva wrote:

    At least let us use the old CPanel in the meantime… You shouldn’t have locked it up while you fix things… first fix it and then do the migration stuff.

    And also, please don’t sandbox all the extensions while you review them one by one; you should first review them, and only then put the results of the reviews into effect.

  8. Bogdan wrote:

    Oh, the sweetness of bitching!

    I’m an extension developer caught pants down by this site update (just so you know where I’m coming from: I haven’t followed the site update process, and released a new version of my extension early during the site freeze — therefore I’m most affected by all subsequent delays). And yes, there /are/ things I’m not happy with in the new incarnation as well.

    But *come* *on*! There really is waaay too much whining in here — guys, get a grip! Do you really, truly, honestly think you would’ve managed this significantly better with the same resources? Or do you think the developers here are squarely ill-willed? Or at least mildly not interested in AMO? Come now.

    Instead of whining, let’s give the AMO team a nice, good round of applause for their efforts, let’s hope for the best, and most importantly, let’s allow them the peace of mind to fix all current issues: unless you’re wildly pessimistic, you’d agree they’re all doing their best to make it through a tough site update, and upgrade.

  9. Kiroset wrote:

    Thanks for the update, good luck

  10. MASA wrote:

    @max1million

    Agreed.

    @Mozilla

    I bet you 5 bucks you won’t make the next deadline if it’s in March of 2007.

  11. Anders wrote:

    It seems strange to me that localization would have any performance impact. Since the site is mostly a static site, I would expect most of the pages to be baked as static html-files (and pre-compressed) in the file system, leaving only the search (if not outsourced) and the review submission as dynamic. Have you instead chosen to generate all pages dynamically and, if so, why?