Sep 17

Cleaning up the Graph Server Database

Category: MySQL

The Graph Server, your one stop shop for all things performance, has gotten a little unwieldy around the database.  The underlying tables have recently passed a milestone - a billion rows.  The raw disk footprint is 108GB as of today (and it’s MyISAM even), making it - by an order of magnitude - the largest database in the Mozilla MySQL infrastructure.

This database currently stores two types of data - the raw per-run data and the aggregate values.  For example, when one of the test machines runs through a test, it might run the same test 20 times in a row.  Each run is timed and recorded and stored in the database.  An aggregate value is then calculated and that is also recorded.  We generally use the aggregate data to graph things over time to answer questions or see how certain code changes have impacted the speed/memory usage/etc of Firefox.

Sometimes it is useful to look at individual runs, but by and large the data that is really useful is the aggregates.  As such, we’ve decided to purge older per-run data.  Starting tomorrow, we will begin to remove individual run data points that are older than 60 days.  This allows us to keep the in-depth information from the past two months if we need to look into how an aggregate data point was calculated, but for things older than that, we will need to rely on the aggregate value alone.

And of course, we have backups of the data being removed.  If anything goes awry or if we or someone determines that we need this data to be available, we can restore it.

If you have any particular use case or something that we should know about before removing this data, please let me or a member of the Release Engineering team know.  Thank you!

1 comment

Jun 4

Databases and Web Applications

Category: MySQL

The Internet is full of stuff.  When you get right down to it, one of the amazing things about the low cost of entry to writing software is that we can end up with such a diverse ecosystem.  If you’ve ever tried to pick software to power a wiki, blog, forum, webmail, or any of the zillions of other things you can do on the web, you will quickly realize that there are so, so many options.  Frankly it’s overwhelming.

Mozilla is no exception to the rule, either.  With the number of sites that we run, we’ve ended up using a rather large number of different applications.  Two or three wiki platforms, WordPress, Drupal, and many custom applications that have been written over the years.  This is all well and good, but we run into a real issue when it comes to scaling these applications at the database layer.  Almost every application that we run suffers from the same problem: it assumes that one database will handle all of its needs.

In the real world that we live in, especially at the scale of Mozilla, that’s sometimes not the case.  Whether it’s because the databases don’t have enough RAM or fast enough disks, or because there are too many applications on one machine, the sad truth is that capacity limits will be hit at some point.  Sure, you can generally buy a new machine with faster disks and more RAM, but after doing that a few times, what do you do with the older hardware?  What about when there are no faster disks, no faster processors?  (This is the age old question about scaling up versus scaling out, yes, yes it is.)

Let us first have a little bit of discussion to establish some basic database concepts.  These might not be very useful for some of you who are familiar with the subject matter, but some of you may not be used to think about how you’re actually using them.  Humor me for a few minutes while I build a common ground for us to work from here.  (Or skip ahead!  Your call.)

In general, you can break up database access into two activities: getting data out (SELECT) and putting data in (INSERT, UPDATE, DELETE, REPLACE, etc).  Or, in more common terms: reading and writing.  After all, that’s what a database does, right?  It stores data.  Someone puts the data in, someone pulls the data out.  That’s about as simple as it gets, really.

(The astute reader will probably note that I’m avoiding more complex topics like transactions, replication lag, and other things as I meander through the rest of this post.  I may get into those in later articles, but for now, I’m not too concerned about that in the examples we’re discussing.)

Now let me describe the typical database setup here at Mozilla.  We have four or five database clusters.  (Note that the term ‘cluster’ simply means a logical collection of database machines, we’re not actually talking about MySQL Cluster.)  In most of these clusters there are two machines: the master and the slave.  The slave is a read-only mirror of the master, but is constantly replicating changes from the master, ensuring that the copy of the data on the slave is in sync with the master at all times.  This is pretty straightforward when you think about it.  All of that “putting data in” (write traffic) must happen on the master, but you can get data out of either machine.  (They have the same data!  You can read from either one!)

It doesn’t always have to be setup like this, however: it’s easy to have more than one slave!  In fact, you can have as many slaves as you want and still only have one master.  If you think about it, this lets you scale out - but only on your read traffic!  Since you have more slaves, you can handle more of the “getting data out” part of things, but you don’t get any extra ability to put data in since everybody still has to put all of the data in.  (No, I’m not going to discuss scaling the write traffic right now.)

If you are having some trouble picturing this or aren’t entirely sure what it looks like, here’s a bit of a thought experiment.  Let’s pretend the database is like a book.  (Which it is, if you think about it.  Books contain information, so do databases.  Books have an index in the back, databases have indexes.)

The first situation you have is that of the “only one database” setup.  There, you have one book in your hands.  Now let’s say your friends want to read from the book.  Obviously only so many people can be crowded around the book at the same time before it gets too crowded.  You don’t have to worry about who’s writing to it though, because you can always see the current/correct/most up to date data.  Everyone is on the same book!

Now let’s say you have too many people who want to see what’s in the book.  What do you do?  Either you tell everybody to take turns (some applications can do this - batch style applications - but it’s not useful for web/interactive applications so we can’t use this answer) or you can make a copy of the book!  Now you have two books.  Well, the second book is a copy of the first book at a certain point in time, so you must have somebody copying all of the changes made in the first book over to the second book.  But now you can have twice as many people reading the books, because you have twice as many books!

If you add a third book, the capacity grows as you’d expect.  50% more room for eyeballs than two books.  The fourth book nets you 33% more read capacity than just three books.  But note that you never get more write capacity!  Every time someone changes the “master” book, that change must still be written to all of the “slave” books.  (Again, not going into write scaling here…)

Anyway, let’s move on.  Typically, web applications are very read heavy.  Blogs, wikis, forums, and just about every other application that we run is skewed towards pulling data out of a database and presenting it for the users to view.  Lots of eyeballs, so we need lots of books!  This is also the pattern of use that works really well in the master-slave environment we just talked about.  Since there are so many reads going on, if you’re running out of capacity on your cluster, you can just add another book/slave.

This is how the Addons site (for example) works.  It sends all of its read requests (lots of them) randomly to one of the database slaves allocated for it.  And since we expect the traffic for this application to go up with Firefox 3, we can just add another database.  (I’m actually doing that this week, pre-emptively building a new database for it!)  This allows us here in IT to scale the application without involving the web development team, which is a win for everybody.  Hurrah!

But now, to bring this back to the main reason I wrote this post: most of the software we use - the blogs, wikis, etc - don’t know how to use slaves.  They only use a single database, which leaves us with little ability to scale the applications.  What if the Spread Firefox site suddenly tripled in traffic?  Well, the existing database could probably handle it, sure.  What about an order of magnitude increase?  We’d have to move it off of the database it’s on and find some beefier digs!  How about another order of magnitude?  I don’t think we could take it, because we’re going to be limited to one database no matter how hard we try.

(The diligent reader might note that I am being somewhat loose with details… there are other ways [such as caching] to scale, and we do use them.  If SFX did end up growing like that, we’d find a way to deal with it!  But that’s not the point of this particular exercise.  I want more options!)

Anyway.  For the web developers out there, this is something to consider as you write your applications.  Make your database layer scale!  You might think this is a hard thing to program for, but it’s really not!  You simply need to be able to get two database handles: one to the master, one to (one of) the slaves.  Then you decide which one to use in each situation that you come up against and use it!

* Rendering out a page full of comments for someone to view?  Use the slave!  You’re not writing anything, so this is easily done.

* Letting somebody post to their blog?  Use the master!  You need to send all of that write traffic on up.

* Doing a full text search or using INSTR on a table to try to find something?  Use those slaves!  That’s what they’re there for!

* Updating a hit count?  Write that stuff to the master!

I think by this point you’re getting the idea.  I really hope this blog post wasn’t too basic and might be useful to somebody out there.  Yes, there are definitely some areas I’ve glossed over.  But hey, it feels like enough rambling for today.  :-)

5 comments

Feb 25

MySQL Proxy: so promising, but not quite there?

Category: MySQL

I will admit in advance, this post is based on only a few days of working with MySQL Proxy.  I’ve emailed the author a couple of days ago, but not heard back yet, and I’m stumped as to why this isn’t working as advertised.  Theoretically it supports read/write splitting of queries to different backends, but when put to the test it seems that there are problems with connection pooling.  I.e., the second it tries to reuse a connection, that connection terminates and the client gets an error.

The concept of a MySQL Proxy is quite interesting, however, especially from the perspective of being in the IT team and having the quickest way to solving problems sometimes not involve fixing the app.  Let’s talk about a particular issue, the one thing I am presently working on solving: put a Bugzilla instance in Europe to provide a much better experience for people who need to use Bugzilla and are thousands of miles from the main instance in California.

The first step was to setup a mini-cluster in our datacenter in Amsterdam.  Matthew did most of this work, and then handed the project off to me after the machines were generally setup.  I took over and did some work to get the machines into monitoring and our configuration management system.  The basic setup is that we have a database slave running MySQL and replicating from the main database in the US.  A few webservers are then setup with this database as the Bugzilla “shadow” database, i.e., a read-only slave.

The problem is that Bugzilla doesn’t really use the shadow role very often.  Most of the queries, even reads, are sent over to the master database in the US.  This causes really problematic latency issues as it sends a dozen queries to build a page.  If each one takes ~170ms one-way to pass data, then we easily see page load times of 5-10 seconds or more depending on how many queries are done to generate the page.  This makes the site nigh unusable.

The typical method of fixing this involves changing the application to support read-only slaves.  While that’s certainly an option and one that should definitely be done, for various reasons it’s not the most attractive option in this case.  Enter MySQL Proxy.  With the promise of read/write splitting, it seems like it would be a great drop-in solution for solving this problem.

Theoretically, MySQL Proxy is setup on the database slave in the Netherlands.  It’s configured to point to the US master as the writable database and the local slave as the read-only database.  The application (Bugzilla-EU in this case) is then configured to use the Proxy address as the database.  In theory, with this done, we then will solve the latency problem by sending read queries locally and write queries across the pond.  Hurrah!

The problem comes down to this: the software doesn’t work (version 0.6.1) in this use case.  I spent some time trying to debug it, going so far as to step through the core C code to try to determine how it does what it does.  I hit a roadblock of time to invest, unfortunately, and decided that I couldn’t justify spending another day monkeying with the bits to find the problem.  I’ve posted on the MySQL Support Forum for MySQL Proxy and hope to get a response at some point, as this would be great software to use for this problem.

Well, fingers crossed!

4 comments

Feb 8

Release parties and database load.

Category: MySQL

Dealing with a setup where most things are based around huge spikes is interesting.  It’s different from my previous work at Six Apart where generally we had daily peaks and troughs but there was rarely a case where we could say “this is going to cause an order of magnitude increase in traffic overnight.”

Here at Mozilla, we do.  Firefox 2.0.0.12 was released to the world yesterday around 3PM, and over the next 18 hours the traffic started ramping up.  At this point, the database that does most of the queries for addons.mozilla.org (known as AMO) are pushing 3,000 database queries per second.  That’s a heckuva lot of data to be pushing around, and the database is definitely hitting some load limits.

At this point, the database is CPU limited in the processing of queries and there’s not much that can be done on the database itself.  The queries are all hitting indexes, the tables are not too huge, everything is cached in memory, etc.  There are some minor tweaks that can be done to the table cache (it was at 450 and we were slamming into it), but otherwise it’s just going about as fast as it can go.  The query cache is taking 80% of the traffic, too, which is nice.

Since the traffic is so read heavy, adding a second database would be a real win in this situation.  We would get nearly double the performance by having a second database and having the application randomly select one (with or without weighting).  Another approach would be to use a caching system like memcached to store the query results for some period of time.  There are advantages to both approaches.  Generally the first is easier to implement as the application already understands talking to databases, however the second will usually buy you more performance for less money.  Tradeoffs.

Still a lot of work to do, and today is busy.  I’m trying to get in the habit of updating, and missing a week would be most unfortunate, so you get this rather detail light post.  Enjoy.

No comments

Jan 30

Applied diagnostics, part 1.

Category: MySQL

In the first of what is sure to be many similar posts, today I’m going to write a little bit about an issue that was faced two weeks back.  For lack of any particular projects (my third day on the job, they were still letting me warm up and poke around free-form) I was getting a little intimate with the database that most of the queries for Bugzilla and Bonsai and various other projects run against.

A brief description of the setup: this database cluster is two databases, one Master and one Slave.  All write queries from the various sites go to the master, and all (well, many) read queries go to the slave.  It’s fairly straightforward.  For the past some weeks (I’m not entirely sure on the timeframe) the slave had been having random load spikes that was causing it to become unavailable and page the currently on call member of the team to deal with it.

This was quite annoying for whoever was on call, as it tended to happen most around 3-6AM.  For me, the first step in diagnosing any problem with a database is generally to get a good birds eye view of what is going on and try to understand where the problem is.  On a conceptual level, a database runs into load issues when it is maxing out on some particular resource that it needs to answer a query.  This may sound basic, but it’s the platform for some hardcore applied diagnostics!

There are many ways you can approach the problem of “my database is overloaded!”, but two big ones I can think of are analyzing the traffic and analyzing the system.  Both are useful, and you will probably want to do both to get a clear picture of what is going on.  You have to start somewhere though, and in this particular case, I started with analyzing the traffic itself.

First, I took a look to see what the machine was busy doing.  SHOW FULL PROCESSLIST run in MySQL will give you a list of all active connections and any currently running queries.  Note that a lot of the time, you may not see anything!  On most healthy servers, you will probably see none or only a few queries in the output.  In a server with problems, you might see a lot of queries in the output.  Or just a few queries that are taking a long time!  If you do see queries, look at them.  Understand what they’re doing.  Are you seeing the same query over and over?  Or is it a bunch of different queries?  What stare they in?  Is the time field climbing and climbing?

In this case, I saw one particular query that was being executed on the slave at the rate of several per second.  My first expectation was that this query was bad - perhaps it wasn’t hitting an index?  Here’s the query so you can follow along:

SELECT type, UNIX_TIMESTAMP(ci_when), people.who, repositories.repository, dirs.dir, files.file, revision, stickytag, branches.branch, addedlines, removedlines, descs.description
FROM checkins,people,repositories,dirs,files,branches,descs
WHERE people.id=whoid AND repositories.id=repositoryid AND dirs.id=dirid AND files.id=fileid AND branches.id=branchid AND descs.id=descid AND repositories.repository = ‘/l10n’ AND ci_when >= ‘2008/01/02 20:22:03′

(Apologies for this not being very readable… I can’t seem to figure out how to work a PRE tag in this blogging software…)

My gut instinct on seeing this query is: ew, lots of joins.  But overall, it’s not too scary.  I started analyzing the tables involved, taking full advantage of MySQL’s EXPLAIN command, and discovered that the query is about as optimal as it’s going to get.  Every join is on an index, and everything looked good.  With that rudimentary check done, and not seeing anything particularly wrong with the query, I ran it to look at the performance.

Woah, gravy!  The query takes some 6+ seconds to run!  That’s generally not a good thing, especially when you are seeing many of these queries being executed every second.  Debugging began in earnest at this point, since my gut instinct here was that something in the database wasn’t tuned.  I mean, a relatively simple query taking several seconds to execute, doesn’t sound right!  I next dove into the server itself, to determine what was going on.

Presumably there’s a bottleneck somewhere.  Something is making these queries take six seconds.  Well, as you might expect, your best friends for locating bottlenecks are going to be the tools you use every day.  A few moments watching top showed that the CPU was maxing out when these queries were hitting the server pretty hard.  Some time watching vmstat revealed almost no disk activity.  Well, that rules out I/O throughput issues.  At this point it’s fairly easy to decide to continue investigating into MySQL itself, the server doesn’t seem to be having grievous problems.

Diving into SHOW STATUS gave me two particular areas of question at first blush, but neither of which seemed likely to be the culprit.  The number of opened tables (Opened_tables) was quite high - in the millions - but wasn’t growing at a rate that was overly worrying.  Yes, the query in question has a lot of joins, but even maxing out on concurrent threads we were not hitting the table limit too hard.  I did raise the limit on open tables, but performance was negligibly impacted, if at all.

The second thing I noticed was that the query cache was not being used that well.  When the number of hits is approximately equal to the number of inserts, and the number of pruned items is similarly high, then it’s not doing you much good.  Of course, since the queries are all varying by a few seconds each time (look back at the ci_when comparison in the WHERE clause), there’s really not much gain in having the query cache enabled for this particular query.  I bumped the cache size a little to reduce the number of low memory prunes, but again performance was unchanged.

At this point, something else caught my eye - the number of Key_read_requests that MySQL was reporting.  It was huge!  Compared to the number of queries being run, the value was very, very high.  This is about when it dawned on me - this query really is scanning a lot of data!  I started investigating, and sure enough, every time this query runs it has to do tens to hundreds of thousands of key reads.  No amount of index caching is going to help this query, it just has so much data to dig through that it’s literally taking that long to push the bytes around.  A little crestfallen, I backtracked slightly to try to find another approach.

One thing in particular bothered me about this query.  The conditional on the ci_when column was exactly two weeks ago, and every time the query ran it was creeping forward by only a second or two each time.  If it was possible to make this query use more stable values (say, even times every 15 minutes or so) then we could take advantage of the query cache, and instantly improve server overall performance if not the performance of the individual queries.  Some asking around revealed that this query belonged to the Tinderbox application, and some more poking showed that nobody needed it to be exactly two weeks ago.

With that knowledge, and the help of Dave Miller, I submitted a one line patch to the application to change the times generated.  It was something like this:

$time -= $time % 3600;

Which basically flattens the time down onto even hours.  The SQL is then generated and fed to the database.  Once this went live, the load on the server plummeted - here’s the CPU graph:

DB CPU Graph

Sometimes the problem is the database, perhaps your disks are slow, you don’t have enough memory, the tables are not properly indexed for the queries you’re doing, etc.  Sometimes, however, the problems are at another end, and a simple change to the application can negate an incredible amount of load by making the two play together better.

At any rate, that’s a lot of words to say: “this query wasn’t taking advantage of the query cache, so I fixed it.”  I aim these entries towards the new diagnostician, someone who hasn’t gone through this, and is trying to learn about how to approach a problem and not just how to solve one particular problem.  The process is more interesting than the solution!  Enjoy.

No comments

Jan 22

An introduction, of sorts.

Category: Personal

I tend towards the verbose, and if you stick around to this blog (and I manage to write in it) then you will probably become familiar with this.  (This is the end of the only nod to the apologetic I will ever make for my writing style.)

Questions are welcome.  If at any time you are curious about how something works or why something was done in a certain way, ask.  I will do my best to answer.  If you want to chat, you can find me on the Mozilla IRC network (irc.mozilla.org) as xb95.

For introduction: my name is Mark Smith.  I am a computer handyman of sorts, with previous experience doing things such as Perl programming for Danga Interactive and Six Apart, infrastructure design and construction from silicon up to app (again for Six Apart), .NET and Python programming for CCP Games, and plenty of my own projects on the side.  I generally consider myself more of an engineer than an admin, but my experience lends me many skills that find themselves well applied to Keeping Things Running and Making Things Run Better.

My interests are solidly on scale.  How do you take something and make it run for millions of users?  Working at Danga and later Six Apart was a really great environment for this, as I’d been with the LiveJournal project since 2001-ish (starting as a volunteer, moving to become an employee in 2004).  Working with the great people at these companies really helped me to understand many of the problems around Scale and some of the solutions to them.  The year I spent in Iceland working for CCP was good for showing me more solutions.  Their vastly different approach was fascinating, and I think the time I spent there was well spent.

I started here at the Mozilla Corporation (MoCo!) last week on Monday as a member of the IT team.  Responsibilities will include anything and everything, with a specific ear towards database (MySQL) related work, issues of scale, and wherever else I can be of use.  I’ve already done some work on one of the databases, and then ended up submitting a patch for Tinderbox.  This promises to be a very interesting job and a great learning experience.  Plus, being able to work on something that affects so many people in such a meaningful way … wow.

That’s about all of the introduction that is useful, I think.  Again, please find me on IRC should you need me (I think I’m on all of the major networks; Perl, Freenode, Mozilla, …) and always as xb95.

An actual informational post will be forthcoming sometime later today or tomorrow.  So I don’t flood you … yet.

No comments