Flickr sure has some passionate users!
February 2008 Archives
Once in a while I hear people talking about using thousands or tens of thousands of tables in one MySQL database and often how it “doesn’t work”. There are two things to say to them:
- Are you nuts?!
- Sure it works
I’ll elaborate a little on both …
Why’d you do that? Are you nuts?!
In most cases when extraordinarily many tables are brought up the “correct answer” is: Fix your schema to not duplicate the same table layout for each customer/user/site/…!
Once in a while though there are good reasons to have way too many lots of tables. Even “takes too long to fix the application now” can be a good enough answer sometimes.
My use case was to use the many tables as an extra “index” for a situation that a regular index couldn’t cover. Tens of thousands of tables, here we come.
Sure it works, just configure it right!
… which is easy!
In your startup script (/etc/init.d/mysql with the MySQL RPMs), add “ulimit -n 30000” somewhere near the top to allow mysqld to have 30000 open files (and sockets).
In your mysql configuration, add something like the following - adjust the number as appropriate.
set-variable = table_cache=12000
This will let MySQL keep up to 12000 tables open. The default limit is much too low and the system will spend all its time closing and opening the files. The other few thousand handles are free for database connections or whatever. You surely can tune the numbers more, but I haven’t needed to be more specific yet. MySQL uses two filehandles per open table (for MyISAM, it depends on the table type…)
Flushing the tables
One curious thing you’ll run into is that MySQL can take forever (read: hours!) flushing thousands of tables that have been changed if you do it with a simple “flush tables” or when you are shutting down mysql. That’s of course not “hmn, how curious” but rather insanely frustrating if you were going for a quick restart. This occasionally seems to happen with InnoDB tables too, but in particular with our large MyISAM system (we use fulltext indexes, hence MyISAM) this is a big issue.
With MyISAM tables, you also have a lot of cleaning up to do if the system crashes one way or another with thousands of tables open.
There’s an easy-ish solution to both these problems, though! Just flush the tables individually before the shutdown command and on a regular basis to mitigate the issue if it crashes. Remember the system or MySQL can crash for all sorts of reasons. Recently we had a motherboard with a BIOS or hardware bug that made it unstable after adding more than 16GB memory. On another box, one of the memory sticks went bad so suddenly it came up with 4GB memory less than previously. With MySQL carefully tuned to use all the memory (it was an InnoDB-only installation) it’d try using all 24GB memory and get killed by the kernel when it got above 20! Yeah, that one took some head scratching before I figured it out.
Below is a small program that’ll go through the database twice flushing all tables in all databases and then end with a regular “flush all”. We go through the database twice just in case the first flush took so long that a lot of tables got opened again. Two works for me, depending on your needs you might make it an option. Or a fancy version would check how many open tables are at the end of the run and go through it again if too many are open (and abort with an error if they open faster than they can get closed!). The final “flush tables” worked for me in getting everything closed just before the script exits (and the shutdown or whatever starts).
In exactly two months - April 14th - I'm doing a scalability tutorial at the MySQL conference.
It'll, of course, be based on the material I've been throwing around since 2001, but once again it's greatly expanded, extended and improved! It should be lots of fun.
I've yet to hear anyone tell me that they wasted their time with this particular session and I don't think people are just being polite because I'm often told that people find it tremendously useful. So go sign up already. And don't pay any attention to Brian Aker and dormando's shiny memcached tutorial going on at the same time. I promise I'll cover memcached, too. :-) Speaking of Alan Kasindorf (aka Dormando) he had a most excellent post about how to run your systems recently!
O'Reilly sent me a discount code to hand out to "friends and colleagues". I'm not sure what they mean by that. Can I post it here? If you read my drivel, surely you must be a friend. If you are interested in the session in the first place, surely you are some kind of colleague. To be safe however I won't post it here. If you want 20% off for the MySQL Conference send me an email.