Things I’ve learned about deploying a busy PHP application on Amazon EC2 WAMP/WIMP stacks
(the hard way)
In no particular order:
APC and __autoload
Rasmus said __autoload doesn’t play nice with APC.
That might have been true in 2006 when the blog post was written, but it doesn’t seem to hold in my tests (PHP 5.2.10 with APC 3.0.15). Our web application uses Ruby-esque MVC and URL dispatching, with classes instantiated at runtime depending on context. There are around 28 controller and model classes. Including all of them on every view cost something like 20-40% more than loading them on demand with __autoload. We’re using require_once for everything, too. APC also saves you heaps of read operations - we went from something like 2k reads/second to a couple of hundred.
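For illustration, here is a minimal sketch of on-demand class loading. It uses spl_autoload_register with a closure (which needs PHP 5.3 or later) rather than a bare __autoload function, and the one-class-per-file layout, the register_class_loader name and the directory names are all assumptions, not our actual code.

```php
<?php
// Hypothetical layout: one class per file, named <ClassName>.php,
// inside one of the directories passed in $dirs.
function register_class_loader(array $dirs)
{
    spl_autoload_register(function ($class) use ($dirs) {
        // Accept plain class names only, so a crafted name cannot
        // point outside the search directories.
        if (!preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $class)) {
            return;
        }
        foreach ($dirs as $dir) {
            $file = $dir . DIRECTORY_SEPARATOR . $class . '.php';
            if (is_file($file)) {
                // The file is only read the first time the class is used.
                require_once $file;
                return;
            }
        }
    });
}
```

Calling something like register_class_loader(array('controllers', 'models')) once in the front controller would then replace the block of up-front includes.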
Memcache vs Memcached
Good luck using memcache’s binary protocol if you’re using Windows. Our (single so far) memcache box was launched from Dustin’s memcached 1.4.0 on Ubuntu EC2 AMI. This runs like a dream. However, PHP currently has two separate client extensions, memcache and memcached, and only memcache has a Windows build. Unfortunately only memcached lets you setOption OPT_BINARY_PROTOCOL. I gave myself a few hours on a Friday afternoon to try and build PHP and memcached in cygwin. Those few hours weren’t enough. It’s probably doable but it’s telling that nobody else seems to have done it yet.
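As a rough sketch of the difference, the helper below turns on the binary protocol when the memcached extension is loaded and falls back to the memcache extension (text protocol) otherwise. The make_cache_client name and the host you pass in are hypothetical.

```php
<?php
// Returns a cache client, or null if neither extension is loaded.
function make_cache_client($host, $port = 11211)
{
    if (class_exists('Memcached', false)) {
        $client = new Memcached();
        // Only the memcached extension exposes this option.
        $client->setOption(Memcached::OPT_BINARY_PROTOCOL, true);
        $client->addServer($host, $port);
        return $client;
    }
    if (class_exists('Memcache', false)) {
        $client = new Memcache();
        // Text protocol only. addServer() adds the server to the pool
        // and defers the actual connection until it is needed.
        $client->addServer($host, $port);
        return $client;
    }
    return null;
}
```

On a Windows web box only the second branch would ever run, which is exactly the limitation described above.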
Apache vs IIS
Under significant (but not really punishing) load, PHP on IIS gives “PHP has encountered an Access Violation at …” errors, and worker threads crash fairly frequently. We moved to Apache. Life is much simpler. Configuring things like protected directories in the web root makes more sense - the config moves out of that horrible pseudo-registry IIS keeps for each server instance and into .htaccess files on your NFS web volume instead. CPU usage went down about 10-20% too…
EC2 Swap Confusion
EC2 can get horribly confused about its drives and where to put its swap file if you’re not careful. When making a new instance, follow these steps (if you’re using NFS EC2 volumes):
- Launch instance
- When instance is ready, reboot
- When instance has rebooted, attach volume
If you attach a volume too early, the instance might not find its own local d: drive and put its swap file on the tiny c: instead, resulting in 20 MB swap files. Not nice when you start getting low on physical RAM.
EC2 Monitoring (CloudWatch)
EC2 CloudWatch now has lovely Flash graphing and is very very nice. Use it.
XDebug, XProfile and CacheGrinding
XDebug with WinCacheGrind is very useful for profiling your PHP scripts and finding bottlenecks - however don’t use XDebug under heavy load or on a production box. Not because it kills your box (though it will slow it down some), but because recording its results takes a ton of I/O writes, which skews the numbers towards implying everything is I/O bound. Try to look at percentages rather than absolute ms values to find bottlenecks / expensive loops etc.
Persistent MySQL connections
There seems to be a bit of FUD about these. They can help. We use them. As with anything like this be careful and read the documentation, and make sure you set sensible limits in php.ini and my.ini. Keep an eye on the number of open connections hitting that ceiling too.
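One way to ask for a persistent connection is PDO's ATTR_PERSISTENT option, sketched below. PDO is my choice for the example and not necessarily the API you'll be using, and the helper names, host, database and credentials are all placeholders. On the server side, max_connections in my.ini is the ceiling to watch.

```php
<?php
// Builds a PDO DSN for MySQL from hypothetical connection details.
function mysql_dsn($host, $dbname, $port = 3306)
{
    return sprintf('mysql:host=%s;port=%d;dbname=%s', $host, $port, $dbname);
}

// Driver options: reuse an existing connection where possible and
// throw exceptions instead of failing silently.
function persistent_pdo_options()
{
    return array(
        PDO::ATTR_PERSISTENT => true,
        PDO::ATTR_ERRMODE    => PDO::ERRMODE_EXCEPTION,
    );
}

// Usage (placeholder credentials, not executed here):
// $db = new PDO(mysql_dsn('10.0.0.20', 'app'), 'app_user', 'secret',
//               persistent_pdo_options());
```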
InnoDB vs MyISAM
InnoDB is now very mature and I don’t personally see any reason you’d use MyISAM anymore. Performance-wise you see very little difference on read-heavy tables (MyISAM used to be faster if I remember) but get some very nice gains on tables that are frequently read and written, because of row-level locking. You also get transactions and foreign key constraints (oooh, referential integrity!) should you need them (doing anything involving money or mission critical systems?). We recently moved from part MyISAM / part InnoDB (where our heavy write tables were InnoDB) to all InnoDB, meaning we could throw the majority of our RAM at Inno. Our db box has never been happier.
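Converting a table is a single ALTER TABLE ... ENGINE=InnoDB statement. The sketch below just builds those statements for a list of table names. The function name and the table names in the usage line are made up. "Throwing RAM at Inno" mostly means raising innodb_buffer_pool_size in my.ini.

```php
<?php
// Returns one ALTER TABLE statement per table name. Backticks inside a
// name are doubled so the identifier stays correctly quoted.
function innodb_alter_statements(array $tables)
{
    $statements = array();
    foreach ($tables as $table) {
        $quoted = '`' . str_replace('`', '``', $table) . '`';
        $statements[] = 'ALTER TABLE ' . $quoted . ' ENGINE=InnoDB';
    }
    return $statements;
}

// Usage with hypothetical table names:
// foreach (innodb_alter_statements(array('users', 'orders')) as $sql) { echo $sql, ";\n"; }
```

Run the statements off-peak: each one rebuilds the table.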
PHP sessions and uploads on EC2
Don’t put your sessions and temporary upload files in c:\windows\temp - put them in directories on your web root volume instead. It’s much faster.
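A minimal sketch of the session half, assuming a hypothetical directory on the data volume. The upload half can't be done at runtime: upload_tmp_dir can only be set in php.ini, not with ini_set().

```php
<?php
// $dir is a hypothetical path on the data volume,
// for example 'e:\\phpdata\\sessions'.
function use_session_dir($dir)
{
    // Create the directory (and any missing parents) if needed.
    if (!is_dir($dir) && !mkdir($dir, 0700, true)) {
        return false;
    }
    // Must be called before session_start().
    return session_save_path($dir) !== false;
}
```

Setting session.save_path in php.ini does the same job for every script at once.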
Use local (private) IPs for MySQL and memcache
Oops, initially we were using the public IP addresses of the db and memcache boxes to connect from the web servers. Silly me. Connecting to the 10.x.x.x IP means a few fewer hops, which means faster connections and less latency. Didn’t make a huge amount of difference for us, but is the sensible option…
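A cheap guard against repeating the mistake is to check the configured hosts at deploy time. The sketch below uses PHP's filter_var to tell whether an address is in a private IPv4 range. The is_private_ipv4 name is made up for the example.

```php
<?php
// True only for a valid IPv4 address in 10.0.0.0/8, 172.16.0.0/12
// or 192.168.0.0/16.
function is_private_ipv4($host)
{
    // Not an IPv4 address at all (for example a hostname): not private.
    if (filter_var($host, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
        return false;
    }
    // FILTER_FLAG_NO_PRIV_RANGE makes validation fail for private
    // ranges, so a failure here means the address is private.
    return filter_var(
        $host,
        FILTER_VALIDATE_IP,
        FILTER_FLAG_IPV4 | FILTER_FLAG_NO_PRIV_RANGE
    ) === false;
}
```

For example, is_private_ipv4('10.1.2.3') is true, while a public address or a hostname gives false.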
