
When we built our Premier Agent Websites, we decided at the outset to base the product on the WordPress core. We wanted to host it ourselves for full control, run a bare minimum of off-the-shelf plugins for security reasons, scale to tens or even hundreds of thousands of sites, and be faster than any other provider out there. To meet these criteria and our own high standards, we knew we had to make some smart infrastructure and architecture decisions.

We did some things differently from what many current WordPress best practices recommend and what many existing plugins offer, and we feel great about what we ended up with. With that in mind, we’d like to share some of our differentiating secret sauce with the WordPress community.

It All Starts With DNS

Before a user’s browser even hits a server for the initial load, the domain name needs to be resolved to an IP address via DNS. DNS performance is far more important than many people realize, and this part of the infrastructure deserves real attention.

In our case, we host nearly all of our clients’ DNS records on Amazon’s high-performance, globally distributed Route 53 service. Each client can use the free domain they get through us or one they already own, and we manage the DNS for that domain on Route 53 through its API and a custom UI. We also host the DNS for our CDN domains (more on that later) on Route 53. The result is that lookups on our clients’ domains and the CDN domains are incredibly fast!

Multiple Caching Layers

We cache at two different layers for two different reasons.

Our first caching layer is a group of auto-scaling servers dedicated to reverse-caching requests to our clients’ domains via Varnish. This not only reduces the overall load on our app servers, but also lets us absorb any traffic spikes that are thrown our way. We assign each site to one of the servers in the group and then flush the entire domain from the assigned Varnish server whenever anything on the site changes. On the Varnish end, we completely strip all incoming cookies from the request in order to maximize Varnish caching. The only features we lose by doing that are password-protected posts, which we re-implemented using query strings, and comment auto-fills, which we ignore for now but may re-implement later using JavaScript.

Our second caching layer is simply a Memcached cache facilitated by the excellent Memcached Object Cache plugin (one of only two non-core plugins we use). For the requests that Varnish doesn’t handle, this helps keep things fast. In addition to having the WordPress core use this caching layer, we make a point of using the wp_cache_* functions throughout our custom functionality.
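To illustrate the pattern behind those wp_cache_* calls, here is a minimal cache-aside sketch in Python. It is not our production code: ObjectCache is a hypothetical in-memory stand-in for Memcached, and the names cached, group, and ttl are assumptions chosen to loosely mirror the key, group, and expiration arguments of wp_cache_get and wp_cache_set.

```python
import time


class ObjectCache:
    """Hypothetical in-memory stand-in for a Memcached-backed object cache."""

    def __init__(self):
        self._store = {}  # (group, key) -> (value, expiry time or None)

    def get(self, key, group="default"):
        entry = self._store.get((group, key))
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._store[(group, key)]
            return None
        return value

    def set(self, key, value, group="default", ttl=None):
        expires_at = None if ttl is None else time.monotonic() + ttl
        self._store[(group, key)] = (value, expires_at)


def cached(cache, key, group, ttl, compute):
    """Cache-aside lookup: return the cached value, or compute and store it.

    A result of None is treated as a miss, so None is never cached.
    """
    value = cache.get(key, group)
    if value is None:
        value = compute()
        cache.set(key, value, group, ttl)
    return value
```

With a helper like this, an expensive lookup runs once per expiry window and every later request for the same key and group is served from the cache.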

Auto App Server Scaling

Our entire infrastructure is backed by Amazon Web Services (AWS), so we make extensive use of the functionality it offers that we couldn’t otherwise easily replicate. In the case of our application servers that actually handle PHP and process the WordPress requests, this means that we can use AWS auto-scaling to increase and decrease the number of servers we use depending on the load at any given time. The idea here is that we can scale in real time based on the demand against our servers instead of any other criteria that our clients frankly don’t care about.

Scaling Out Database Servers

The only other non-core plugin we use is HyperDB. To keep clients well balanced across our database servers and to add new servers into the mix with as little overhead as possible, we use the Flexihash consistent hashing library with the MD5 algorithm to determine which database server each client uses. The great thing about this setup is that we can add new servers as needed, and each new server requires us to rebalance the data for a progressively smaller portion of our clients.
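The idea behind consistent hashing can be sketched in a few lines of Python. This is an illustration of the general technique, not Flexihash itself: the HashRing class, the replicas value, and the db1 through db4 server names are all assumptions made for the example.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring using MD5 (illustrative sketch only)."""

    def __init__(self, nodes=(), replicas=64):
        self.replicas = replicas  # virtual points per server (assumed value)
        self._positions = []      # sorted ring positions
        self._owner = {}          # ring position -> server name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        # Interpret the MD5 digest as one large integer position on the ring.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        # Place several virtual points per server to even out the distribution.
        for i in range(self.replicas):
            position = self._hash(f"{node}#{i}")
            self._owner[position] = node
            bisect.insort(self._positions, position)

    def lookup(self, key):
        if not self._positions:
            raise LookupError("ring is empty")
        # Walk clockwise to the first point at or after the key's hash.
        index = bisect.bisect_left(self._positions, self._hash(key))
        if index == len(self._positions):
            index = 0  # wrap around the ring
        return self._owner[self._positions[index]]


ring = HashRing(["db1", "db2", "db3"])
clients = [f"client-{i}" for i in range(1000)]
before = {c: ring.lookup(c) for c in clients}
ring.add("db4")
moved = [c for c in clients if ring.lookup(c) != before[c]]
```

In this sketch, every client in moved now maps to db4 and all other clients stay where they were, which is why adding a server only requires migrating a fraction of the data.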

Mod_pagespeed Rocks

Building mod_pagespeed, which just recently left beta status, into our infrastructure is easily one of the best performance choices we made with our platform. We use mod_pagespeed to automatically minify our CSS and JavaScript, significantly reduce the number of requests to certain resources, and perform a number of other very useful optimizations.

Most importantly for us, mod_pagespeed also rewrites the outgoing HTML so that the URLs of all images, JavaScript and CSS point to Amazon’s CloudFront CDN and are served with a 1-year cache expiration. Combined with bits of our own functionality within WordPress that remove the blog name from the URLs of static resources, this means that even our low-traffic sites nearly always serve cache-primed static resources from locations geographically close to their visitors.

One additional benefit of mod_pagespeed and the CDN-enabling functionality it offers is that we can tune our Web application servers for serving almost exclusively PHP requests. Focusing on PHP only allows us to reduce overall complexity and make smarter decisions about server scaling based on load.

Serving User Media At Breakneck Speeds

With a fluid, EC2-based fleet of Web app servers that can scale up or down at any time, we knew we couldn’t store our users’ files on those servers. What we did instead was create functionality to store uploaded files directly in Amazon S3 (which is not a CDN, by the way) and then rewrite the outgoing URLs to use multiple hostnames pointed at Amazon’s CloudFront CDN.
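One way to spread URLs across several CDN hostnames is sketched below. This shows the general approach rather than our actual rewrite code: the cdn0 through cdn2 hostnames are placeholders, and choosing the hostname from a CRC32 of the path is an assumption. The point of hashing the path is that a given file always maps to the same hostname, so it is only ever cached under one URL.

```python
import zlib

# Placeholder hostnames; each would be a CNAME pointed at the CDN.
CDN_HOSTS = ["cdn0.example.com", "cdn1.example.com", "cdn2.example.com"]


def cdn_url(path, hosts=CDN_HOSTS):
    """Map a file path to a stable URL on one of several CDN hostnames."""
    path = "/" + path.lstrip("/")
    # CRC32 is deterministic, so the same path always picks the same host.
    index = zlib.crc32(path.encode("utf-8")) % len(hosts)
    return f"https://{hosts[index]}{path}"
```

Picking a hostname at random on each page render would also spread requests, but it would make the same file appear under several URLs and defeat browser caching.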

We additionally insert a unique hash of the actual file content into the file name and then set a 10-year cache header on the file during upload. Doing this has two benefits. First, the long cache duration allows the files to be almost indefinitely cached by CloudFront itself and by users’ browsers. Second, the content hash allows our clients to modify their files (through the image editor for example) and have those changes reflected instantly in any browser.
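The naming scheme can be sketched as follows. This is an illustration under stated assumptions, not our upload code: the choice of SHA-1, the 12-character digest length, the position of the hash in the file name, and the helper names hashed_name and upload_headers are all hypothetical.

```python
import hashlib
from pathlib import PurePosixPath

TEN_YEARS_IN_SECONDS = 10 * 365 * 24 * 60 * 60


def hashed_name(filename, content, digest_len=12):
    """Insert a hash of the file's bytes between the base name and extension."""
    path = PurePosixPath(filename)
    digest = hashlib.sha1(content).hexdigest()[:digest_len]
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


def upload_headers():
    """Headers to attach to the stored object so it is cached for ten years."""
    return {"Cache-Control": f"public, max-age={TEN_YEARS_IN_SECONDS}"}
```

Because the name changes whenever the bytes change, an edited image gets a brand-new URL, so neither the CDN nor the browser ever has to invalidate the old copy.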

Conclusion

We designed our platform and infrastructure with performance as a high-priority goal, not as a nice-to-have side feature. We spent a lot of time investigating our options before settling on our choices, and we know we could still improve a few little things, but we think we made the right decisions overall.

Do you want to work on world-class platforms such as this? Do you think you could do a better job? If so, we’re hiring!


One Response to On WordPress Scaling and Performance, The Zillow Way

  1. DaveBei says:

    Absolutely. You can reference our blog posts from your website. Thanks
