Greg Lund-Chaix

Random thoughts of a cubicular denizen at the OSU Open Source Lab
Updated: 3 years 25 weeks ago

vcl_hash

10/14/2009

I’ve been trying to figure out the effect of this line in my Varnish config:

sub vcl_hash {
  if (req.http.Cookie) {
    set req.hash += req.http.Cookie;
  }
}

It seemed to make sense, but I was having a hard time wrapping my head around its ramifications. I was looking at some of the docs on the Varnish site and at this great Varnish config walkthrough when the metaphorical lightbulb went on. By adding the cookie to the hash it’s effectively creating a per-session cache.

Hmm. An interesting tradeoff. On one hand it’s filling up my available cache with duplicate copies of the same content because the hash identifying the cached content is cookie-specific. On the other, it is delivering content from cache that wouldn’t normally be cached because of the cookie.
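To make the tradeoff concrete, here is a small Python model of the idea. It is only a sketch: the real hashing happens inside Varnish, the function name, URLs, and cookie values are made up for illustration, and it assumes the request actually reaches the cache lookup rather than being passed to the backend.

```python
import hashlib

def cache_key(url, host, cookie=None):
    """Model of the hash key: URL + Host, plus the Cookie header if present."""
    parts = [url, host]
    if cookie:
        parts.append(cookie)
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()

# Three requests for the same page; the session cookies are made-up values.
requests = [
    ("/about", "www.example.org", "SESSabc=111"),
    ("/about", "www.example.org", "SESSabc=222"),
    ("/about", "www.example.org", "SESSabc=111"),  # first visitor again
]

cache = {}
hits = 0
for url, host, cookie in requests:
    key = cache_key(url, host, cookie)
    if key in cache:
        hits += 1
    else:
        cache[key] = "<rendered page for %s>" % url

print("cached copies:", len(cache))  # cached copies: 2
print("hits:", hits)                 # hits: 1
```

Two visitors mean two stored copies of the same page, and only the returning visitor gets a hit.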

Categories: Planet OSL

High performance Varnish/Pressflow/Drupal community of practice

10/13/2009

At DrupalCamp PDX this weekend, I was fortunate enough to have some very interesting (if tantalizingly brief) discussions with Josh Koenig (joshk), Sam Boyer (sdboyer) and Damien Tournoud around sharing configs and best practices for scaling Drupal sites, especially using Varnish and Pressflow. OK. We’ve talked about it. Now let’s do it!

I see three primary places we might start building on that seed:

Those of you out there running Drupal in large scale environments, let’s start sharing configs and techniques so we can all do better.

Categories: Planet OSL

Pressflow, Varnish and Caching … oh my!

10/12/2009

It all started with an itch. It was a really painful itch that involved a Drupal site that was essentially down due to load. I scratched it with the help of a few incredibly helpful blog posts I found, so now it’s my turn to add to them so someone else can benefit as well.

The Problem: HALP! The site, it is sinking!

A large school district wanted to replace their existing outdated static web site with a modern CMS. They chose Drupal as their platform. The new site was successful.

Too successful.

The average traffic of 5 hits/sec jumped to over 100 hits/sec and the server went into a swap death spiral.

Fear not! Help is on the way in the form of a couple of technological superheroes …

Pressflow and Varnish - Technology superheroes
Pressflow and Varnish to the rescue!

The mutually complementary combination of these two tools can vastly increase the number of users your site can serve. Here’s the what, why, and how:

Pressflow

What:

Pressflow is “a derivative of Drupal core providing enhanced performance, scalability, and data integrity”. Basically, some really smart guys at Four Kitchens and elsewhere back-ported a bunch of Drupal 7 performance enhancements to the Drupal 6 (and even Drupal 5!) code base.

Why:

The most expensive thing a web site can do is have to fire up the entire Apache/PHP stack, pull something from the database, and render it. It takes a lot of time, processor cycles, and memory to do it. It’s slow. It ties up sessions waiting for the query to return and render the response. Whenever you can, push static content - images, CSS, JS, static files, etc. - into some sort of cache. Preferably in memory, and preferably as far out on the “edge” (as close to the requesting client browser) as possible. If we can avoid pulling something from disk (or the database), absolutely do it! If we can avoid even touching Apache/PHP (and its associated overhead), do it. The Pressflow changes help make the output more cache-friendly so that more and more of the site’s content can live in and be served from cache. With it we can free up those web server sessions and resources for serving content that does need to be dynamic.

How:

Pressflow adds the following features to Drupal.

  • Support for database replication
  • Support for Squid and Varnish reverse proxy caching
  • Optimization for MySQL
  • Optimization for PHP 5

All four are admirable additions that can help sites scale, but the second one is the primary reason I chose to bring the site up on Pressflow. It makes Drupal more cache-friendly, allowing us to store and serve more content from cache, speeding up the site and increasing the number of users who can be served.

My experience has been that Pressflow is also 100% compatible with Drupal core - I’ve switched sites back and forth between Pressflow and Drupal with no changes to the database or modules. Copy the /sites directory over and point the webroot at Pressflow, you’re done. Win!

So this is all well and good, but we need to actually have a cache in front of the server for this to do much good. Enter our second technology superhero:

Varnish

What:

Varnish is an HTTP accelerator and caching reverse proxy. Varnish is all about speed. It stores as much content as it can in the fastest place possible - RAM in this case - and bypasses the expensive process of making a request to Apache.

Why:

Pressflow structures the Drupal content to be more cache-friendly, but we still need something to actually cache the content.

How:

Varnish sits in front of Apache, accepts incoming connections from browsers and, if possible, fulfills the requests from its cache. If it can’t, it passes the request on to the underlying Apache/PHP stack. It then takes the response from Apache and forwards it on to the requesting browser. If the response from Apache is cacheable, Varnish stores it in RAM for fulfilling future requests.
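The flow described above can be sketched as a toy model in Python. This is not how Varnish is implemented; the class, the fake backend, and its "cacheable" rule are all invented for illustration.

```python
class TinyCache:
    """Toy model of a caching reverse proxy: a dict in RAM in front of a backend."""

    def __init__(self, backend):
        self.backend = backend      # callable: url -> (body, cacheable)
        self.store = {}
        self.backend_calls = 0

    def handle(self, url):
        # Serve from RAM when possible.
        if url in self.store:
            return self.store[url], "hit"
        # Otherwise ask the backend and keep the response if it is cacheable.
        self.backend_calls += 1
        body, cacheable = self.backend(url)
        if cacheable:
            self.store[url] = body
        return body, "miss"

def fake_backend(url):
    # Hypothetical rule: anything under /user is per-user and must not be cached.
    return "response for " + url, not url.startswith("/user")

proxy = TinyCache(fake_backend)
results = [proxy.handle(u)[1] for u in ["/logo.png", "/logo.png", "/user/1", "/user/1"]]
print(results)               # ['miss', 'hit', 'miss', 'miss']
print(proxy.backend_calls)   # 3
```

Four requests, three trips to the backend: the repeated static file is the one that got served from memory.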

Setting up Pressflow

Installing Pressflow is just like installing Drupal. Grab the tarball from Four Kitchens, unzip it, do the usual Drupal setup. You’re probably going to want to make sure Cacherouter is installed and properly configured (including the config array added to settings.php). Point your Apache vhost at the Pressflow docroot. That’s it for Pressflow. It’s Drupal, really. Just tweaked.

Setting up Varnish

  1. Download and compile Varnish
  2. Configure your Apache vhosts to listen on an alternate port (8080, for example)
  3. Start varnishd

Configuring Varnish

Now the fun really begins. Varnish is amazingly configurable. The VCL syntax is, even for a sysadmin-turned-developer like me, clear and relatively easy to understand. The default.vcl is well-commented. I learned a lot just reading through it. I’m going to post excerpts from my current Varnish config showing what I modified.

Please note - I am not a Drupal core hacker. I’m experimenting my way through this mostly through a “cut and try” methodology. This config info is based primarily on the work of others - Josh Koenig, Iskra/ekes and Narayan Newton. I don’t completely understand yet what a few of these configs do, exactly, other than they seem to make a positive difference in the hit rates on my systems and they didn’t seem to break anything. I’ll also include the configs I commented out because they did break something in the hopes that we can figure out why and what we should do to improve them.

By default, Varnish is set up to pass any requests with a cookie on to the backend (Apache) un-cached. We’re playing it safe and not risking sending an authenticated user outdated content by not sending cached content if we see a cookie. It’s a sane and conservative way of making sure that this is an anonymous user that can be given static content. Consequently, a lot of the customization is telling Varnish, “Even though there’s a cookie associated with this request you really can ignore the cookie and cache it” when the browser requests things like CSS files, JavaScript, theme images, or uploaded static files.
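As a rough illustration of that default behaviour, here is a simplified Python model of the decision. It is a sketch of the rule as described, not the actual default.vcl; the function name and header dictionaries are made up.

```python
def default_recv(method, headers):
    """Simplified model of the default decision: 'lookup' tries the cache, 'pass' skips it."""
    if method not in ("GET", "HEAD"):
        return "pass"
    if "Cookie" in headers or "Authorization" in headers:
        return "pass"
    return "lookup"

anonymous = {"Host": "www.example.org"}
with_cookie = {"Host": "www.example.org", "Cookie": "has_js=1"}

print(default_recv("GET", anonymous))    # lookup
print(default_recv("GET", with_cookie))  # pass

# Un-setting the cookie for a static file turns the same request into a cache lookup.
stripped = {k: v for k, v in with_cookie.items() if k != "Cookie"}
print(default_recv("GET", stripped))     # lookup
```

That last step is what most of the vcl_recv customization amounts to: removing the cookie so the request qualifies for a lookup.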

There are four basic places where I’ve added code: vcl_recv, vcl_hash, vcl_fetch and vcl_deliver. Below are the snipped (and very slightly sanitized) sections from the config file I’m currently running in production (as of 10/10/2009).

vcl_recv

vcl_recv is where we configure what happens when Varnish receives a request from a browser client for some content.

sub vcl_recv {
… snip …
  ## Remove has_js and Google Analytics cookies.
  set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(__[a-z]+|has_js)=[^;]*", "");
  ## Remove a ";" prefix, if present.
  set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");
  ## Remove empty cookies.
  if (req.http.Cookie ~ "^\s*$") {
    unset req.http.Cookie;
  }

This first segment clears out some cookies that are unnecessary. With the cookies set, Varnish won’t cache the associated content. So we tell Varnish to unset the cookie before continuing.
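To see what those three steps do to a real header, the same regular expressions can be tried out in Python. Treat this as an approximation: Python's re module is a different regex engine from the one Varnish uses, the function name is made up, and the cookie values are invented samples.

```python
import re

def strip_client_side_cookies(cookie_header):
    """Mirror the three VCL steps on a Cookie header string; None means 'unset'."""
    # Step 1: drop has_js and any __xxx cookie (e.g. Google Analytics __utma).
    cookie = re.sub(r"(^|;\s*)(__[a-z]+|has_js)=[^;]*", "", cookie_header)
    # Step 2: drop a leading ";" left behind when the first cookie was removed.
    cookie = re.sub(r"^;\s*", "", cookie, count=1)
    # Step 3: an empty header is the same as no header.
    if re.search(r"^\s*$", cookie):
        return None
    return cookie

print(strip_client_side_cookies("has_js=1; __utma=123.456; SESSabc=xyz"))  # SESSabc=xyz
print(strip_client_side_cookies("__utma=1; __utmz=2; has_js=1"))           # None
```

A request that only carried the client-side cookies ends up with no Cookie header at all, while a session cookie survives untouched.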

  ## Catch Drupal theme files - THIS BREAKS UPDATE.PHP
  #if (req.url ~ "^/sites/") {
  #  unset req.http.Cookie;
  #}
  # Catch Drupal misc files (like drupal.js and jquery.js)
  #if (req.url ~ "^/misc/") {
  #  unset req.http.Cookie;
  #}

When I first set up Pressflow and Varnish, I was looking at the cache hit/miss rate and noticed a lot of the CSS and JS files in /sites and /misc were not being cached. So I thought I’d be clever and tell Varnish that it really should cache these files by un-setting the cookies. For a while, it worked great. Hit rates were up, lots of stuff was now being cached. Then I needed to roll out a security update and run update.php. With this config in place, update.php would either reject the attempt (because the admin user’s session cookie had been unset, reverting the user to an anonymous session) or, if $update_free_access was set to TRUE, loop endlessly back to step one. I’m not quite sure what in /sites and /misc is the root of the problem (both seem to cause it), but I’ve disabled this until we can identify and work around it.

  # Site still uses some static files out of /files, cache them
  if (req.url ~ "^/files/site.*") {
    unset req.http.Cookie;
  }
  # enable caching of theme files (can’t enable globally due to update.php problem above)
  if (req.url ~ "^/sites/www.site.*") {
    unset req.http.Cookie;
  }

Because of the update.php problem above, we can’t un-set cookies for everything under /sites. But because update.php uses Garland instead of the site theme, we can still tell Varnish to cache the site’s own theme, module, and uploaded files here.

  # Drupal js/css doesn’t need cookies, cache them
  if (req.url ~ "^/modules/.*\.(js|css)\?") {
    unset req.http.Cookie;
  }

I noticed that we were also seeing a lot of misses on much of the core JS and CSS (like jquery.js), so we told Varnish to cache them.
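One detail worth noticing in that pattern is the trailing \? which means it only matches JS and CSS URLs that carry a query string. A quick Python check shows the effect. The sample URLs (including the "?K" suffix) are assumptions for illustration, and Python's re stands in for Varnish's own regex matching.

```python
import re

# Same pattern as the VCL rule above; re.search is unanchored, like VCL's ~ operator.
CORE_ASSET = re.compile(r"^/modules/.*\.(js|css)\?")

urls = [
    "/modules/system/system.css?K",              # core CSS with a query string
    "/modules/system/system.css",                # same file, no query string
    "/modules/node/node.js?K",                   # core JS with a query string
    "/sites/all/modules/views/css/views.css?K",  # contrib module, not under /modules
]
matched = [bool(CORE_ASSET.search(url)) for url in urls]
for url, hit in zip(urls, matched):
    print(url, "->", "unset cookie" if hit else "left alone")
```

Only the first and third URLs match; a core file requested without a query string, or anything outside /modules, keeps its cookie.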

  ## Moodle themes - disabled, seems to cause random problems
  #if (req.url ~ "^/(theme|pix)/") {
  #  unset req.http.Cookie;
  #}

We also run Moodle vhosts on this server. This was my first attempt at convincing Moodle to cache its images and theme files. It failed miserably. Moodle currently requires the session cookies be set on the files in question. Un-setting any of them forces the user to re-authenticate on every pageload. Hence the commenting.

  ## Pass cron jobs and server-status
  if (req.url ~ "cron.php") {
    return (pass);
  }
  if (req.url ~ ".*/server-status$") {
    return (pass);
  }
… snip …
}

Lastly, we don’t want server-status or cron cached, so we tell Varnish to pass them straight to the backend without further processing.

vcl_hash

vcl_hash is where Varnish builds the hash key it uses to look up content in its cache. By default the key is built from the request URL and the Host header; anything added here becomes part of the key.

sub vcl_hash {
  if (req.http.Cookie) {
    set req.hash += req.http.Cookie;
  }
}

This seems to help improve hit rates and is non-intrusive, so I left it in.

vcl_fetch

vcl_fetch is called after Varnish has fetched content from the backend (Apache) because it couldn’t, for various reasons, deliver it from cache.

vcl_deliver

vcl_deliver is where Varnish delivers the requested content back to the browser client - either from cache or from a backend request.

Backend config

Lastly, we were seeing too many 503 errors under load, so we increased the timeouts to 600 seconds. So far we haven’t seen any ill effects from the long timeouts.

backend default {
  .host = "127.0.0.1";
  .port = "8080";
  .connect_timeout = 600s;
  .first_byte_timeout = 600s;
  .between_bytes_timeout = 600s;
}

/etc/conf.d/varnishd

We have made a couple of changes to the default varnishd startup options:

VARNISHD_OPTS="-a *:80 \
-T 127.0.0.1:8181 \
-f /etc/varnish/default.vcl \
-p thread_pools=4 \
-p thread_pool_max=1500 \
-p listen_depth=2048 \
-p lru_interval=1800 \
-h classic,169313 \
-p obj_workspace=4096 \
-p connect_timeout=600 \
-p max_restarts=6 \
-s malloc,2G"

A couple of notes on this:

  1. connect_timeout=600 - we were seeing random 503 errors when the system was under load, even though there were Apache workers available. We extended the timeouts to 600 seconds, figuring it was better to have an individual user occasionally get an element that loads slowly or times out than many users across the site seeing an uninformative “503 guru meditation”. So far we’ve not seen any poor side effects.
  2. malloc,2G - we’re running the site with malloc storage (instead of the default file-based) at 2 gigabytes. The sites being served from behind this Varnish instance are quite a bit bigger than that, but because Varnish is running on the same box as Apache, we decided to throttle Varnish to leave resources for Apache/PHP. It’s likely this is suboptimal, but it works.

Tweaking and tuning Varnish, or “Is this thing on?”

So I’ve installed Varnish and Pressflow. The site’s up and running. How do I tell if it’s doing any good?

With Pressflow and Varnish installed with the default configurations we saw an immediate drop in load and better performance, but I wanted to optimize it to cache as much as possible. Fortunately, Varnish comes with an excellent set of tools to see what it’s doing:

varnishtop

This command shows the most often-made requests to the backend:

varnishtop -b -i TxURL

It’s excellent for spotting often-requested items that are currently not being cached. The “-b” flag filters for requests made to the backend. “-i TxURL” filters for the request URL that triggered the request to the backend. Its output looks something like this:
[Screenshot: output of varnishtop -b -i TxURL. Top of the list: the most often-requested URL from the backend, a prime candidate for caching.]

varnishhist

This command shows a histogram for the past 1000 requests, whether they were cache hits (denoted by a ‘|’) or misses (denoted by a ‘#’), and how long the requests took to process (further to the right, longer time). It’s good for a high-level view of how the server is doing under load.

varnishlog

varnishlog -c -o ReqStart

This command displays all Varnish traffic for a specific client. It’s helpful for seeing exactly what a particular page or request is doing. Set it to your workstation IP, load the page, and see everything Varnish does with your connection, including hit/miss/pass status. Varnishlog is really useful, but it puts out an overwhelmingly large amount of data that isn’t easily filtered. The “-o” option groups all of the entries for a specific request together (without it, entries from all requests are displayed interleaved in FIFO order) and it accepts a tag (“ReqStart” in this example) and regex (the IP address in this case) to filter for only requests associated with that tag and regex. It’s the only way I’ve found to filter down the firehose of log entries into something useful.

varnishstat

This command provides an overview of the stats for the current Varnish instance. It shows hit/miss/pass rates and ratios, lots of other gory internal details.
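If you want a single number to watch, a hit ratio can be computed from the counters. The sketch below parses text in the layout I would expect from a one-shot dump such as varnishstat -1 (counter name, value, rate, description). That layout, the helper name, and the sample numbers are all assumptions, so check them against the output on your own system.

```python
def hit_ratio(stat_text):
    """Return cache_hit / (cache_hit + cache_miss) from 'name value ...' lines."""
    counters = {}
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1].isdigit():
            counters[fields[0]] = int(fields[1])
    hits = counters.get("cache_hit", 0)
    misses = counters.get("cache_miss", 0)
    total = hits + misses
    return hits / total if total else 0.0

# Made-up sample in the assumed layout: counter name, value, rate, description.
sample = """\
client_req          1000   10.00 Client requests received
cache_hit            750    7.50 Cache hits
cache_hitpass         50    0.50 Cache hits for pass
cache_miss           250    2.50 Cache misses
"""
print(hit_ratio(sample))  # 0.75
```

Passed requests are deliberately left out of the ratio here; whether to count them is a judgment call.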

Watch that RAM, or “vmstat, oh how I love thee!”

Varnish can eat RAM like there’s no tomorrow. Be careful and be sure to configure its max memory to be something less than your available RAM. I forgot when I first set things up. The system worked great for a while, and then took a nosedive as the Varnish cache ate up all the available RAM and pushed the system into a swap death spiral.
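A back-of-the-envelope way to pick that limit is to subtract what Apache/PHP and the OS need from total RAM. The Python sketch below does just that. Every number in it is hypothetical; measure your own worker sizes before trusting the result, and leave some headroom, since Varnish needs memory beyond the cache storage itself.

```python
def varnish_budget_mb(total_ram_mb, apache_workers, mb_per_worker, reserve_mb):
    """RAM left for the Varnish cache after Apache/PHP and a reserve for the OS."""
    left = total_ram_mb - apache_workers * mb_per_worker - reserve_mb
    return max(left, 0)

# Hypothetical box: 8 GB RAM, 100 Apache workers at ~40 MB each, 1 GB kept for the OS.
budget = varnish_budget_mb(8192, 100, 40, 1024)
print(budget)  # 3168
```

The result is an upper bound for the size given to the -s malloc storage option, not a target.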

It’s OK to not cache everything

This is a concept I struggled with at first - “oh no! It’s not caching xyz! I must fix that!” Remember that even if you can’t cache all the static content on your site, you’re still doing a lot of good offloading the most commonly-accessed content onto Varnish. Every connection you can serve from Varnish frees up an Apache thread to do something else.

The Resolution

I am delighted to report that the site is currently serving more than 30,000 hits per day without any trouble. We’ve seen traffic exceed 150 simultaneous clients without pushing the system into swap, nor is it seeing significant iowait. As best we can tell, at peak traffic times it’s entirely processor-bound with all four cores running at 95% or higher servicing Apache threads.

OK, now what?

Moving forward, how do we do better? There were some great discussions at DrupalCamp PDX this weekend about how we can do more with Varnish and Drupal/Pressflow. Some interesting ideas that came up:

  • Edge Side Includes (ESI), and making Drupal aware of them. Specifically interesting in relation to pushing the internal Drupal panels and block caches into an ESI store.
  • A Varnish module within Drupal that hooks into the internal Drupal cache functions and can talk to the Varnish server on its management port to tell it to purge or extend the lifetime of content.
  • Preconfigured Amazon EC2 images to quickly scale sites … oh wait, there’s Project Mercury!
  • Intelligent ways to front multiple Drupal webnodes with one (or a clustered pair) of Varnish nodes.

I’m up for taking a whack at it. Any Drupal core wizards out there interested?


Categories: Planet OSL