Drupal Varnish and what to put in .vcl file

Previously I wrote about some troubleshooting and had some general discussion about caching:
http://drupaldump.com/varnish-simple-troubleshooting-defaultvcl
http://drupaldump.com/boost-and-varnish-and-general-tips-and-rants-abou…

Now I am going to talk about what to put in your default.vcl file and why.

backend default {
    .host = "127.0.0.1";
    .port = "8082";
    .connect_timeout = 600s;
    .first_byte_timeout = 600s;
    .between_bytes_timeout = 600s;
}

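This one is obvious. It goes at the top of the file and sets the IP address and port of your backend, plus the timeouts: how long Varnish waits to connect to the backend, how long it waits for the first byte of the response, and how long it waits between bytes after that.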

What goes in sub vcl_recv

Compression handling
# Handle compression correctly. Different browsers send different
# "Accept-Encoding" headers, even though they mostly all support the same
# compression mechanisms. By consolidating these compression headers into
# a consistent format, we can reduce the size of the cache and get more hits.
# @see: http://varnish.projects.linpro.no/wiki/FAQ/Compression
if (req.http.Accept-Encoding) {
  if (req.http.Accept-Encoding ~ "gzip") {
    # If the browser supports it, we'll use gzip.
    set req.http.Accept-Encoding = "gzip";
  }
  else if (req.http.Accept-Encoding ~ "deflate") {
    # Next, try deflate if it is supported.
    set req.http.Accept-Encoding = "deflate";
  }
  else {
    # Unknown algorithm. Remove it and send unencoded.
    unset req.http.Accept-Encoding;
  }
}

Pages to never cache

#Don't cache pages that should never be cached
 if (req.url ~ "^/status\.php$" ||
      req.url ~ "^/update\.php$" ||
      req.url ~ "^/admin$" ||
      req.url ~ "^/admin/.*$" ||
      req.url ~ "^/flag/.*$" ||
      req.url ~ "^.*/ajax/.*$" ||
      req.url ~ "^/user$" ||
      req.url ~ "^/users/.*$" ||
      req.url ~ "^.*/ahah/.*$") {
      return (pass);
  }

The "pass" action makes Varnish skip the cache for these requests, so it always retrieves a fresh copy of the page from the backend.

Special case scenarios for pipes

#Pipe these paths directly to Apache for streaming.
if (req.url ~ "^/admin/content/backup_migrate/export") {
  return (pipe);
}

If you are using the Drupal Backup and Migrate module, you will need this. With pipe, Varnish simply passes data back and forth between the browser and the backend without looking at it. The same can be needed for streaming modules and for any files that are generated on the server and downloaded in real time, since their size is unknown in advance and Varnish can't provide the Content-Length information.

KEY CONCEPT, cookie cleaning or not

# If the request carries no session cookie from a logged-in user,
# drop all of its cookies.
if (!req.http.Cookie ~ "SESS|SSESS|NO_CACHE|VARNISH|DRUPAL_UID|LOGGED_IN") {
    unset req.http.Cookie;
}

This is one of the key concepts that will make or break your Varnish setup. In many places you will find cookie sanitization code that removes every cookie except Drupal's logged-in session cookie, so that pages for logged-in users are not cached. That is OK, but then I wondered: why not simply check whether the request carries a logged-in session cookie and, if it does not, remove all cookies from the request? And of course this works. You could have a problem if your site depends on OTHER cookies that do not come from logged-in sessions, but you can add those to the pattern the same way you would add them to the sanitization code. The example above covers the session cookies for both D6 and D7.
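
As a minimal sketch of that last point, here is the same check with one extra cookie added to the pattern. MYMODULE_PREFS is a hypothetical cookie name used only for illustration; replace it with whatever cookie your site actually depends on.

```vcl
sub vcl_recv {
  # Keep the Cookie header only when it contains a session cookie or one of
  # the other listed cookies. MYMODULE_PREFS is a made-up name (assumption).
  if (!(req.http.Cookie ~ "SESS|SSESS|NO_CACHE|VARNISH|DRUPAL_UID|LOGGED_IN|MYMODULE_PREFS")) {
    unset req.http.Cookie;
  }
}
```

Keep the list short: with Varnish's default behavior, a request that still carries a Cookie header is passed to the backend instead of being served from the cache.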

Cookie Sanitization

  if (req.http.Cookie) {
    set req.http.Cookie = ";" + req.http.Cookie;
    set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");
    set req.http.Cookie = regsuball(req.http.Cookie, ";(SESS[a-z0-9]+|NO_CACHE)=", "; \1=");
    set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
    set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");

    if (req.http.Cookie == "") {
      # If there are no remaining cookies, remove the cookie header. If there
      # aren't any cookie headers, Varnish's default behavior will be to cache
      # the page.
      unset req.http.Cookie;
    }
    else {
      # If there are any cookies left (a session or NO_CACHE cookie), do not
      # cache the page. Pass it on to Apache directly.
      return (pass);
    }
  }

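This code strips "all" cookies except the listed ones (SESS and NO_CACHE). The trick is the "; \1=" replacement: it puts a space in front of each cookie you want to keep, the next line deletes every cookie that is not preceded by a space, and the last line trims the leftover separators. I don't really see why you would use this approach over the simpler check above, but you can. Bear in mind that I am not sure it even works properly and really sanitizes all the cookies you want, but this approach is all over the internet, so here you go.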
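
To see what the regex chain actually does, it can help to trace one request through it. The Cookie header below is a made-up example (has_js, a SESS cookie and an analytics cookie); the comments show the header value after each step.

```vcl
sub vcl_recv {
  # Hypothetical input:
  #   Cookie: has_js=1; SESSabc123=xyz; __utma=1.2.3
  if (req.http.Cookie) {
    # 1. Prepend ";" so every cookie, including the first, starts with ";".
    #    -> ";has_js=1; SESSabc123=xyz; __utma=1.2.3"
    set req.http.Cookie = ";" + req.http.Cookie;

    # 2. Remove the spaces that follow each ";".
    #    -> ";has_js=1;SESSabc123=xyz;__utma=1.2.3"
    set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");

    # 3. Mark the cookies to keep by putting a space back after their ";".
    #    -> ";has_js=1; SESSabc123=xyz;__utma=1.2.3"
    set req.http.Cookie = regsuball(req.http.Cookie, ";(SESS[a-z0-9]+|NO_CACHE)=", "; \1=");

    # 4. Delete every cookie whose ";" is not followed by a space.
    #    -> "; SESSabc123=xyz"
    set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");

    # 5. Trim leading and trailing separators.
    #    -> "SESSabc123=xyz"
    set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");
  }
}
```

One thing the trace makes visible: the pattern in step 3 only matches names that start with SESS directly after the semicolon, so a D7 HTTPS session cookie (SSESS...) would be stripped along with everything else. If you rely on this approach over HTTPS, the pattern would presumably need to start with S?SESS instead.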

Caching files

# Cache things with these extensions whether we are logged in or not, we don't care
if (req.url ~ "\.(js|css|jpg|jpeg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|swf)$") {
    return (lookup);
    }

We cache all these files for all users, because it doesn't matter whether you are logged in or not: you always get the same front page .jpg. If you have special cases, you need to write exceptions for them.

Grace Period

# Allow the backend to serve up stale content if it is responding slowly.
  set req.grace = 6h;

If Apache goes down or responds slowly, this can save you: Varnish may keep serving a cached page for up to 6 hours after it expires, so put this in.

No outside access

# Do not allow outside access to cron.php or install.php.
  if (req.url ~ "^/(cron|install)\.php$" && !client.ip ~ internal) {
    # Have Varnish throw the error directly.
    error 404 "Page not found.";
    # Use a custom error page that you've defined in Drupal at the path "404".
    # set req.url = "/404";
  }

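This keeps bots and other outsiders away from these paths, so you have fewer problems. Note that "internal" is the name of an ACL, which has to be defined elsewhere in your .vcl file.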
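
The condition above compares client.ip against an ACL named internal, and Varnish will refuse to compile the file if that ACL is not declared. It goes at the top level of the .vcl file, outside any subroutine. A minimal sketch, where the address ranges are assumptions that you should replace with your own trusted hosts:

```vcl
# Hosts that are allowed to reach cron.php and install.php.
acl internal {
  "127.0.0.1";        # the server itself
  "192.168.0.0"/16;   # assumed private network; adjust or remove
}
```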

Excluding a site from caching

## This would make varnish skip caching for this particular site
if (req.http.host ~ "mexicodental.co$") {
  return (pass);
}

This is entirely optional, but it is here in case you need it.

sub vcl_fetch

 # Don't allow static files to set cookies.
  if (req.url ~ "(?i)\.(png|gif|jpeg|jpg|ico|swf|css|js|html|htm|gz|tgz|bz2|tbz|mp3|ogg|mp4|flv|f4v|pdf)(\?[a-z0-9]+)?$") {
    # beresp == Back-end response from the web server.
    unset beresp.http.set-cookie;
    return (deliver);
 }

Simple: don't allow these file extensions to set cookies. They probably won't, but if they do, the responses will not be cached, so put this in as well.

Grace in Fetch

# Allow items to be stale if needed.
  set beresp.grace = 6h;
}

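Set the grace period here as well. It always has to be the same as or higher than the one in vcl_recv: beresp.grace is how long Varnish keeps an expired object around (the maximum), while req.grace is how stale an object a given request is willing to accept.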
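
The two grace settings work as a pair, so it may help to see them side by side. The sketch below follows the pattern commonly shown for Varnish 3: beresp.grace keeps expired objects around for 6 hours, while req.grace decides how stale an object a particular request will accept. The health probe and the 30s value are assumptions added for illustration and are not part of the configuration above; without a probe, Varnish has no way to mark the backend as sick.

```vcl
backend default {
  .host = "127.0.0.1";
  .port = "8082";
  # Assumed probe settings: request the front page every 5 seconds and
  # consider the backend healthy if 3 of the last 5 probes succeeded.
  .probe = {
    .url = "/";
    .interval = 5s;
    .timeout = 1s;
    .window = 5;
    .threshold = 3;
  }
}

sub vcl_recv {
  if (req.backend.healthy) {
    # Backend is fine: accept only slightly stale content.
    set req.grace = 30s;
  } else {
    # Backend is sick: accept anything still kept in the cache.
    set req.grace = 6h;
  }
}

sub vcl_fetch {
  # Keep objects for up to 6 hours past their TTL.
  set beresp.grace = 6h;
}
```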

sub vcl_deliver

# Set a header to track a cache HIT/MISS.
  if (obj.hits > 0) {
    set resp.http.X-Varnish-Cache = "HIT";
  }
  else {
    set resp.http.X-Varnish-Cache = "MISS";
  }

This is only there so you can see the hit/miss status in your browser: look for the X-Varnish-Cache response header.

sub vcl_error

# In the event of an error, show friendlier messages.
sub vcl_error {

  # Redirect to the homepage, which will likely be in the cache.
  set obj.http.Content-Type = "text/html; charset=utf-8";
  synthetic {"

If an error occurs, visitors see only Varnish's bare default message, so this is where you can make that page a bit nicer for them. Unlike the other snippets, this one is meant to show the whole subroutine, so that you don't miss the return (deliver) at the end.
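
The snippet above is cut off after the opening of the synthetic block, so here is a hedged sketch of what a complete vcl_error can look like in Varnish 3 syntax. The HTML and its wording are assumptions for illustration, not the original page; the parts that matter are the closing "}; of the synthetic string and the return (deliver) at the end.

```vcl
sub vcl_error {
  set obj.http.Content-Type = "text/html; charset=utf-8";
  # Everything between {" and "} is sent to the browser as the error page.
  synthetic {"
<html>
  <head>
    <title>Page temporarily unavailable</title>
  </head>
  <body>
    <h1>This page is temporarily unavailable</h1>
    <p>Please try again in a few minutes, or go back to the <a href="/">homepage</a>.</p>
    <p>Error "} + obj.status + " " + obj.response + {"</p>
  </body>
</html>
"};
  return (deliver);
}
```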