
Site News

High Availability Varnish Configuration for Plone

by Nathan Van Gheem last modified Sep 20, 2011 04:17 AM
How to get varnish to keep serving stale content when your backend is down.

Why

There are many reasons why a backend server could go down or become unresponsive, and there is no reason your caching proxy can't serve stale content while the backend is down or slow to respond.

How

A few tricks will help you get better performance out of varnish and get it to serve stale content instead of an error.

Serving Stale Content

On error, restart the request and have varnish use an always-down backend, so that it serves stale content right away.

  1. Set up the fake backend
    ...
    backend failapp { 
      .host = "127.0.0.1"; 
      .port = "9999"; 
      .probe = { 
        .url = "/hello/"; 
        .interval = 12h; 
        .timeout = 1s; 
        .window = 1; 
        .threshold = 1; 
      } 
    }
    ...
  2. Set the grace period on the request in vcl_recv
    ...
      if (!req.backend.healthy) {
        set req.grace = 1d;
      } else {
         set req.grace = 15m;
      }
    ...
  3. Set the grace period for the response in vcl_fetch
    ...
    set beresp.grace = 10d;
    ...
  4. Set a marker error header in the vcl_error section and restart the request
    ...
    sub vcl_error {
      /* set a marker on so we know there is an error with the backends
         and that we should serve out stale content */
      if ( req.http.X-Varnish-Error != "1" && req.request != "PURGE" && req.restarts == 0) {
        set req.http.X-Varnish-Error = "1";
        return (restart);
      }
    }
    ...
  5. Check for the marker error header in vcl_recv and switch to the always-down backend
    ...
      if (req.http.X-Varnish-Error == "1") {
        set req.backend = failapp;
        unset req.http.X-Varnish-Error;
      } else {
        set req.backend = plone;
      }
    ...
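To see why the request grace and the response grace work together, here is a simplified model in Python of the decision varnish makes for a cached object. It assumes, as a simplification rather than a description of varnish internals, that an expired object can be served only while it is still inside both beresp.grace and the request's req.grace; the names `CachedObject`, `request_grace` and `decide` are hypothetical.

```python
from dataclasses import dataclass

MINUTE = 60
HOUR = 60 * MINUTE
DAY = 24 * HOUR


@dataclass
class CachedObject:
    stored_at: float  # when the object entered the cache (seconds)
    ttl: float        # beresp.ttl
    grace: float      # beresp.grace


def request_grace(backend_healthy: bool) -> float:
    """Mirror the vcl_recv rule: 15 minutes normally, 1 day when the backend is sick."""
    return 15 * MINUTE if backend_healthy else DAY


def decide(obj: CachedObject, now: float, backend_healthy: bool) -> str:
    """Return 'fresh', 'stale' (served in grace) or 'fetch' (go to the backend)."""
    expires = obj.stored_at + obj.ttl
    if now < expires:
        return "fresh"
    overdue = now - expires
    # Simplified model: both the object's and the request's grace must allow it.
    if overdue < min(obj.grace, request_grace(backend_healthy)):
        return "stale"
    return "fetch"
```

With beresp.ttl = 30m and beresp.grace = 10d, a page stored at time 0 and requested at 90 minutes is refetched while the backend is healthy but served stale when it is sick. After the restart in step 4, req.backend is failapp, which is always sick, so the one-day window is the one that applies.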
    

Cleaning Up The URL

There is no need to cache separate copies of a page for different URL fragments (#) or for different Google Analytics query parameters.

...
  if (req.url ~ "\#") {
    set req.url=regsub(req.url,"\#.*$","");
  }
  # Strip out Google related parameters
  if(req.url ~ "(\?|&)(utm_source|utm_medium|utm_campaign|gclid|cx|ie|cof|siteurl)=") {
    set req.url=regsuball(req.url,"&(utm_source|utm_medium|utm_campaign|gclid|cx|ie|cof|siteurl)=([A-z0-9_\-\.%25]+)","");
    set req.url=regsuball(req.url,"\?(utm_source|utm_medium|utm_campaign|gclid|cx|ie|cof|siteurl)=([A-z0-9_\-\.%25]+)","?");
    set req.url=regsub(req.url,"\?&","?");
    set req.url=regsub(req.url,"\?$","");
  }
...
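The same clean-up can be tried outside varnish to check which URLs end up sharing a cache entry. This Python sketch applies the substitutions from the VCL above in the same order, using `re.sub` with `count=1` where the VCL uses `regsub` and no count where it uses `regsuball`, and `[A-Za-z0-9_.%-]` for the parameter values. The function name `clean_url` is hypothetical. Browsers don't send the `#fragment` part of a URL to the server, so the first step is mostly defensive.

```python
import re

# Parameters stripped by the VCL above.
TRACKING = "utm_source|utm_medium|utm_campaign|gclid|cx|ie|cof|siteurl"
VALUE = r"[A-Za-z0-9_.%-]+"


def clean_url(url: str) -> str:
    """Apply the same normalization steps as the vcl_recv snippet."""
    url = re.sub(r"#.*$", "", url, count=1)
    if re.search(rf"(\?|&)({TRACKING})=", url):
        url = re.sub(rf"&({TRACKING})={VALUE}", "", url)      # regsuball
        url = re.sub(rf"\?({TRACKING})={VALUE}", "?", url)    # regsuball
        url = re.sub(r"\?&", "?", url, count=1)               # regsub
        url = re.sub(r"\?$", "", url, count=1)                # regsub
    return url
```

For example, `/news?utm_source=feed&page=2` and `/news?page=2` both normalize to `/news?page=2`, so they are served from the same cached object.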

Full Example Configuration

In this configuration, keep some things in mind:

  • The configuration manually sets the cache age on these objects and relies on purges to handle cache refreshes.
  • The configuration assumes the public site is not used for logging in, so cookies are simply stripped.
  • The configuration sets additional response headers so you can see how varnish handled the response (ttl, grace, status, hit).
  • This configuration is a cleaned-up version of what I actually use in production, and you'll still need to adapt it and implement your own parts of it. Please don't assume it is a drop-in replacement.
acl purge {
  "localhost";
  "127.0.0.1"; /* and everyone on the local network */
  "10.10.10.10";
}

/* failapp is used to help trick varnish into using stale content */
backend failapp { 
  .host = "127.0.0.1"; 
  .port = "9999"; 
  .probe = { 
    .url = "/hello/"; 
    .interval = 12h; 
    .timeout = 1s; 
    .window = 1; 
    .threshold = 1; 
  } 
}

backend cms1 { 
  .host = "10.10.10.1"; 
  .port = "8080"; 
  .connect_timeout = 10s; 
  .max_connections = 30; 
  .first_byte_timeout = 300s; 
  .probe = { 
    .url = "/"; 
    .interval = 3s; 
    .timeout = 3s; 
    .window = 5; 
    .threshold = 2; 
    .initial = 1;
  } 
}
backend cms2 { 
  .host = "10.10.10.1"; 
  .port = "8081"; 
  .connect_timeout = 10s; 
  .max_connections = 30; 
  .first_byte_timeout = 300s; 
  .probe = { 
    .url = "/"; 
    .interval = 3s; 
    .timeout = 3s; 
    .window = 5; 
    .threshold = 2; 
    .initial = 1;
  } 
}
backend cms3 { 
  .host = "10.10.10.1"; 
  .port = "8082"; 
  .connect_timeout = 10s; 
  .max_connections = 30; 
  .first_byte_timeout = 300s; 
  .probe = { 
    .url = "/"; 
    .interval = 3s; 
    .timeout = 3s; 
    .window = 5; 
    .threshold = 2; 
    .initial = 1;
  } 
}
backend cms4 { 
  .host = "10.10.10.1"; 
  .port = "8083"; 
  .connect_timeout = 10s; 
  .max_connections = 30; 
  .first_byte_timeout = 300s; 
  .probe = { 
    .url = "/"; 
    .interval = 3s; 
    .timeout = 3s; 
    .window = 5; 
    .threshold = 2; 
    .initial = 1;
  } 
}

director plone round-robin {
  { .backend = cms1; }
  { .backend = cms2; } 
  { .backend = cms3; } 
  { .backend = cms4; } 
}

sub vcl_recv {
  if (req.http.X-Varnish-Error == "1") {
    set req.backend = failapp;
    unset req.http.X-Varnish-Error;
  } else {
    set req.backend = plone;
  }
  if (req.request != "GET" &&
      req.request != "HEAD" &&
      req.request != "PUT" &&
      req.request != "POST" &&
      req.request != "TRACE" &&
      req.request != "OPTIONS" &&
      req.request != "DELETE" &&
      req.request != "PURGE") {
    /* Non-RFC2616 or CONNECT which is weird. */
    return (pipe);
   }

  if (req.request != "GET" && req.request != "HEAD" && req.request != "PURGE") {
    /* We only deal with GET and HEAD by default */
    return (pass);
  }

/* Time to mess with the request */
  unset req.http.Cookie;
  unset req.http.User-Agent;
  unset req.http.Accept-Charset;

  if (req.http.Accept-Encoding) {
    if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|pdf|headerImage)$") {
      # No point in compressing these
      remove req.http.Accept-Encoding;
    } elsif (req.http.Accept-Encoding ~ "gzip") {
      set req.http.Accept-Encoding = "gzip";
    } elsif (req.http.Accept-Encoding ~ "deflate") {
      set req.http.Accept-Encoding = "deflate";
    } else {
      # unknown algorithm
      remove req.http.Accept-Encoding;
    }
  }

  # Strip hash, server doesn't need it.
  if (req.url ~ "\#") {
    set req.url=regsub(req.url,"\#.*$","");
  }
  # Strip out Google related parameters
  if(req.url ~ "(\?|&)(utm_source|utm_medium|utm_campaign|gclid|cx|ie|cof|siteurl)=") {
    set req.url=regsuball(req.url,"&(utm_source|utm_medium|utm_campaign|gclid|cx|ie|cof|siteurl)=([A-z0-9_\-\.%25]+)","");
    set req.url=regsuball(req.url,"\?(utm_source|utm_medium|utm_campaign|gclid|cx|ie|cof|siteurl)=([A-z0-9_\-\.%25]+)","?");
    set req.url=regsub(req.url,"\?&","?");
    set req.url=regsub(req.url,"\?$","");
  }
/* End modifying the request */

  if (req.request == "PURGE") {
    if (!client.ip ~ purge) {
       error 405 "Not allowed.";
    }
    return(lookup);
  }

/* grace and saint mode related settings,
   so that we can always serve stale content. */
  if (!req.backend.healthy) {
    set req.grace = 1d;
  } else {
     set req.grace = 15m;
  }
/* end saint/grace mode stuff */

  return (lookup);
}

sub vcl_error {
  /* set a marker on so we know there is an error with the backends
     and that we should serve out stale content */
  if ( req.http.X-Varnish-Error != "1" && req.request != "PURGE" && req.restarts == 0) {
    set req.http.X-Varnish-Error = "1";
    return (restart);
  }
}

sub vcl_hash {
   set req.hash += req.url;
   if (req.http.Accept-Encoding) { set req.hash += req.http.Accept-Encoding; }
   return (hash);
}

sub vcl_fetch {
  unset beresp.http.set-cookie;
  if (beresp.status == 500) {
    set beresp.saintmode = 5s;
    set req.http.X-Varnish-Error = "1";
    return (restart);
  }

  /* override ttls */
  if(beresp.status == 301 || beresp.status == 302){
    /* all redirects can be cached for a long time. Granted we always have invalidation. */
    set beresp.ttl = 5h;
  } else if(req.url ~ ".*portal_css.+cachekey.*\.(css|js)$") {
    /* generated css/js files should be cached for a LONG time. All unique urls. */
    set beresp.ttl = 10d;
  } else if (req.url ~ "(\.jpg|\.png|\.gif|\.gz|\.tgz|\.bz2|\.tbz|\.mp3|\.ogg|\.pdf|\.css|\.js|/image_(large|preview|mini|thumb|tile))$") {
    /* all file type resources can be cached for an hour */
    set beresp.ttl = 1h;
  }else{
    /* everything else */
    set beresp.ttl = 30m; /* how long should varnish cache it? */
  }
  set beresp.grace = 10d; /* The max amount of time to keep object in cache */
  set beresp.http.X-Varnish-beresp-ttl = beresp.ttl;
  set beresp.http.X-Varnish-beresp-grace = beresp.grace;
  set beresp.http.X-Varnish-beresp-status = beresp.status;
}

sub vcl_hit {
   if (req.request == "PURGE") {
     set obj.ttl = 0s;
     error 200 "Purged.";
    }
}

sub vcl_miss {
  if (req.request == "PURGE") {
    error 404 "Not in cache.";
  }
}


sub vcl_deliver {
  if (obj.hits > 0) {
    set resp.http.X-Cache = "HIT";
  } else {
    set resp.http.X-Cache = "MISS";
  }
}
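Because this configuration relies on purges, it helps to be able to send one by hand. Here is a minimal Python sketch using the standard library's `http.client`; the function names, the default address and the port are assumptions, and the request has to come from an address in the `purge` ACL. With the VCL above, expect 200 ("Purged.") on a hit, 404 ("Not in cache.") on a miss and 405 from a client outside the ACL. Since `vcl_hash` adds the normalized Accept-Encoding header to the hash, each encoding is a separate cached object, so the sketch can purge the plain, gzip and deflate variants in turn.

```python
import http.client
from typing import Dict, Optional

# The Accept-Encoding values vcl_recv normalizes to (None = no header).
ENCODINGS = (None, "gzip", "deflate")


def purge(host: str, path: str, accept_encoding: Optional[str] = None,
          address: str = "127.0.0.1", port: int = 80, timeout: float = 5.0) -> int:
    """Send one PURGE request to varnish and return the HTTP status code."""
    headers = {"Host": host}
    if accept_encoding:
        headers["Accept-Encoding"] = accept_encoding
    # Without an explicit header, http.client sends "Accept-Encoding: identity";
    # the VCL above strips that, so it hashes like a request without the header.
    conn = http.client.HTTPConnection(address, port, timeout=timeout)
    try:
        conn.request("PURGE", path, headers=headers)
        response = conn.getresponse()
        response.read()  # drain the body before closing
        return response.status
    finally:
        conn.close()


def purge_all_variants(host: str, path: str, **kwargs) -> Dict[Optional[str], int]:
    """Purge every Accept-Encoding variant, since vcl_hash includes that header."""
    return {enc: purge(host, path, accept_encoding=enc, **kwargs) for enc in ENCODINGS}
```

For example, `purge_all_variants("www.example.com", "/news")` run on the varnish host returns a status per variant.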

Additional Tips

  • Varnish doesn't have nice error messages, so use nginx to override 500 errors to your liking if, for some reason, there is an error on a resource that was not in the stale cache.
  • Varnish's cache is NOT persistent (although varnish 3.0 is supposed to be), so if you restart your varnish process, you'll lose your long-term cache.
  • Also, you're limited by the size of your cache. If you have a large site, make sure you set the varnish file cache size to something very large so that you can make full use of stale content.

Plone 3.3.5 on Mac OS X Lion

by Nathan Van Gheem last modified Aug 27, 2011 05:25 AM
Some tips for Plone 3 on Lion.

Getting Python 2.4

First off, make sure you have a version of Python 2.4 installed on the system. If you use the one in the collective svn repository, it has a few patches that make it work correctly with Lion.

svn co http://svn.plone.org/svn/collective/buildout/python/
cd python
python bootstrap.py
./bin/buildout 

Then use that python with your buildout.

Beware of collective.xdv

I didn't have enough time to figure out why, but xdv was making my instance crash on startup with no explanation. I did get Plone to start up by upgrading to the latest versions of xdv and collective.xdv, but it would still crash when rendering a page. For now, I've just disabled xdv on Lion so that I at least have a working Plone 3 dev machine.

It's a small post, but I wanted to put it up in case someone else is experiencing the same problems.

Fixing Broken ZODB Object references

by Nathan Van Gheem last modified May 25, 2011 06:05 AM
I'm not an expert on this by any means, but here are some notes on my latest episode.

Introduction

If you start seeing POSKeyErrors on certain objects, it most likely means your database is in some form of inconsistency. The problem is very well described by Elizabeth Leddy on her blog. Her post didn't quite handle the case I encountered: missing objects, where the referenced oid doesn't exist in the ZODB at all.

Getting Started

Run fsrefs.py to test your database and have it tell you which objects are bad.

python /path/to/eggs/ZODB/scripts/fsrefs.py /path/to/zodb/Data.fs

Will yield results like:

 oid 0x959755L BTrees.OOBTree.OOBucket
last updated: 2011-04-15 13:31:28.380634, tid=0x38DA88B79173877L
refers to invalid object:
	oid 0x0135ca66 missing: ''

oid 0x135CA59L Products.ATContentTypes.content.document.ATDocument
last updated: 2011-04-11 22:21:16.544874, tid=0x38D941D46976A11L
refers to invalid objects:
	oid 0x0135ca65 missing: ''
	oid 0x0135ca5c missing: ''

oid 0x135CA6AL BTrees.OOBTree.OOBTree
last updated: 2011-04-11 22:16:14.294142, tid=0x38D94183CFD03CCL
refers to invalid object:
	oid 0x0135ca6b missing: ''
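With a large database the fsrefs.py report can get long. A small helper like the one below pulls out each referring oid and the oids it is missing, assuming the output format shown above. It is hypothetical, not part of ZODB, and written for a modern Python rather than the Python 2 that runs Zope here, so run it on a saved copy of the report. `to_p64` packs an oid the way `ZODB.utils.p64` does, as an 8-byte big-endian unsigned integer.

```python
import re
import struct
from typing import List, Tuple

REFERRER = re.compile(r"^\s*oid (0x[0-9a-fA-F]+)L?\s+(\S+)\s*$")
MISSING = re.compile(r"^\s*oid (0x[0-9a-fA-F]+)L? missing:")


def parse_fsrefs(report: str) -> List[Tuple[int, str, List[int]]]:
    """Return (referring oid, class name, [missing oids]) for each entry."""
    entries = []
    for line in report.splitlines():
        missing = MISSING.match(line)
        if missing and entries:
            entries[-1][2].append(int(missing.group(1), 16))
            continue
        referrer = REFERRER.match(line)
        if referrer:
            entries.append((int(referrer.group(1), 16), referrer.group(2), []))
    return entries


def to_p64(oid: int) -> bytes:
    """Same packing as ZODB.utils.p64: 8-byte big-endian unsigned integer."""
    return struct.pack(">Q", oid)
```

Each missing oid in the result is a candidate for the placeholder trick described under "Fixing It".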

Testing Out The Bad Object

from ZODB.utils import p64
from persistent import Persistent
obj = app._p_jar[p64(0x959755L)]
obj

This should give an error like:

2011-05-24 09:23:31 ERROR ZODB.Connection Couldn't load state for 0x0135ca59
Traceback (most recent call last):
  File "/opt/Zope/buildout-cache/eggs/ZODB3-3.8.4wc1-py2.4-linux-x86_64.egg/ZODB/Connection.py", line 811, in setstate
    self._setstate(obj)
  File "/opt/Zope/buildout-cache/eggs/ZODB3-3.8.4wc1-py2.4-linux-x86_64.egg/ZODB/Connection.py", line 870, in _setstate
    self._reader.setGhostState(obj, p)
  File "/opt/Zope/buildout-cache/eggs/ZODB3-3.8.4wc1-py2.4-linux-x86_64.egg/ZODB/serialize.py", line 604, in setGhostState
    state = self.getState(pickle)
  File "/opt/Zope/buildout-cache/eggs/ZODB3-3.8.4wc1-py2.4-linux-x86_64.egg/ZODB/serialize.py", line 597, in getState
    return unpickler.load()
  File "/opt/Zope/buildout-cache/eggs/ZODB3-3.8.4wc1-py2.4-linux-x86_64.egg/ZODB/serialize.py", line 471, in _persistent_load
    return self.load_oid(reference)
  File "/opt/Zope/buildout-cache/eggs/ZODB3-3.8.4wc1-py2.4-linux-x86_64.egg/ZODB/serialize.py", line 537, in load_oid
    return self._conn.get(oid)
  File "/opt/Zope/buildout-cache/eggs/ZODB3-3.8.4wc1-py2.4-linux-x86_64.egg/ZODB/Connection.py", line 244, in get
    p, serial = self._storage.load(oid, self._version)
  File "/opt/Zope/buildout-cache/eggs/ZODB3-3.8.4wc1-py2.4-linux-x86_64.egg/ZEO/ClientStorage.py", line 712, in load
    return self.loadEx(oid, version)[:2]
  File "/opt/Zope/buildout-cache/eggs/ZODB3-3.8.4wc1-py2.4-linux-x86_64.egg/ZEO/ClientStorage.py", line 735, in loadEx
    data, tid, ver = self._server.loadEx(oid, version)
  File "/opt/Zope/buildout-cache/eggs/ZODB3-3.8.4wc1-py2.4-linux-x86_64.egg/ZEO/ServerStub.py", line 196, in loadEx
    return self.rpc.call("loadEx", oid, version)
  File "/opt/Zope/buildout-cache/eggs/ZODB3-3.8.4wc1-py2.4-linux-x86_64.egg/ZEO/zrpc/connection.py", line 699, in call
    raise inst # error raised by server
POSKeyError: 0x0135ca65

Fixing It

Basically, you need to fill the hole with a placeholder object that uses the missing oid, and then delete the offending reference once the object is no longer missing.

from ZODB.utils import p64
from persistent import Persistent
import transaction

# register a placeholder object under the missing oid
replace_obj = Persistent()
replace_obj._p_oid = p64(0x0135ca66)
replace_obj._p_jar = app._p_jar
app._p_jar._register(replace_obj)
app._p_jar._added[p64(0x0135ca66)] = replace_obj
transaction.commit()

# the referring object now loads without a POSKeyError
obj = app._p_jar[p64(0x959755L)]
obj

Now you'll be able to delete the object. It could be referenced from a list item, a dict item or a content type, and each of these requires a different method to delete it.

Notes on a More Secure Plone Deployment

by Nathan Van Gheem last modified Apr 15, 2011 05:00 AM
Some things to think about if you're planning on providing a more secure Plone site. While Plone is a very secure CMS with an incredible track record, there are still plenty of things you can do to protect sites that might be larger targets.

Read-only Public Site

Making your public site read-only means that even a compromised site can't be damaged: even if a malicious user does somehow gain access, they can't save any data to the database.

There are a few ways to do this:

  • Zope Replication Services (ZRS) let you replicate a private, read-write backend server to a read-only, public-facing site.
  • You can also use RelStorage as the storage for your ZEO server, then use the replication facilities provided by some RDBMSs to replicate to a read-only ZEO server on the public site.
  • It is also possible to have read-only ZEO clients connected to a read-write ZEO server.
  • zeoraid might even be an option (I've never tried it).

One thing to note: unfortunately, there are some cases where Plone tries to write on read. To get around this, I created a before-commit event handler in a policy product that aborts every transaction when the server is read-only. It's kind of hackish, but a necessary evil to prevent users from getting a nasty ReadOnly database error thrown at them. It would look something like:

from zope.component import adapter
from ZPublisher.interfaces import IPubBeforeCommit
import App.config
import transaction

# read the read-only setting once, from the Zope configuration
configuration = App.config.getConfiguration()
readonly = configuration.read_only_database

# register this function as a <subscriber> in the policy product's ZCML
@adapter(IPubBeforeCommit)
def abortTransactionOnReadOnly(event):
    if readonly:
        transaction.abort()

Rewrite Login URLs

You can also rewrite login URLs on the public site so that no one can see a login form. Just do normal rewrites at your proxy server.

Urls you'll want to rewrite are:

  • /manage
  • /login
  • /logged_out
  • /require_login
  • /acl_users

This prevents anyone from seeing a login form or an Unauthorized page.
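One thing to watch for when writing those rewrite rules: Zope acquisition means these names also work below any folder (for example `/Plone/news/login_form` or `/Plone/folder/manage_main`), so matching only the exact top-level paths leaves gaps. This Python sketch, with a hypothetical `is_blocked` helper, shows a path-segment match you could translate into your proxy's rewrite syntax. Matching on the start of a segment also catches names like `manage_main` and `login_form`, at the cost of blocking any content whose id starts with one of these words.

```python
import re

# The login-related names listed above.
BLOCKED_NAMES = ("manage", "login", "logged_out", "require_login", "acl_users")

# A path segment that starts with one of the names, anywhere in the path.
BLOCKED_RE = re.compile(r"(?:^|/)(?:%s)[^/]*" % "|".join(map(re.escape, BLOCKED_NAMES)))


def is_blocked(url: str) -> bool:
    """True if the path part of url has a segment starting with a blocked name."""
    path = url.split("?", 1)[0]  # ignore the query string
    return BLOCKED_RE.search(path) is not None
```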

You also might want to disable basic auth on the proxy server.

Keep it Secret, Keep it Safe

It's best if no one except your content curators knows where your read-write backend server is located. More important, even if someone does find out where it is, they shouldn't be able to access it without some form of authentication first (in addition to normal Plone authentication). There are a few ways to accomplish this:

  • Multi-factor authentication: require something like SecurID to protect access to the read-write server
  • Basic auth: if you're cheap and not hypersensitive about security, you could add an extra basic auth layer in front of the read-write server. Just give all content curators the same username:password; they then log in again to the Plone site.
  • Make the site accessible only via VPN
  • Only provide access to the site on the local network at the workplace

Anonymity

If your read-write server is accessible in any way (even if it's behind multi-factor authentication), you should still try to keep its existence secret.

  1. Provide an overriding robots.txt that tells all search engines not to index your read-write site URL. This can be done with simple nginx or Apache rules.
  2. Make sure your content editors do NOT link to your read-write server. As silly as this sounds, it WILL happen if you don't do anything to prevent it. You can customize TinyMCE to filter URLs: customize tiny_mce_init.js in portal_skins/custom, adding something like:
    var bad_urls = [
        'https://www.readwrite.com', 
        'http://www.readwrite.com'
    ];
    var replace_link = 'http://www.readonly.com';
    
    function filter_links(url, node, on_save){
        for(var i = 0; i < bad_urls.length; i++){
            var bad = bad_urls[i];
            url = url.replace(bad, replace_link);
        }
        return url;
    }
    
    ...
    
            window.tinyMCE.init({
    
    ...
    
                urlconverter_callback : filter_links
    

    This prevents anyone from linking to your read-write server from the editor.

  3. I also periodically run scripts that go through all the content on the site looking for bad links. This catches people who use kupu or who put links into fields that don't have a WYSIWYG editor.
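A scan like that can be as simple as a regular expression run over each item's text. In this Python sketch the host names are the hypothetical ones from the TinyMCE example, and `scan_content` takes a plain mapping of path to text; a real script would get that text from the catalog or from the content objects themselves.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical read-write URLs, as in the TinyMCE example above.
BAD_PREFIXES = ("https://www.readwrite.com", "http://www.readwrite.com")


def bad_link_pattern(prefixes: Iterable[str]):
    """Match a bad prefix plus the rest of the URL, up to whitespace, quotes or brackets."""
    alternatives = "|".join(re.escape(p) for p in prefixes)
    return re.compile(r"(?:%s)[^\s\"'<>]*" % alternatives, re.IGNORECASE)


def scan_content(items: Dict[str, str],
                 prefixes: Iterable[str] = BAD_PREFIXES) -> Iterator[Tuple[str, str]]:
    """Yield (path, url) for every link to the read-write server."""
    pattern = bad_link_pattern(prefixes)
    for path, text in items.items():
        for url in pattern.findall(text):
            yield path, url
```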

Monitor

This might be obvious, but you need to make sure you have some sort of monitoring in place to track rejected logins to your backend. How you do this depends on what you used to secure your backend.
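If, for example, the backend is protected with basic auth at an nginx proxy, rejected logins show up as 401 responses in the access log. This Python sketch assumes nginx's default combined log format, counts 401s per client address and reports the addresses over a threshold; the threshold and the function names are illustrative.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# client address, then the status code that follows the quoted request line
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (\d{3}) ')


def rejected_logins(lines: Iterable[str]) -> Counter:
    """Count 401 responses per client address."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group(2) == "401":
            counts[match.group(1)] += 1
    return counts


def suspicious(lines: Iterable[str], threshold: int = 5) -> List[Tuple[str, int]]:
    """Addresses with at least `threshold` rejected logins, most frequent first."""
    return [(ip, n) for ip, n in rejected_logins(lines).most_common() if n >= threshold]
```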

Using Plone as a Document Repository

by Nathan Van Gheem last modified Apr 14, 2011 02:39 AM
Sharing my experience in using plone to OCR PDF documents and displaying the documents in the browser with Flex Paper.


We just released a new site that houses thousands of scanned PDF documents that are now viewable in the browser via Flex Paper. We started with PDFs that were just scanned images. Plone, with the help of a few packages, then OCR'd and replaced the PDF with a searchable PDF counterpart.

Features

  • Convert Image PDFs to searchable versions
  • Split large PDFs into multiple documents
  • Overwrite metadata of PDF
  • OCR text is then searchable via Plone search
  • Online viewable version
  • All document processing is done via asynchronous processes so adding documents is not slow
  • The asynchronous conversion processes can be monitored

Requirements

  • wc.pageturner : For online viewable PDFs
  • wildcard.pdfpal : heavy lifting in PDF processing
  • plone.app.async : asynchronously process PDF documents
  • Tesseract > 3.0.1 system package
  • swftools system package
  • ghostscript system package
  • hocr2pdf system package
  • pdftk system package
  • tiff2pdf system package

Caveats

  • Probably only works in Linux
  • wildcard.pdfpal is pretty specific and isn't smart about whether it should process a PDF. For instance, if the PDF is already searchable, it'll still try to convert it.
  • We're not really interested in widely supporting pdfpal beyond our use case (that's why it's not listed on plone.org, only in the collective and on PyPI). So if you're interested in implementing this, you might end up contributing to the project and cleaning up some of the cruft in the package.

gunicorn startup script for Django

by Nathan Van Gheem last modified Mar 23, 2011 11:52 AM
with multiple servers

I've been using Django a lot recently and had this startup script that I use and thought others might find useful :)

 

#!/bin/sh

ADDRESS='127.0.0.1'
PYTHON="/opt/django/bin/python"
GUNICORN="/opt/django/bin/gunicorn_django"
PROJECTLOC="/opt/django/project"
MANAGELOC="$PROJECTLOC/manage.py"
DEFAULT_ARGS="--workers=3 --daemon --bind=$ADDRESS:"
BASE_CMD="$GUNICORN $DEFAULT_ARGS"

SERVER1_PORT='8200'
SERVER1_PID="$PROJECTLOC/$SERVER1_PORT.pid"
SERVER2_PORT='8201'
SERVER2_PID="$PROJECTLOC/$SERVER2_PORT.pid"

start_server () {
  if [ -f $1 ]; then
    #pid exists, check if running
    if [ "$(ps -p `cat $1` | wc -l)" -gt 1 ]; then
       echo "Server already running on ${ADDRESS}:${2}"
       return
    fi
  fi
  cd $PROJECTLOC
  echo "starting ${ADDRESS}:${2}"
  $BASE_CMD$2 --pid=$1
}

stop_server (){
  if [ -f $1 ] && [ "$(ps -p `cat $1` | wc -l)" -gt 1 ]; then
    echo "stopping server ${ADDRESS}:${2}"
    kill -9 `cat $1`
    rm $1
  else 
    if [ -f $1 ]; then
      echo "server ${ADDRESS}:${2} not running"
    else
      echo "No pid file found for server ${ADDRESS}:${2}"
    fi
  fi
}

case "$1" in
'start')
  start_server $SERVER1_PID $SERVER1_PORT 
  start_server $SERVER2_PID $SERVER2_PORT
  ;;
'stop')
  stop_server $SERVER1_PID $SERVER1_PORT
  stop_server $SERVER2_PID $SERVER2_PORT
  ;;
'restart')
  stop_server $SERVER1_PID $SERVER1_PORT
  sleep 2
  start_server $SERVER1_PID $SERVER1_PORT
  sleep 2
  stop_server $SERVER2_PID $SERVER2_PORT
  sleep 2
  start_server $SERVER2_PID $SERVER2_PORT
  ;;
*)
  echo "Usage: $0 { start | stop | restart }"
  ;;
esac

exit 0

Just make sure to fill in all the variables to your liking.
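The start and stop functions decide whether a server is running by feeding the pid file to `ps -p`. If you'd rather do that check from Python (say, in a monitoring script), a common equivalent is sending signal 0 with `os.kill`, which checks that the process exists without affecting it. This is a POSIX-only sketch with a hypothetical helper name; like the `ps -p` check, it can be fooled if the old pid has been reused by another process.

```python
import os


def server_running(pid_file: str) -> bool:
    """Return True if pid_file names a live process (POSIX only)."""
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False  # no pid file, or it doesn't contain a number
    if pid <= 0:
        return False  # 0 and negative values address process groups, not one process
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```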

Adding the script

This works on Ubuntu, at least:

  1. Place the script in the file /etc/init.d/gunicorn or whatever you'd like to call it
  2. Make it executable
    chmod +x /etc/init.d/gunicorn
  3. And finally, wire it up
     update-rc.d gunicorn defaults

nginx with built in load balancing and caching

by Nathan Van Gheem last modified Sep 19, 2010 12:00 AM
nginx can do it all. Short example to get nginx going with buildout to provide load balancing and caching.

Another Update

I've received an angry email telling me that I should feel ashamed and that I am misleading people, because the specific example I give below does not work for them. First off, I never meant this to be a comprehensive example; I don't have time for that. Second, I thought it was obvious that the implementor would need to fill in some of the details. This is just an example of how it can be done, NOT a drop-in replacement.

So, you may need to create cache directories, read some docs and customize some settings to get this going. Some effort is required on the implementor's end.

Update

Some have commented that by using nginx for load balancing you lose session affinity, since nginx won't necessarily send each user to the same backend. This can hurt performance, since every ZEO client may end up caching the same user's objects.

If you find this to be a problem, there is a sticky nginx module that will handle this for load balancing. With this, each browser will be sent to the same backend.

Introduction

Why muck around with HAProxy and Varnish when you can have nginx do it all for you? The setup is easy, and it's a lot easier to maintain.

Installing nginx with buildout

You can set up nginx fairly easily with buildout. The only fancy part of our setup is that we're going to include the nginx cache purge module.

  1. Add the cache purge part to your buildout
    [ngx_cache_purge]
    recipe = hexagonit.recipe.download
    url = http://labs.frickle.com/files/ngx_cache_purge-1.1.tar.gz
    strip-top-level-dir = true
  2. I also needed the PCRE source to compile nginx
    [pcre-source]
    recipe = hexagonit.recipe.download
    url = ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-8.00.tar.gz
    strip-top-level-dir = true
    
  3. and the nginx part
    [nginx-build]
    recipe = hexagonit.recipe.cmmi
    url = http://nginx.org/download/nginx-0.8.45.tar.gz
    configure-options =
        --with-http_stub_status_module
        --conf-path=${buildout:directory}/settings/nginx.conf
        --error-log-path=${buildout:directory}/var/log/nginx-error.log
        --pid-path=${buildout:directory}/var/nginx.pid
        --lock-path=${buildout:directory}/var/nginx.lock
        --with-pcre=${pcre-source:location}
        --with-http_ssl_module
        --add-module=${ngx_cache_purge:location}
    
  4. and finally, add it all to the parts directive
    parts =
        ...
        pcre-source
        ngx_cache_purge
        nginx-build
        ...
    
    
  5. After you re-run buildout, you'll be able to run nginx by issuing a command like this:
    ./parts/nginx-build/sbin/nginx -c /path/to/configuration/nginx.conf

nginx configuration

Here is a simple sample configuration for nginx with load balancing and caching. It can obviously get as complicated as you want, but I definitely think this is easier than managing haproxy, varnish and nginx.

pid /path/to/buildout/var/nginx.pid;
lock_file /path/to/buildout/var/nginx.lock;

worker_processes 2;
daemon off;

events {
    worker_connections 1024;
}

error_log /path/to/buildout/var/log/nginx-error.log warn;

# HTTP server

http {
    server_names_hash_bucket_size 64;

    # this is how you do simple round robin load balancing with nginx.
    # you can define as many backend servers as you'd like here.
    upstream plone {
        server 127.0.0.1:8080;
        server 127.0.0.1:8081;
    }
    
    access_log /path/to/buildout/var/log/main-access.log;

    # log format for cache hits (log_format is only allowed at the http level)
    log_format cache '***$time_local '
                     '$upstream_cache_status '
                     'Cache-Control: $upstream_http_cache_control '
                     'Expires: $upstream_http_expires '
                     '"$request" ($status) '
                     '"$http_user_agent" ';

    # you can specify multiple cache paths for different resources/paths/proxies
    # if needed.
    # levels=1:2 stores the cached files in a two-level directory hierarchy
    # (one-character, then two-character subdirectory names)
    proxy_cache_path  /var/www/cache  levels=1:2 keys_zone=thecache:100m max_size=1000m inactive=600m;
    proxy_temp_path /var/www/cache/tmp;

    # Here is the cache purge handling. Purge requests come in here
    server {
        listen 8089;
        server_name www.example.com;

        access_log /path/to/buildout/var/log/purge.log;

        location / {
            allow 127.0.0.1;
            deny all;
            # must build the same key as proxy_cache_key below; there is no
            # proxy_pass in this server, so $proxy_host would be empty here
            proxy_cache_purge thecache ${scheme}plone$request_uri;
        }
    }

    server {
        listen 80;
        server_name www.example.com;

        access_log /path/to/buildout/var/log/cache.log cache;

        # Enable gzip compression
        gzip             on;
        gzip_min_length  1000;
        gzip_proxied     any;
        gzip_types       text/xml text/plain application/xml;

        # Show status information on /_main-status
        location = /_main_status_ {
            stub_status on;
            allow 127.0.0.1;
            deny all;
        }

		# logged-in users (Plone's __ac cookie) bypass the cache,
		# and their responses are not stored in it
        proxy_cache_bypass $cookie___ac;
        proxy_no_cache $cookie___ac;

	    location / {
	        proxy_redirect                  off;
			proxy_set_header                Host $host;
			proxy_set_header                X-Real-IP $remote_addr;
			proxy_set_header                X-Forwarded-For $proxy_add_x_forwarded_for;
			client_max_body_size            0;
			client_body_buffer_size         128k;
			proxy_send_timeout              120;
			proxy_buffer_size               4k;
			proxy_buffers                   4 32k;
			proxy_busy_buffers_size         64k;
			proxy_temp_file_write_size      64k;
			proxy_connect_timeout           75;
			proxy_read_timeout              205;   
	        proxy_pass http://plone/VirtualHostBase/http/www.example.com:80/Plone/VirtualHostRoot/;

	        proxy_cache_bypass $cookie___ac;
	        proxy_cache thecache;
	        proxy_cache_key $scheme$proxy_host$request_uri;
	    }
    }
}
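The levels=1:2 setting is easiest to understand by computing where a response ends up. nginx names each cache file after the MD5 of its cache key and takes the subdirectory names from the end of that hash: the last character for the first level, the two characters before it for the second. This Python sketch reproduces that layout; it assumes that with `proxy_pass http://plone/...` the key `$scheme$proxy_host$request_uri` expands to something like `httpplone/news`, and the helper names are hypothetical.

```python
import hashlib
import os
from typing import List, Sequence


def level_dirs(digest: str, levels: Sequence[int] = (1, 2)) -> List[str]:
    """Subdirectory names nginx derives from the end of the hex digest."""
    dirs = []
    end = len(digest)
    for width in levels:
        dirs.append(digest[end - width:end])
        end -= width
    return dirs


def cache_file_path(cache_root: str, key: str, levels: Sequence[int] = (1, 2)) -> str:
    """Where a response with this proxy_cache_key is stored under proxy_cache_path."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return os.path.join(cache_root, *level_dirs(digest, levels), digest)
```

For example, `level_dirs("b7f54b2df7773722d382f4809d65029c")` returns `['c', '29']`, matching the `/c/29/` layout shown in the nginx proxy_cache_path documentation.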

Running Plone 4b4 with Zope 2.13.0a1

by Nathan Van Gheem last modified Dec 28, 2010 05:31 AM
Just some guidelines on getting Plone 4b4 to work with the new Zope 2.13 release to save you some time.

Update

This method is most likely no longer suitable for a plone WSGI setup. With stable releases of Zope 2.13 coming out and Plone 4.1 almost ready for alpha, it'd be best to start there and ignore this post.

Introduction

I'm not going to go into detail about where and how everything is done. This post expects you to know a bit about Plone, Zope and buildout. Maybe I'll be more detailed later. Use this if you want to save yourself a lot of time getting a working setup with Plone 4 and Zope 2.13. Zope 2.13 adds native WSGI support to Zope. I tested it a bit and it seems to work well, but results may vary, and I'm sure there will be a better-supported way to do this soon.

Extends

Make sure your buildout extends the http://download.zope.org/Zope2/index/2.13.0a1/versions.cfg versions file.

Checkouts

You'll need to check out Products.CMFCore, Products.PluggableAuthService, Products.TinyMCE and plone.locking from svn. You can use these locations until there are new releases:

http://svn.zope.org/repos/main/Products.CMFCore/branches/2.2/ Products.CMFCore

http://svn.zope.org/repos/main/Products.PluggableAuthService/trunk Products.PluggableAuthService

http://svn.plone.org/svn/collective/Products.TinyMCE/trunk/ Products.TinyMCE

http://svn.plone.org/svn/plone/plone.locking/trunk/ plone.locking

You'll also need to add these packages to your develop buildout section.

Extra Versions Pins

You'll need to pin these versions, since Zope 2.13 doesn't pin some versions that Plone's setup used to assume were pinned.

Add these extra version pins:

    Products.CMFCore =
    Products.PluggableAuthService =
    Products.TinyMCE =
    plone.locking =
    five.formlib = 1.0.2
    zope.formlib = 3.7.0
    zope.app.apidoc = 3.6.2
    zope.app.applicationcontrol = 3.5.0
    zope.app.appsetup = 3.11
    zope.app.authentication = 3.6.0
    zope.app.basicskin = 3.4.1
    zope.app.broken = 3.5.0
    zope.app.cache = 3.6.0
    zope.app.catalog = 3.8.0
    zope.app.component = 3.8.3
    zope.app.container = 3.8.0
    zope.app.content = 3.4.0
    zope.app.dav = 3.5.1
    zope.app.debug = 3.4.1
    zope.app.dependable = 3.4.0
    zope.app.dtmlpage = 3.5.0
    zope.app.error = 3.5.2
    zope.app.exception = 3.5.0
    zope.app.file = 3.5.0
    zope.app.folder = 3.5.1
    zope.app.form = 3.8.1
    zope.app.generations = 3.5.0
    zope.app.http = 3.6.0
    zope.app.i18n = 3.6.1
    zope.app.interface = 3.5.0
    zope.app.intid = 3.7.0
    zope.app.locales = 3.6.1
    zope.app.localpermission = 3.7.2
    zope.app.pagetemplate = 3.7.1
    zope.app.principalannotation = 3.7.0
    zope.app.publication = 3.8.1
    zope.app.publisher = 3.8.4
    zope.app.renderer = 3.5.1
    zope.app.rotterdam = 3.5.0
    zope.app.schema = 3.5.0
    zope.app.security = 3.7.3
    zope.app.securitypolicy = 3.5.1
    zope.app.server = 3.4.2
    zope.app.session = 3.6.1
    zope.app.testing = 3.7.3
    zope.app.traversing = 3.4.0
    zope.app.undo = 3.5.0
    zope.app.wsgi = 3.6.0
    zope.app.zapi = 3.4.1
    zope.app.zcmlfiles = 3.5.5
    zope.app.zopeappgenerations = 3.5.0
    zope.app.zptpage = 3.5.0
    plone.app.form = 2.0b6
    plone.app.contentrules = 2.0b4
    plone.app.portlets = 2.0b11
    plone.app.users = 1.0b9
    plone.app.contentmenu = 2.0b3

Extra Eggs

You'll also need to add some extra dependencies to your buildout that the Plone 4b4 egg should require but doesn't:

    zope.formlib
    five.formlib
    zope.app.schema

WSGI

See my previous post on running Zope2 with WSGI for guidelines on setting up the Paste config.

pysourcesearch

by Nathan Van Gheem last modified Jun 09, 2010 12:44 PM
A simple repoze.bfg application for searching python packages and sets of packages utilizing repoze.catalog.

Overview

Sometimes grep or TextMate's find can take a long time, so I decided to try building a mini search engine using repoze.bfg, repoze.catalog and Pygments. It works rather well and is very fast. Unfortunately, the catalog gets rather large once you add groups of packages from Plone and the like, so RAM usage ends up being very high.

It lets you easily search methods, classes, filenames and full text, and then view the file highlighted with Pygments.
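To give a rough idea of the kind of indexing involved, here is a minimal sketch using only the standard library. It is not pysourcesearch's actual code: plain dictionaries stand in for repoze.catalog's indexes, the `ast` module stands in for however the real application extracts classes and methods, the `SourceIndex` class and its methods are hypothetical, and Pygments highlighting is left out.

```python
import ast
import os
import re
from collections import defaultdict

WORD = re.compile(r"\w+")


class SourceIndex:
    """Tiny in-memory index over Python source files.

    Hypothetical stand-in for a repoze.catalog setup: each index maps a
    key (class name, function name, filename or word) to a set of paths.
    """

    def __init__(self):
        self.indexes = {
            "class": defaultdict(set),
            "function": defaultdict(set),
            "filename": defaultdict(set),
        }
        self.words = defaultdict(set)

    def add(self, path, source):
        """Index one file's classes, functions/methods, filename and words."""
        self.indexes["filename"][os.path.basename(path)].add(path)
        for node in ast.walk(ast.parse(source, filename=path)):
            if isinstance(node, ast.ClassDef):
                self.indexes["class"][node.name].add(path)
            elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                # Methods are FunctionDef nodes nested inside a ClassDef.
                self.indexes["function"][node.name].add(path)
        for word in WORD.findall(source.lower()):
            self.words[word].add(path)

    def search(self, kind, name):
        """Exact lookup in the 'class', 'function' or 'filename' index."""
        return sorted(self.indexes[kind].get(name, ()))

    def search_text(self, query):
        """Full-text search: paths containing every word of the query."""
        words = WORD.findall(query.lower())
        if not words:
            return []
        return sorted(set.intersection(*(self.words.get(w, set()) for w in words)))
```

For example, after `idx.add(path, open(path).read())` for each file in a package, `idx.search("class", "LockInfo")` returns the paths defining that class. Keeping every word of every file in memory is also a simple way to see why the RAM usage of this kind of index grows so quickly.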

You can see it running at search.nathanvangheem.com, with repoze.bfg and Plone 4 indexed.

You can check out the PyPI page if you're interested in it; the code is also on Bitbucket.

Running Plone 4 with a Zope2 WSGI

by Nathan Van Gheem last modified Jun 04, 2010 01:50 PM
Guide to running Plone 4 with the Zope2 WSGI branch

Update

Tres has managed to merge his WSGI branch into trunk, and Hanno tells me the unofficial plan is to include it in a release for Zope 2.13, in time for Plone 4.1, though this hasn't been decided yet.

Overview

I was planning on implementing WSGI for Zope2 during the Penn State Symposium Sprints, and I did a few things to help out; however, Tres Seaver did most of the work on his own before and during part of the sprint :) I spent the rest of the time testing it and helping with the Theme Editor sprint.

It's now quite trivial to get it working, and it makes all of the repoze.zope2 nonsense unnecessary. This article is just a reference for anyone else interested in getting it going on their setup.

Guide

This guide assumes you have an existing Plone 4 installation to work from. I don't provide any buildouts here; the guide just modifies an existing buildout to work with a branch of Zope2 and creates an ini file that Paste can consume to serve WSGI.

Supplying the WSGI'd Zope2

First, go to the src directory of your installation. If you installed with the unified installer, that will be instance-home/zinstance/src; if you used plain buildout, it'll be instance/src. Then check out the Zope2 branch:

svn co http://svn.zope.org/repos/main/Zope/branches/tseaver-fix_wsgi/ Zope2

Stringing up buildout

Next, modify your buildout.cfg file to add the checked-out Zope2 to the develop section:

develop = 
...
Zope2
...

Still modifying your buildout.cfg, add Paste, PasteScript, repoze.tm2 and repoze.retry to your eggs section:

eggs =
Plone
Paste
PasteScript
repoze.tm2
repoze.retry 

Still in buildout.cfg, add a paster part to generate the paster script:

parts =
...
paster
...

[paster]
recipe = repoze.recipe.egg
scripts = paster
eggs = ${instance:eggs}

Then you'll need to add the updated Zope2 version pins for the WSGI branch. To do this, add the versions.cfg file provided in the branch to the extends directive, after every other versions file listed there, so that its pins override the earlier ones. It'll look like this:

extends =
...
src/Zope2/versions.cfg

Then run your buildout like normal:

./bin/buildout

Creating a WSGI Configuration File

You'll now need to create a WSGI configuration file. For now, we'll just serve it with the Paste HTTP server using a Paste ini configuration. You could also serve it through Apache's WSGI implementation (mod_wsgi), but that is beyond the scope of this article.

Create a file called zope2.ini in the instance directory with the following contents:

[app:zope]
use = egg:Zope2#main
zope_conf = %(here)s/parts/instance/etc/zope.conf

[pipeline:main]
pipeline =
    egg:paste#evalerror
    egg:repoze.retry#retry
    egg:repoze.tm2#tm
    zope

[server:main]
use = egg:paste#http
host = localhost
port = 8080
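The [pipeline:main] section wraps the zope app in three filters, with the first one listed as the outermost. The sketch below imitates that composition with plain WSGI callables from the standard library, so you can see the order in which a request passes through them. The filter factories and `zope_stub` are hypothetical stand-ins; the real filters (evalerror, repoze.retry, repoze.tm2) do actual work such as interactive error pages, retrying conflicting requests and transaction management, none of which this sketch attempts.

```python
from wsgiref.util import setup_testing_defaults


def make_tracing_filter(name, log):
    """Hypothetical filter factory: records its name, then calls the next app."""
    def factory(app):
        def middleware(environ, start_response):
            log.append(name)
            return app(environ, start_response)
        return middleware
    return factory


def zope_stub(environ, start_response):
    """Stand-in for the real Zope2 WSGI app (egg:Zope2#main)."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello from Zope\n"]


def build_pipeline(filter_factories, app):
    """Compose filters like a Paste [pipeline]: the first listed is outermost."""
    for factory in reversed(filter_factories):
        app = factory(app)
    return app


if __name__ == "__main__":
    calls = []
    app = build_pipeline(
        [make_tracing_filter(n, calls) for n in ("evalerror", "retry", "tm")],
        zope_stub,
    )
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal fake request

    def start_response(status, headers, exc_info=None):
        print(status)

    body = b"".join(app(environ, start_response))
    print(calls, body)
```

Ordering matters here for the same reason it does in the ini file: the retry filter has to sit outside the transaction manager so that a retried request gets a fresh transaction.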

The zope_conf value in the app:zope section can be the path to any zope.conf file. I'm just using its standard location as an example and won't go through configuring that file itself.

Fire it up!

If all went well, you should now be able to start up your Plone 4 instance on WSGI like this:

./bin/paster serve zope2.ini

You should now be able to visit your site at http://localhost:8080.
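If you'd rather check from a script than a browser (say, after restarting the server), a minimal smoke test could look like the following. The `site_is_up` helper is hypothetical and only checks that something answers HTTP on the given URL; it doesn't verify that Plone itself rendered correctly.

```python
import urllib.error
import urllib.request


def site_is_up(url="http://localhost:8080/", timeout=5.0):
    """Return True if url answers with an HTTP status below 500."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 500
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (e.g. 401 or 404).
        return err.code < 500
    except (urllib.error.URLError, OSError):
        # Nothing listening, name resolution failure, timeout, etc.
        return False


if __name__ == "__main__":
    print("up" if site_is_up() else "down")
```

HTTPError is caught before URLError because it is a subclass of it; a 4xx answer still means the Paste server is listening.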

Caveats

I did run into a snag with the Mac OS X unified installer and the version of Python it is configured with. It wouldn't compile the Zope2 dependencies, so I had to use my own Python, which I had compiled with the Python buildout found in the Plone collective svn. The bug is loosely referenced in the Zope bug tracker.

Future Considerations

I'm hoping to get a release of this branch out, maybe as an alpha or beta, since I don't think they are planning on merging it into core any time soon, although I really don't understand what that whole process involves.

I'd like to see the Zope2 package implement mkzope2instance and other convenience methods so it'd be possible to install Plone 4/Zope2 without buildout at all, perhaps using pip. I'm looking into how this might work with a pip versions file and other things. Maybe more on this later.


Post any comments if you run into any issues.
