Wednesday, December 28, 2016

WSGI: The Server-Application Interface for Python

In 1993, the web was still in its infancy, with about 14 million users and a hundred websites. Pages were static but there was already a need to produce dynamic content, such as up-to-date news and data. Responding to this, Rob McCool and other contributors implemented the Common Gateway Interface (CGI) in the National Center for Supercomputing Applications (NCSA) HTTPd web server (the forerunner of Apache). This was the first web server that could serve content generated by a separate application.

Since then, the number of users on the Internet has exploded, and dynamic websites have become ubiquitous. When first learning a new language or even first learning to code, developers, soon enough, want to know about how to hook their code into the web.

Python on the Web and the Rise of WSGI

Since the creation of CGI, much has changed. The CGI approach became impractical, as it required the creation of a new process at each request, wasting memory and CPU. Some other low-level approaches emerged, like FastCGI](http://www.fastcgi.com/) (1996) and mod_python (2000), providing different interfaces between Python web frameworks and the web server. As different approaches proliferated, the developer’s choice of framework ended up restricting the choices of web servers and vice versa.

To address this problem, in 2003 Phillip J. Eby proposed PEP-0333, the Python Web Server Gateway Interface (WSGI). The idea was to provide a high-level, universal interface between Python applications and web servers.

In 2003, PEP-3333 updated the WSGI interface to add Python 3 support. Nowadays, almost all Python frameworks use WSGI as a means, if not the only means, to communicate with their web servers. This is how DjangoFlask and many other popular frameworks do it.

This article intends to provide the reader with a glimpse into how WSGI works, and allow the reader to build a simple WSGI application or server. It is not meant to be exhaustive, though, and developers intending to implement production-ready servers or applications should take a more thorough look into the WSGI specification.

The Python WSGI Interface

WSGI specifies simple rules that the server and application must conform to. Let’s start by reviewing this overall pattern.

The Python WSGI server-application interface.

Application Interface

In Python 3.5, the application interfaces goes like this:

def application(environ, start_response):
body = b'Hello world!\n'
status = '200 OK'
headers = [('Content-type', 'text/plain')]
start_response(status, headers)
return [body]

In Python 2.7, this interface wouldn’t be much different; the only change would be that the body is represented by a str object, instead of a bytes one.

Though we’ve used a function in this case, any callable will do. The rules for the application object here are:

  • Must be a callable with environ and start_response parameters.
  • Must call the start_response callback before sending the body.
  • Must return an iterable with pieces of the document body.

Another example of an object that satisfies these rules and would produce the same effect is:

class Application:
def __init__(self, environ, start_response):
self.environ = environ
self.start_response = start_response
def __iter__(self):
body = b'Hello world!\n'
status = '200 OK'
headers = [('Content-type', 'text/plain')]
self.start_response(status, headers)
yield body

Server Interface

A WSGI server might interface with this application like this::

def write(chunk):
\[code\]'Write data back to client\[/code\]'
...
def send_status(status):
\[code\]'Send HTTP status code\[/code\]'
...
def send_headers(headers):
\[code\]'Send HTTP headers\[/code\]'
...
def start_response(status, headers):
\[code\]'WSGI start_response callable\[/code\]'
send_status(status)
send_headers(headers)
return write
# Make request to application
response = application(environ, start_response)
try:
for chunk in response:
write(chunk)
finally:
if hasattr(response, 'close'):
response.close()

As you may have noticed, the start_response callable returned a write callable that the application may use to send data back to the client, but that was not used by our application code example. This write interface is deprecated, and we can ignore it for now. It will be briefly discussed later in the article.

Another peculiarity of the server’s responsibilities is to call the optional close method on the response iterator, if it exists. As pointed out in Graham Dumpleton’s article here, it is an often-overlooked feature of WSGI. Calling this method, if it exists, allows the application to release any resources that it may still hold.

The Application Callable’s environ Argument

The environ parameter should be a dictionary object. It is used to pass request and server information to the application, much in the same way CGI does. In fact, all CGI environment variables are valid in WSGI and the server should pass all that apply to the application.

While there are many optional keys that can be passed, several are mandatory. Taking as an example the following GET request:

$ curl 'http://localhost:8000/auth?user=obiwan&token=123'

These are the keys that the server must provide, and the values they would take:

KeyValueComments
REQUEST_METHOD "GET"
SCRIPT_NAME "" server setup dependent
PATH_INFO "/auth"
QUERY_STRING "token=123"
CONTENT_TYPE ""
CONTENT_LENGTH ""
SERVER_NAME "127.0.0.1" server setup dependent
SERVER_PORT "8000"
SERVER_PROTOCOL "HTTP/1.1"
HTTP_(...) Client supplied HTTP headers
wsgi.version (1, 0) tuple with WSGI version
wsgi.url_scheme "http"
wsgi.input File-like object
wsgi.errors File-like object
wsgi.multithread False True if server is multithreaded
wsgi.multiprocess False True if server runs multiple processes
wsgi.run_once False True if the server expects this script to run only once (e.g.: in a CGI environment)

The exception to this rule is that if one of these keys were to be empty (like CONTENT_TYPE in the above table), then they can be omitted from the dictionary, and it will be assumed they correspond to the empty string.

wsgi.input and wsgi.errors

Most environ keys are straightforward, but two of them deserve a little more clarification: wsgi.input, which must contain a stream with the request body from the client, and wsgi.errors, where the application reports any errors it encounters. Errors sent from the application to wsgi.errors typically would be sent to the server error log.

These two keys must contain file-like objects; that is, objects that provide interfaces to be read or written to as streams, just like the object we get when we open a file or a socket in Python. This may seem tricky at first, but fortunately, Python gives us good tools to handle this.

First, what kind of streams are we talking about? As per WSGI definition, wsgi.input and wsgi.errors must handle bytes objects in Python 3 and str objects in Python 2. In either case, if we’d like to use an in-memory buffer to pass or get data through the WSGI interface, we can use the class io.BytesIO.

As an example, if we are writing a WSGI server, we could provide the request body to the application like this:

  • For Python 2.7
import io
...
request_data = 'some request body'
environ['wsgi.input'] = io.BytesIO(request_data)

  • For Python 3.5
import io
...
request_data = 'some request body'.encode('utf-8') # bytes object
environ['wsgi.input'] = io.BytesIO(request_data)

On the application side, if we wanted to turn a stream input we’ve received into a string, we’d want to write something like this:

  • For Python 2.7
readstr = environ['wsgi.input'].read() # returns str object

  • For Python 3.5
readbytes = environ['wsgi

Add comment

Add comment

authimage