WSGI: The Server-Application Interface for Python

In 1993, the web was still in its infancy, with about 14 million users and a hundred websites. Pages were static but there was already a need to produce dynamic content, such as up-to-date news and data. Responding to this, Rob McCool and other contributors implemented the Common Gateway Interface (CGI) in the National Center for Supercomputing Applications (NCSA) HTTPd web server (the forerunner of Apache). This was the first web server that could serve content generated by a separate application.

Since then, the number of users on the Internet has exploded, and dynamic websites have become ubiquitous. Whether they are picking up a new language or learning to code for the first time, developers soon want to know how to hook their code into the web.

Python on the Web and the Rise of WSGI

Since the creation of CGI, much has changed. The CGI approach became impractical, as it required the creation of a new process at each request, wasting memory and CPU. Some other low-level approaches emerged, like FastCGI (http://www.fastcgi.com/, 1996) and mod_python (2000), providing different interfaces between Python web frameworks and the web server. As different approaches proliferated, the developer’s choice of framework ended up restricting the choices of web servers and vice versa.

To address this problem, in 2003 Phillip J. Eby proposed PEP-0333, the Python Web Server Gateway Interface (WSGI). The idea was to provide a high-level, universal interface between Python applications and web servers.

In 2010, PEP-3333 updated the WSGI interface to add Python 3 support. Nowadays, almost all Python frameworks use WSGI as a means, if not the only means, to communicate with their web servers. This is how Django, Flask and many other popular frameworks do it.

This article intends to provide the reader with a glimpse into how WSGI works, and allow the reader to build a simple WSGI application or server. It is not meant to be exhaustive, though, and developers intending to implement production-ready servers or applications should take a more thorough look into the WSGI specification.

The Python WSGI Interface

WSGI specifies simple rules that the server and application must conform to. Let’s start by reviewing this overall pattern.

The Python WSGI server-application interface.

Application Interface

In Python 3.5, the application interface goes like this:

def application(environ, start_response):
    body = b'Hello world!\n'
    status = '200 OK'
    headers = [('Content-type', 'text/plain')]
    start_response(status, headers)
    return [body]

In Python 2.7, this interface wouldn’t be much different; the only change would be that the body is represented by a str object, instead of a bytes one.

Though we’ve used a function in this case, any callable will do. The rules for the application object here are:

  • Must be a callable with environ and start_response parameters.
  • Must call the start_response callback before sending the body.
  • Must return an iterable with pieces of the document body.

Another example of an object that satisfies these rules and would produce the same effect is:

class Application:
    def __init__(self, environ, start_response):
        self.environ = environ
        self.start_response = start_response

    def __iter__(self):
        body = b'Hello world!\n'
        status = '200 OK'
        headers = [('Content-type', 'text/plain')]
        self.start_response(status, headers)
        yield body
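
To see these rules in action without a real server, the application can be called by hand. The following is a minimal sketch for Python 3, not a reference implementation: fake_start_response and the captured dictionary are hypothetical names introduced for illustration, while wsgiref.util.setup_testing_defaults comes from the standard library and fills in a plausible environ.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    body = b'Hello world!\n'
    status = '200 OK'
    headers = [('Content-type', 'text/plain')]
    start_response(status, headers)
    return [body]


captured = {}


def fake_start_response(status, headers, exc_info=None):
    # Hypothetical stand-in for the server's callback: just record the call.
    captured['status'] = status
    captured['headers'] = headers


environ = {}
setup_testing_defaults(environ)  # fills in the mandatory keys with test values

# The response body is whatever the returned iterable yields, joined together.
body = b''.join(application(environ, fake_start_response))
print(captured['status'], body)
```

To serve the same callable over HTTP during development, the standard library's wsgiref.simple_server.make_server can wrap it in a basic server.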

Server Interface

A WSGI server might interface with this application like this:

def write(chunk):
    '''Write data back to client'''
    ...

def send_status(status):
    '''Send HTTP status code'''
    ...

def send_headers(headers):
    '''Send HTTP headers'''
    ...

def start_response(status, headers):
    '''WSGI start_response callable'''
    send_status(status)
    send_headers(headers)
    return write

# Make request to application
response = application(environ, start_response)
try:
    for chunk in response:
        write(chunk)
finally:
    if hasattr(response, 'close'):
        response.close()

As you may have noticed, the start_response callable returned a write callable that the application may use to send data back to the client, but that was not used by our application code example. This write interface is deprecated, and we can ignore it for now. It will be briefly discussed later in the article.

Another peculiarity of the server’s responsibilities is to call the optional close method on the response iterator, if it exists. As pointed out in Graham Dumpleton’s article here, it is an often-overlooked feature of WSGI. Calling this method, if it exists, allows the application to release any resources that it may still hold.
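
As a rough illustration of why close matters, here is a small sketch in which the response iterable owns a resource. ResourceBody and serve are hypothetical names made up for this example; the only part taken from WSGI itself is the try/finally pattern that calls close when it exists.

```python
class ResourceBody:
    """Hypothetical response iterable that holds on to a resource."""

    def __init__(self, chunks):
        self.chunks = chunks
        self.closed = False

    def __iter__(self):
        return iter(self.chunks)

    def close(self):
        # Release whatever the application is still holding on to.
        self.closed = True


def serve(response, write):
    """Server-side loop: always call close() if the iterable provides it."""
    try:
        for chunk in response:
            write(chunk)
    finally:
        if hasattr(response, 'close'):
            response.close()


sent = []
response = ResourceBody([b'Hello ', b'world!\n'])
serve(response, sent.append)
```

Because the call sits in a finally clause, close runs even if writing a chunk raises an exception halfway through the response.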

The Application Callable’s environ Argument

The environ parameter should be a dictionary object. It is used to pass request and server information to the application, much in the same way CGI does. In fact, all CGI environment variables are valid in WSGI and the server should pass all that apply to the application.

While there are many optional keys that can be passed, several are mandatory. Taking as an example the following GET request:

$ curl 'http://localhost:8000/auth?user=obiwan&token=123'

These are the keys that the server must provide, and the values they would take:

Key                 Value                      Comments
REQUEST_METHOD      "GET"
SCRIPT_NAME         ""                         server setup dependent
PATH_INFO           "/auth"
QUERY_STRING        "user=obiwan&token=123"
CONTENT_TYPE        ""
CONTENT_LENGTH      ""
SERVER_NAME         "127.0.0.1"                server setup dependent
SERVER_PORT         "8000"
SERVER_PROTOCOL     "HTTP/1.1"
HTTP_(...)                                     Client supplied HTTP headers
wsgi.version        (1, 0)                     tuple with WSGI version
wsgi.url_scheme     "http"
wsgi.input          File-like object
wsgi.errors         File-like object
wsgi.multithread    False                      True if server is multithreaded
wsgi.multiprocess   False                      True if server runs multiple processes
wsgi.run_once       False                      True if the server expects this script to run only once (e.g., in a CGI environment)

The exception to this rule is that any key whose value would be empty (like CONTENT_TYPE in the above table) can be omitted from the dictionary, in which case it is assumed to correspond to the empty string.
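
On the application side, reading these keys might look like the sketch below. It assumes Python 3 and a hand-built environ for the curl request above, whose query string is user=obiwan&token=123; describe_request is a hypothetical helper, and only urllib.parse.parse_qs is a real standard library call.

```python
from urllib.parse import parse_qs

# Hand-built environ holding only the keys this sketch needs.
environ = {
    'REQUEST_METHOD': 'GET',
    'SCRIPT_NAME': '',
    'PATH_INFO': '/auth',
    'QUERY_STRING': 'user=obiwan&token=123',
    'SERVER_NAME': '127.0.0.1',
    'SERVER_PORT': '8000',
    'wsgi.url_scheme': 'http',
}


def describe_request(environ):
    # Keys that may be omitted are read with .get(..., '') so that a missing
    # key is treated the same as an empty string.
    params = parse_qs(environ.get('QUERY_STRING', ''))
    content_type = environ.get('CONTENT_TYPE', '')
    return environ['REQUEST_METHOD'], environ.get('PATH_INFO', ''), params, content_type


method, path, params, content_type = describe_request(environ)
```

Note that parse_qs maps each parameter name to a list of values, since a name may appear more than once in a query string.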

wsgi.input and wsgi.errors

Most environ keys are straightforward, but two of them deserve a little more clarification: wsgi.input, which must contain a stream with the request body from the client, and wsgi.errors, where the application reports any errors it encounters. Errors sent from the application to wsgi.errors typically would be sent to the server error log.

These two keys must contain file-like objects; that is, objects that provide interfaces to be read or written to as streams, just like the object we get when we open a file or a socket in Python. This may seem tricky at first, but fortunately, Python gives us good tools to handle this.

First, what kind of streams are we talking about? As per the WSGI definition, wsgi.input must handle bytes objects in Python 3 and str objects in Python 2, while wsgi.errors is a text-mode stream that accepts native str objects. In either Python version, if we’d like to use an in-memory buffer to pass the request body through the WSGI interface, we can use the class io.BytesIO.

As an example, if we are writing a WSGI server, we could provide the request body to the application like this:

  • For Python 2.7
import io
...
request_data = 'some request body'
environ['wsgi.input'] = io.BytesIO(request_data)

  • For Python 3.5
import io
...
request_data = 'some request body'.encode('utf-8') # bytes object
environ['wsgi.input'] = io.BytesIO(request_data)
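
Putting the server side and the application side together, a round trip through wsgi.input could look like the following Python 3 sketch. The echo_application function and the bare-bones environ are assumptions made for illustration; a real server would supply the full set of mandatory keys.

```python
import io


def echo_application(environ, start_response):
    # CONTENT_LENGTH may be missing or empty, which means "no body".
    length = int(environ.get('CONTENT_LENGTH') or 0)
    request_body = environ['wsgi.input'].read(length)  # bytes in Python 3
    text = request_body.decode('utf-8')
    start_response('200 OK', [('Content-type', 'text/plain')])
    return [text.upper().encode('utf-8')]


# Server side: wrap the raw request bytes in an in-memory stream.
request_data = 'some request body'.encode('utf-8')
environ = {
    'REQUEST_METHOD': 'POST',
    'CONTENT_LENGTH': str(len(request_data)),
    'wsgi.input': io.BytesIO(request_data),
}

# A do-nothing start_response is enough for this demonstration.
response = echo_application(environ, lambda status, headers: None)
result = b''.join(response)
```

Reading exactly CONTENT_LENGTH bytes, rather than calling read() with no argument, is the more defensive choice when the stream is backed by a socket.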

On the application side, if we wanted to turn a stream input we’ve received into a string, we’d want to write something like this:

  • For Python 2.7
readstr = environ['wsgi.input'].read() # returns str object

  • For Python 3.5
readbytes = environ['wsgi

Control Your Laptop with an Android Phone using Python, Twisted, and Django

Introduction

It’s always fun to put your Android or Python programming skills on display. A while back, I figured it’d be cool to try and control my laptop via my Android mobile device. Think about it: remote laptop access including being able to play and pause music, start and stop programming jobs or downloads, etc., all by sending messages from your phone. Neat, huh?

Before you keep on reading, please bear in mind that this is a pet project, still in its early stages, but the basic platform is there. By gluing together some mainstream tools, I was able to set up my Android phone to control my laptop via a Python interpreter.

By the way: the project is open source. You can check out the client code here, and the server code here.

The Remote Laptop Access Tool Belt: Python, Twisted, Django, and Amarok

This project involves the following technologies, some of which you may be familiar with, some of which are quite specific to the task at-hand:

  • Python 2.7+
  • Twisted: an excellent event-driven framework especially crafted for network hackers.
  • Django: I used v1.4, so you’ll have to adjust the location of some files if you want to run a lower version.
  • Amarok: a D-Bus (more on this below) manageable media player. This could be subbed out for other such media players (Clementine, VLC, or anything that supports MPRIS) if you know their messaging structures. I chose Amarok because it comes with my KDE distribution by default. Plus, it’s fast and easily configurable.
  • An Android phone with Python for Android installed (more on this below). The process is pretty straightforward—even for Py3k!
  • Remote Amarok and Remote Amarok Web.

At a High Level

At a high level, we consider our Android phone to be the client and our laptop, the server. I’ll go through this remote access architecture in-depth below, but the basic flow of the project is as follows:

  1. The user types some command into the Python interpreter.
  2. The command is sent to the Django instance.
  3. Django then passes the command along to Twisted.
  4. Twisted then parses the command and sends a new command via D-Bus to Amarok.
  5. Amarok interacts with the actual laptop, controlling the playing/pausing of music.

Using this toolbelt, learn how to control a laptop with Python, Twisted, and Django.

Now, let’s dig in.

Python on Android

So one good day, I started looking at Python interpreters that I could run on my Android phone (Droid 2, back then). Soon after, I discovered the excellent SL4A package that brought Python For Android to life. It’s a really nifty package: you click a couple buttons and suddenly you have an almost fully functional Python environment on your mobile or tablet device that can both run your good ol’ Python code and access the Android API (I say almost because some stuff probably is missing and the Android API isn’t 100% accessible, but for most use-cases, it’s sufficient).

If you prefer, you can also build your own Python distribution to run on your Android device, which has the advantage that you can then run any version of the interpreter you desire. The process involves cross-compiling Python to be run on ARM (the architecture used on Android devices and other tablets). It’s not easy, but it’s certainly doable. If you’re up for the challenge, check here or here.

Once you have your interpreter set up, you can do basically whatever you like by combining Python with the Android API, including controlling your laptop remotely. For example, you can:

  • Send and read SMS.
  • Interact with third-party APIs around the Internet via urllib and other libraries.
  • Display native look and feel prompts, spinning dialogs, and the like.
  • Change your ringtone.
  • Play music or videos.
  • Interact with Bluetooth—this one in particular paves the way for a lot of opportunities. For example, I once played around with using my phone as a locker-unlocker application for my laptop (e.g., unlock my laptop via Bluetooth when my phone was nearby).

How Using Your Phone to Control Your Laptop Works

The Architecture

Our project composition is as follows:

  • A client-side application built on Twisted if you want to test the server code (below) without having to run the Django application at all.

  • A server-side Django application, which reads in commands from the Android device and passes them along to Twisted. As it stands, Amarok is the only laptop application that the server can interact with (i.e., to control music), but that’s a sufficient proof-of-concept, as the platform is easily extensible.

  • A server-side Twisted ‘instance’ which communicates with the laptop’s media player via D-Bus, sending along commands as they come in from Django (currently, I support ‘next’, ‘previous’, ‘play’, ‘pause’, ‘stop’, and ‘mute’). Why not just pass the commands directly from Django to Amarok? Twisted’s event-driven, non-blocking attributes take away all the hard work of threading (more below). If you’re interested in marrying the two, see here.

Twisted is excellent, event-driven, and versatile. It operates using a callback system, deferred objects, and some other techniques. I’d definitely recommend that you try it out: the amount of work that you avoid by using Twisted is seriously impressive. For example, it provides ready-made implementations of many protocols, including IRC, HTTP, and SSH, so you don’t have to deal with non-blocking mechanisms (threads, select, etc.) yourself.

  • The client-side Android code, uploaded to your device with a customized URL to reach your Django application. It’s worth mentioning that this particular piece of code runs on Python 2.7+, including Py3k.

What’s D-Bus?

I’ve mentioned D-Bus several times, so it’s probably worth discussing it in more detail. Broadly speaking, D-Bus is a message bus system that lets applications running on the same machine (here, the laptop) communicate easily through specially crafted messages.

It’s mainly composed of two buses: the system bus, for system-wide stuff; and the session bus, for userland stuff. Typical messages to the system bus would be “Hey, I’ve added a new printer, notify my D-Bus enabled applications that a new printer is online”, while typical Inter-Process Communication (IPC) among applications would go to the session bus.

We use the session bus to communicate with Amarok. It’s very likely that most modern applications (under Linux environments, at least) will support this type of messaging and generally all the commands/functions that they can process are well documented. As any application with D-Bus support can be controlled under this architecture, the possibilities are nearly endless.

More info can be found here.

Behind the Scenes:

Having set up all the infrastructure, you can fire off the Android application and it will enter into an infinite loop to read incoming messages, process them with some sanity checks, and, if valid, send them to a predefined URL (i.e., the URL of your Django app), which will in turn process the input and act accordingly. The Android client then marks the message as read and the loop continues until a message with the exact contents “exitclient” (clever, huh?) is processed, in which case the client will exit.
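
The shape of that client loop can be sketched in a few lines. This is not the project's actual client code: run_client, the list of messages, and the send callable are all hypothetical stand-ins (the real client reads messages on the phone and sends them to the Django URL), and the sanity check shown is just one plausible example.

```python
def run_client(messages, send, exit_word='exitclient'):
    """Process messages until one equals exit_word; return how many were sent."""
    sent = 0
    for message in messages:
        if message == exit_word:   # exact match ends the loop
            break
        text = message.strip()
        if not text:               # sanity check: skip empty messages
            continue
        send(text)                 # stand-in for the request to the Django URL
        sent += 1
    return sent


outbox = []
count = run_client(['play', '  ', 'next', 'exitclient', 'stop'], outbox.append)
```

Passing the send function in as a parameter keeps the loop testable without a phone or a network connection.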

On the server, the Django application picks up a command to be processed and checks if it starts with a valid instruction. If so, it connects to the Twisted server (using telnetlib to connect via telnet) and sends the command along. Twisted then parses the input, transforms it into something suitable for Amarok, and lets Amarok do its magic! Finally, your laptop responds by playing songs, pausing, skipping, etc.
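
The "starts with a valid instruction" check might be sketched as follows. The command names are the ones listed earlier in this article; extract_command and VALID_COMMANDS are hypothetical names, and the project's real validation may well differ.

```python
VALID_COMMANDS = ('next', 'previous', 'play', 'pause', 'stop', 'mute')


def extract_command(message):
    """Return the command a message starts with, or None if it is not valid."""
    words = message.strip().lower().split()
    if words and words[0] in VALID_COMMANDS:
        return words[0]
    return None


accepted = extract_command('Play some music')
rejected = extract_command('rm -rf /')
```

Checking against a fixed whitelist, rather than forwarding arbitrary text, matters all the more here because the endpoint has no other security layer.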

Regarding the “predefined URL”: if you want to be controlling your computer from afar, this will have to be a public URL (reachable over the Internet). Be aware that, currently, the code doesn’t implement any layer of security (SSL, etc.)—such improvements are exercises for the reader, at the moment.

What Else Can I Do With This?

Everything looks really simple so far, huh? You may be asking yourself: “Can this be extended to support nifty feature [X]?” The answer is: Yes (probably)! Given that you know how to interact with your computer using your phone properly, you can supplement the server-side code to do whatever you like. Before you know it, you’ll be shooting off lengthy processes on your computer remotely. Or, if you can cope with the electronics, you could build an interface between your computer and your favorite appliance, controlling that via SMS instructions (“Make me coffee!” comes to mind). 

 

This article originally appeared on Toptal

 

The Vital Guide to Python Interviewing

The Challenge

As a rough order of magnitude, Giles Thomas (co-founder of PythonAnywhere) estimates that there are between 1.8 and 4.3 million Python developers in the world.

So how hard can it be to find a Python developer? Well, not very hard at all if the goal is just to find someone who can legitimately list Python on their resume. But if the goal is to find a Python guru who has truly mastered the nuances and power of the language, then the challenge is most certainly a formidable one.

First and foremost, a highly-effective recruiting process is needed, as described in our post In Search of the Elite Few – Finding and Hiring the Best Developers in the Industry. Such a process can then be augmented with targeted questions and techniques, such as those provided here, that are specifically geared toward ferreting out Python virtuosos from the plethora of some-level-of-Python-experience candidates.

Python Guru or Snake in the Grass?

So you’ve found what appears to be a strong Python developer. How do you determine if he or she is, in fact, in the elite top 1% of candidates that you’re looking to hire? While there’s no magic or foolproof technique, there are certainly questions you can pose that will help determine the depth and sophistication of a candidate’s knowledge of the language. A brief sampling of such questions is provided below.

It is important to bear in mind, though, that these sample questions are intended merely as a guide. Not every “A” candidate worth hiring will be able to properly answer them all, nor does answering them all guarantee an “A” candidate. At the end of the day, hiring remains as much of an art as it does a science.

Python in the Weeds…

While it’s true that the best developers don’t waste time committing to memory that which can easily be found in a language specification or API document, there are certain key features and capabilities of any programming language that any expert can, and should, be expected to be well-versed in. Here are some Python-specific examples:

Q: Why use function decorators? Give an example.

A decorator is essentially a callable Python object that is used to modify or extend a function or class definition. One of the beauties of decorators is that a single decorator definition can be applied to multiple functions (or classes). Much can thereby be accomplished with decorators that would otherwise require lots of boilerplate (or, even worse, redundant!) code. Flask, for example, uses decorators as the mechanism for adding new endpoints to a web application. Examples of some of the more common uses of decorators include adding synchronization, type enforcement, logging, or pre/post conditions to a class or function.
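
A candidate might illustrate the logging use case with something like the sketch below. The names logged, call_log, add and greet are invented for this example; functools.wraps is the standard library helper that preserves the decorated function's name and docstring.

```python
import functools

call_log = []


def logged(func):
    """Record every call to func in call_log before running it."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_log.append((func.__name__, args, kwargs))
        return func(*args, **kwargs)
    return wrapper


@logged
def add(a, b):
    return a + b


@logged
def greet(name, punctuation='!'):
    return 'Hello, ' + name + punctuation


total = add(2, 3)
greeting = greet('world', punctuation='?')
```

The same decorator is applied to two unrelated functions, which is exactly the reuse the answer above describes.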

Q: What are lambda expressions, list comprehensions and generator expressions? What are the advantages and appropriate uses of each?

Lambda expressions are a shorthand technique for creating single line, anonymous functions. Their simple, inline nature often – though not always – leads to more readable and concise code than the alternative of formal function declarations. On the other hand, their terse inline nature, by definition, very much limits what they are capable of doing and their applicability. Being anonymous and inline, a lambda can be reused in multiple locations in your code only by specifying it redundantly or by assigning it to a name, at which point a regular function definition is usually clearer.

List comprehensions provide a concise syntax for creating lists. List comprehensions are commonly used to make lists where each element is the result of some operation(s) applied to each member of another sequence or iterable. They can also be used to create a subsequence of those elements whose members satisfy a certain condition. In Python, list comprehensions provide an alternative to using the built-in map() and filter() functions.

As the applied usage of lambda expressions and list comprehensions can overlap, opinions vary widely as to when and where to use one vs. the other. One point to bear in mind, though, is that a list comprehension executes somewhat faster than a comparable solution using map and lambda (some quick tests yielded a performance difference of roughly 10%). This is because calling a lambda function creates a new stack frame while the expression in the list comprehension is evaluated without doing so.

Generator expressions are syntactically and functionally similar to list comprehensions but there are some fairly significant differences between the ways the two operate and, accordingly, when each should be used. In a nutshell, iterating over a generator expression or list comprehension will essentially do the same thing, but the list comprehension will create the entire list in memory first while the generator expression will create the items on the fly as needed. Generator expressions can therefore be used for very large (and even infinite) sequences and their lazy (i.e., on demand) generation of values results in improved performance and lower memory usage. It is worth noting, though, that the standard Python list methods can be used on the result of a list comprehension, but not directly on that of a generator expression.
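
The three constructs can be put side by side in a short sketch. The sample data and variable names are arbitrary; the point is only to show a lambda as a sort key, a list comprehension next to its map()/filter() equivalent, and a generator expression drawing lazily from an infinite sequence.

```python
import itertools

numbers = [3, 1, 4, 1, 5, 9, 2, 6]

# Lambda: a small anonymous function, here used as a sort key.
by_distance_from_five = sorted(numbers, key=lambda n: abs(n - 5))

# List comprehension: builds the whole list in memory at once.
even_squares = [n * n for n in numbers if n % 2 == 0]

# Equivalent map()/filter() form, for comparison.
even_squares_mapped = list(map(lambda n: n * n,
                               filter(lambda n: n % 2 == 0, numbers)))

# Generator expression over an infinite sequence: values are produced lazily,
# so only the five items actually requested are ever computed.
squares = (n * n for n in itertools.count(1))
first_five = list(itertools.islice(squares, 5))
```

Replacing the parentheses around the last expression with square brackets would never finish, since a list comprehension tries to build the entire (infinite) list first.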

Q: Consider the two approaches below for initializing an array and the arrays that will result. How will the resulting arrays differ and why should you use one initialization approach vs. the other?

>>> # INITIALIZING AN ARRAY -- METHOD 1
...
>>> x = [[1,2,3,4]] * 3
>>> x
[[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
>>>
>>>
>>> # INITIALIZING AN ARRAY -- METHOD 2
...
>>> y = [[1,2,3,4] for _ in range(3)]
>>> y
[[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
>>>
>>> # WHICH METHOD SHOULD YOU USE AND WHY?

While both methods appear at first blush to produce the same result, there is an extremely significant difference between the two. Method 2 produces, as you would expect, an array of 3 elements, each of which is itself an independent 4-element array. In method 1, however, the members of the array all point to the same object. This can lead to what is most likely unanticipated and undesired behavior as shown below.

>>> # MODIFYING THE x ARRAY FROM THE PRIOR CODE SNIPPET:
>>> x[0][3] = 99
>>> x
[[1, 2, 3, 99], [1, 2, 3, 99], [1, 2, 3, 99]]
>>> # UH-OH, DON’T THINK YOU WANTED THAT TO HAPPEN!
...
>>>
>>> # MODIFYING THE y ARRAY FROM THE PRIOR CODE SNIPPET:
>>> y[0][3] = 99
>>> y
[[1, 2, 3, 99], [1, 2, 3, 4], [1, 2, 3, 4]]
>>> # THAT’S MORE LIKE WHAT YOU EXPECTED!
...

Q: What will be printed out by the second append() statement below?

>>> def append(list=[]):
...     # append the length of a list to the list
...     list.append(len(list))
...     return list
...
>>> append(['a','b'])
['a', 'b', 2]
>>>
>>> append()  # calling with no arg uses default list value of []
[0]
>>>
>>> append()  # but what happens when we AGAIN call append with no arg?

When the default value for a function argument is an expression, the expression is evaluated only once, when the function is defined, not every time the function is called. Thus, once the list argument has been initialized to an empty array, subsequent calls to append without any argument specified will continue to use the same array to which list was originally initialized. This will therefore yield the following, presumably unexpected, behavior:

>>> append()  # first call with no arg uses default list value of []
[0]
>>> append()  # but then look what happens...
[0, 1]
>>> append()  # successive calls keep extending the same default list!
[0, 1, 2]
>>> append()  # and so on, and so on, and so on...
[0, 1, 2, 3]

Q: How might one modify the implementation of the ‘append’ method in the previous question to avoid the undesirable behavior described there?

The following alternative implementation of the append method would be one of a number of ways to avoid the undesirable behavior described in the answer to the previous question:

>>> def append(list=None):
...     if list is None:
...         list = []
...     # append the length of a list to the list
...     list.append(len(list))
...     return list
...
>>> append()
[0]
>>> append()
[0]

Q: How can you swap the values of two variables with a single line of Python code?

Consider this simple example:

>>> x = 'X'
>>> y = 'Y'

In many other languages, swapping the values of x and y requires you to do the following:

>>> tmp = x
>>> x = y
>>> y = tmp
>>> x, y
('Y', 'X')

But Python makes it possible to do the swap with a single line of code (thanks to implicit tuple packing and unpacking) as follows:

>>> x,y = y,x
>>> x,y
('Y', 'X')

Q: What will be printed out by the last statement below?

>>> flist = []
>>> for i in range(3):
...     flist.append(lambda: i)
...
>>> [f() for f in flist]   # what will this print out?

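In any closure in Python, variables are bound by name, not by value: each lambda looks up i when it is called, by which point the loop has finished and i is 2. Thus, the above line of code will print out the following: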

[2, 2, 2]

Presumably not what the author of the above code intended!

A workaround is to either create a separate function or bind the current value as a default argument; e.g.:

>>> flist = []
>>> for i in range(3):
...     flist.append(lambda i = i : i)
...
>>> [f() for f in flist]
[0, 1, 2]
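
The other workaround, creating a separate function, could look like the sketch below. make_getter and identity are hypothetical helper names; functools.partial is shown as a further standard library option that binds the argument at creation time.

```python
import functools


def make_getter(value):
    # value is a fresh local variable on every call, so each returned
    # closure refers to its own binding rather than the shared loop variable.
    return lambda: value


flist = [make_getter(i) for i in range(3)]
results = [f() for f in flist]


def identity(x):
    return x


# functools.partial stores the current value of i inside the partial object.
plist = [functools.partial(identity, i) for i in range(3)]
partial_results = [p() for p in plist]
```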

Q: What are the key differences between Python 2 and 3?

Although Python 2 is formally considered legacy at this point, its use is still widespread enough that it is important for a developer to recognize the differences between Python 2 and 3.

Here are some of the key differences that a developer should be aware of:

  • Text and Data instead of Unicode and 8-bit strings. Python 3.0 uses the concepts of text and (binary) data instead of Unicode strings and 8-bit strings. The biggest ramification of this is that any attempt to mix text and data in Python 3.0 raises a TypeError (to combine the two safely, you must decode bytes or encode Unicode, but you need to know the proper encoding, e.g. UTF-8)
    • This addresses a longstanding pitfall for naïve Python programmers. In Python 2, mixing Unicode and 8-bit data would work if the string happened to contain only 7-bit (ASCII) bytes, but you would get UnicodeDecodeError if it contained non-ASCII values. Moreover, the exception would happen at the combination point, not at the point at which the non-ASCII characters were put into the str object. This behavior was a common source of confusion and consternation for neophyte Python programmers.
  • print function. The print statement has been replaced with a print() function
  • xrange – buh-bye. xrange() no longer exists (range() now behaves like xrange() used to behave, except it works with values of arbitrary size)
  • API changes:
    • zip(), map() and filter() all now return iterators instead of lists
    • dict.keys(), dict.items() and dict.values() now return “views” instead of lists
    • dict.iterkeys(), dict.iteritems() and dict.itervalues() are no longer supported
  • Comparison operators. The ordering comparison operators (<, <=, >=, >) now raise a TypeError exception when the operands don’t have a meaningful natural ordering. Some examples of the ramifications of this include:
    • Expressions like 1 < '', 0 > None or len <= len are no longer valid
    • None < None now raises a TypeError instead of returning False
    • Sorting a heterogeneous list no longer makes sense – all the elements must be comparable to each other

More details on the differences between Python 2 and 3 are available here.
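
Several of these differences can be observed directly. The following sketch is meant to be run under Python 3 only; the variable names are arbitrary and each step records what happened instead of printing it.

```python
# Mixing text and binary data raises TypeError in Python 3.
try:
    'abc' + b'def'
    mixing_error = None
except TypeError as exc:
    mixing_error = exc

# Decoding the bytes first makes the combination legal.
combined = 'abc' + b'def'.decode('utf-8')

# zip() returns an iterator, not a list.
pairs = zip([1, 2], 'ab')
pairs_is_list = isinstance(pairs, list)
pairs_as_list = list(pairs)

# dict.keys() returns a view that reflects later changes to the dict.
d = {'a': 1}
keys_view = d.keys()
d['b'] = 2
keys_now = sorted(keys_view)

# Ordering values without a natural ordering raises TypeError.
try:
    sorted([1, 'two', None])
    ordering_error = None
except TypeError as exc:
    ordering_error = exc
```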

Q: Is Python interpreted or compiled?

As noted in Why Are There So Many Pythons?, this is, frankly, a bit of a trick question in that it is malformed. Python itself is nothing more than an interface definition (as is true with any language specification) of which there are multiple implementations. Accordingly, the question of whether “Python” is interpreted or compiled does not apply to the Python language itself; rather, it applies to each specific implementation of the Python specification.

Further complicating the answer to this question is the fact that, in the case of CPython (the most common Python implementation), the answer really is “sort of both”. Specifically, with CPython, code is first compiled and then interpreted. More precisely, it is not precompiled to native machine code, but rather to bytecode. While machine code is certainly faster, bytecode is more portable and secure. The bytecode is then interpreted in the case of CPython (or both interpreted and compiled to optimized machine code at runtime in the case of PyPy).
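
The two stages can be made visible from within CPython itself using the built-in compile function and the dis module. This is only a small sketch; the exact instruction names that dis reports vary between CPython versions, so the example does not rely on any particular one.

```python
import dis

source = 'result = 6 * 7'

# Step 1: compile source text to a code object holding bytecode.
code = compile(source, '<example>', 'exec')
bytecode = code.co_code  # the raw bytecode, as a bytes object

# Step 2: the interpreter executes that bytecode.
namespace = {}
exec(code, namespace)

# dis gives a human-readable view of the bytecode instructions.
instructions = [ins.opname for ins in dis.get_instructions(code)]
```

Calling dis.dis(code) prints the same listing in tabular form, which is handy when exploring interactively.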

Q: What are some alternative implementations to CPython? When and why might you use them?

One of the more prominent alternative implementations is Jython, a Python implementation written in Java that utilizes the Java Virtual Machine (JVM). While CPython produces bytecode to run on the CPython VM, Jython produces Java bytecode to run on the JVM.

Another is IronPython, written in C# and targeting the .NET stack. IronPython runs on Microsoft’s Common Language Runtime (CLR).

As also pointed out in Why Are There So Many Pythons?, it is entirely possible to survive without ever touching a non-CPython implementation of Python, but there are advantages to be had from switching, most of which are dependent on your technology stack.

Another noteworthy alternative implementation is PyPy whose key features include:

  • Speed. Thanks to its Just-in-Time (JIT) compiler, Python programs often run faster on PyPy.
  • Memory usage. Large, memory-hungry Python programs might end up taking less space with PyPy than they do in CPython.
  • Compatibility. PyPy is highly compatible with existing Python code. It supports cffi and can run popular Python libraries like Twisted and Django.
  • Sandboxing. PyPy provides the ability to run untrusted code in a fully secure way.
  • Stackless mode. PyPy comes by default with support for stackless mode, providing micro-threads for massive concurrency.

Q: What’s your approach to unit testing in Python?

The most fundamental answer to this question centers around Python’s unittest testing framework. Basically, if a candidate doesn’t mention unittest when answering this question, that should be a huge red flag.

unittest supports test automation, sharing of setup and shutdown code for tests, aggregation of tests into collections, and independence of the tests from the reporting framework. The unittest module provides classes that make it easy to support these qualities for a set of tests.

Assuming that the candidate does mention unittest (if they don’t, you may just want to end the interview right then and there!), you should also ask them to describe the key elements of the unittest framework; namely, test fixtures, test cases, test suites and test runners.

A more recent addition to the unittest framework is mock. mock allows you to replace parts of your system under test with mock objects and make assertions about how they are to be used. mock is now part of the Python standard library, available as unittest.mock in Python 3.3 onwards.

The value and power of mock are well explained in An Introduction to Mocking in Python. As noted therein, system calls are prime candidates for mocking: whether writing a script to eject a CD drive, a web server which removes antiquated cache files from /tmp, or a socket server which binds to a TCP port, these calls all feature undesired side-effects in the context of unit tests. Similarly, keeping your unit-tests efficient and performant means keeping as much “slow code” as possible out of the automated test runs, namely filesystem and network access.
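
A strong answer might include a small example along these lines. The function remove_cache_file and the file paths are hypothetical and exist only for this sketch; unittest.TestCase, mock.patch, assert_called_once_with and assert_not_called are real parts of the standard library (Python 3.3 onwards for unittest.mock).

```python
import os
import unittest
from unittest import mock


def remove_cache_file(path):
    """Delete path if it exists; return True when something was removed."""
    if os.path.isfile(path):
        os.remove(path)
        return True
    return False


class RemoveCacheFileTest(unittest.TestCase):
    # Decorators are applied bottom-up, so the isfile mock is the first argument.
    @mock.patch('os.remove')
    @mock.patch('os.path.isfile', return_value=True)
    def test_removes_existing_file(self, mock_isfile, mock_remove):
        self.assertTrue(remove_cache_file('/tmp/stale.cache'))
        mock_remove.assert_called_once_with('/tmp/stale.cache')

    @mock.patch('os.remove')
    @mock.patch('os.path.isfile', return_value=False)
    def test_skips_missing_file(self, mock_isfile, mock_remove):
        self.assertFalse(remove_cache_file('/tmp/missing.cache'))
        mock_remove.assert_not_called()


# Load and run the test case programmatically instead of via unittest.main().
suite = unittest.defaultTestLoader.loadTestsFromTestCase(RemoveCacheFileTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Neither test touches the real filesystem, which is the whole point: the side effect is replaced by a mock whose calls can be asserted on.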

[Note: This question is for Python developers who are also experienced in Java.]
Q: What are some key differences to bear in mind when coding in Python vs. Java?

Disclaimer #1. The differences between Java and Python are numerous and would likely be a topic worthy of its own (lengthy) post. Below is just a brief sampling of some key differences between the two languages.

Disclaimer #2. The intent here is not to launch into a religious battle over the merits of Python vs. Java (as much fun as that might be!). Rather, the question is really just geared at seeing how well the developer understands some practical differences between the two languages. The list below therefore deliberately avoids discussing the arguable advantages of Python over Java from a programming productivity perspective.

With the above two disclaimers in mind, here is a sampling of some key differences to bear in mind when coding in Python vs. Java:

  • Dynamic vs static typing. One of the biggest differences between the two languages is that Java is restricted to static typing whereas Python supports dynamic typing of variables.
  • Static vs. class methods. A static method in Java does not translate to a Python class method.
    • In Python, calling a class method involves an additional memory allocation that calling a static method or function does not.
    • In Java, dotted names (e.g., foo.bar.method) are looked up by the compiler, so at runtime it really doesn’t matter how many of them you have. In Python, however, the lookups occur at runtime, so “each dot counts”.
  • Method overloading. Whereas Java requires explicit specification of multiple same-named functions with different signatures, the same can be accomplished in Python with a single function that includes optional arguments with default values if not specified by the caller.
  • Single vs. double quotes. Whereas the use of single quotes vs. double quotes has significance in Java, they can be used interchangeably in Python (but no, it won’t allow beginning the same string with a double quote and trying to end it with a single quote, or vice versa!).
  • Getters and setters (not!). Getters and setters in Python are superfluous; rather, you should use the ‘property’ built-in (that’s what it’s for!). In Python, getters and setters are a waste of both CPU and programmer time.
  • Classes are optional. Whereas Java requires every function to be defined in the context of an enclosing class definition, Python has no such requirement.
  • Indentation matters… in Python. This bites many a newbie Python programmer.
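Two of the points in the list above, properties instead of getters and setters, and default arguments instead of overloading, can be sketched as follows. The Temperature class and the greet function are hypothetical names used only for illustration.

```python
class Temperature:
    """Exposes attributes through `property` rather than get_/set_ methods."""

    def __init__(self, celsius):
        self._celsius = celsius

    @property
    def celsius(self):
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        # Validation lives in the setter; callers still write `t.celsius = 20`.
        if value < -273.15:
            raise ValueError("temperature below absolute zero")
        self._celsius = value

    @property
    def fahrenheit(self):
        # Read-only computed attribute.
        return self._celsius * 9 / 5 + 32


def greet(name, greeting="Hello"):
    """One function with a default argument covers what Java would overload."""
    return "{}, {}!".format(greeting, name)
```

Callers read and assign t.celsius as if it were a plain attribute, so validation can be added later without changing any call sites.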

The Big Picture

An expert knowledge of Python extends well beyond the technical minutiae of the language. A Python expert will have an in-depth understanding and appreciation of Python’s benefits as well as its limitations. Accordingly, here are some sample questions that can help assess this dimension of a candidate’s expertise:

Q: What is Python particularly good for? When is using Python the “right choice” for a project?

Although likes and dislikes are highly personal, a developer who is “worth his or her salt” will highlight features of the Python language that are generally considered advantageous (which also helps answer the question of what Python is “particularly good for”). Some of the more common valid answers to this question include:

  • Ease of use and ease of refactoring, thanks to the flexibility of Python’s syntax, which makes it especially useful for rapid prototyping.
  • More compact code, thanks again to Python’s syntax, along with a wealth of functionally-rich Python libraries (distributed freely with most Python language implementations).

This article originally appeared on Toptal

Service Oriented Architecture with AWS Lambda: A Step-by-Step Tutorial

When building web applications, there are many choices to be made that can either help or hinder your application in the future once you commit to them. Choices such as language, framework, hosting, and database are crucial.

One such choice is whether to create a service-based application using Service Oriented Architecture (SOA) or a traditional, monolithic application. This is a common architectural decision affecting startups, scale-ups, and enterprise companies alike.

Service Oriented Architecture is used by a large number of well-known unicorns and top-tech companies such as Google, Facebook, Twitter, Instagram and Uber. Seemingly, this architecture pattern works for large companies, but can it work for you?

In this article we will introduce the topic of Service Oriented Architecture, and show how AWS Lambda in combination with Python can be leveraged to easily build scalable, cost-efficient services. To demonstrate these ideas, we will build a simple image uploading and resizing service using Python, AWS Lambda, Amazon S3 and a few other relevant tools and services.

What is Service Oriented Architecture?

Service Oriented Architecture (SOA) isn’t new; its roots go back several decades. In recent years its popularity as a pattern has grown because it offers many benefits for web-facing applications.

SOA is, in essence, the abstraction of one large application into many communicating smaller applications. This follows several best practices of software engineering such as de-coupling, separation of concerns and single-responsibility architecture.

Implementations of SOA vary in terms of granularity: from very few services that cover large areas of functionality to many dozens or hundreds of small applications in what is termed “microservice” architecture. Regardless of the level of granularity, what is generally agreed amongst practitioners of SOA is that it is by no means a free lunch. Like many good practices in software engineering, it is an investment that will require extra planning, development and testing.

What is AWS Lambda?

AWS Lambda is a service offered by the Amazon Web Services platform. AWS Lambda allows you to upload code that will be run on an on-demand container managed by Amazon. AWS Lambda handles the provisioning and management of the servers that run the code, so all that is needed from the user is a packaged set of code to run and a few configuration options to define the context in which the server runs. These managed applications are referred to as Lambda functions.

AWS Lambda has two main modes of operation:

Asynchronous / Event-Driven:

Lambda functions can be run in response to an event in asynchronous mode. Event sources such as S3 and SNS do not block while the function runs, and Lambda functions can take advantage of this in many ways, such as establishing a processing pipeline for some chain of events. There are many event sources, and depending on the source, events will either be pushed to a Lambda function by the event source or polled for by AWS Lambda.

Synchronous / Request->Response:

For applications that require a response to be returned synchronously, Lambda can be run in synchronous mode. Typically this is used in conjunction with a service called API Gateway to return HTTP responses from AWS Lambda to an end-user; however, Lambda functions can also be called synchronously via a direct call to AWS Lambda.

AWS Lambda functions are uploaded as a zip file containing handler code in addition to any dependencies required for the operation of the handler. Once uploaded, AWS Lambda will execute this code when needed and scale the number of servers from zero to thousands when required, without any extra intervention required by the consumer.
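As a rough illustration of what “handler code” means, a Python Lambda function is a module-level function that accepts an event and a context argument. The sketch below is a minimal example, not code from this project: the function name, the "name" field read from the event, and the response shape (modelled on what an API Gateway proxy integration expects) are assumptions made for illustration.

```python
import json


def handler(event, context):
    """Entry point invoked by AWS Lambda with the triggering event and a context object."""
    # `event` is a plain dict whose layout depends on the event source.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": "Hello, " + name}),
    }
```

Because the handler is an ordinary function, it can be exercised locally by calling it with a hand-built dict and None for the context before packaging it into the zip file.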

Lambda Functions as an Evolution of SOA

Basic SOA is a way to structure your code-base into small applications in order to benefit an application in the ways described earlier in this article. Arising from this, the method of communication between these applications comes into focus. Event-driven SOA (aka SOA 2.0) allows for not only the traditional direct service-to-service communication of SOA 1.0, but also for events to be propagated throughout the architecture in order to communicate change.

Event-driven architecture is a pattern that naturally promotes loose coupling and composability. By creating and reacting to events, services can be added ad-hoc to add new functionality to an existing event, and several events can be composed to provide richer functionality.

AWS Lambda can be used as a platform to easily build SOA 2.0 applications. There are many ways to trigger a Lambda function; from the traditional message-queue approach with Amazon SNS, to events created by a file being uploaded to Amazon S3, or an email being sent with Amazon SES.
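For the S3 trigger mentioned above, the event handed to the function is a dict containing a list of records, each naming the bucket and object key involved. The following sketch pulls those out; uploaded_objects is a hypothetical helper name, and the sample event in the usage note is hand-written and trimmed down to the fields the helper reads.

```python
from urllib.parse import unquote_plus


def uploaded_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 notifications, so decode them.
        pairs.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return pairs
```

Calling uploaded_objects({"Records": [{"s3": {"bucket": {"name": "test-upload"}, "object": {"key": "my+photo.jpeg"}}}]}) yields one pair whose key has been decoded to "my photo.jpeg".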

Implementing a Simple Image Uploading Service

We will be building a simple application to upload and retrieve images utilizing the AWS stack. This example project will contain two lambda functions: one running in request->response mode that will be used to serve our simple web frontend, and another that will detect uploaded images and resize them.

The resizing Lambda function will run asynchronously in response to a file-upload event triggered on the S3 bucket that will house the uploaded images. It will take the image provided and resize it to fit within 400x400 pixels.
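The arithmetic behind “fit within 400x400” can be sketched independently of any imaging library. The helper below assumes that the aspect ratio is preserved and that images already smaller than the box are left at their original size; both are assumptions for illustration, and fit_within is a hypothetical name.

```python
def fit_within(width, height, max_width=400, max_height=400):
    """Return (new_width, new_height) scaled down to fit the box, keeping aspect ratio."""
    # Use the tighter of the two ratios; cap at 1.0 so small images are not enlarged.
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, a 1600x800 upload is limited by its width, so the scale factor is 0.25 and the result is 400x200.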

The other lambda function will serve the HTML page, providing both the functionality for a user to view the images resized by our other Lambda function as well as an interface for uploading an image.

Initial AWS Configuration

Before we can begin, we will need to configure some necessary AWS services such as IAM and S3. These will be configured using the web-based AWS console. However, most of the configuration can also be achieved by using the AWS command-line utility, which we will use later.

Creating S3 Buckets

S3 (or Simple Storage Service) is an Amazon object-store service that offers reliable and cost-efficient storage of any data. We will be using S3 to store the images that will be uploaded, as well as the resized versions of the images we have processed.

The S3 service can be found under the “Services” drop-down in the AWS console under the “Storage & Content Delivery” sub-section. When creating a bucket you will be prompted to enter both the bucket name as well as to select a region. Selecting a region close to your users will allow S3 to optimize for latency and cost, as well as some regulatory factors. For this example we will select the “US Standard” region. This same region will later be used for hosting the AWS Lambda functions.

It is worth noting that S3 bucket names are required to be unique, so if the name chosen is taken you will be required to choose a new, unique name.

For this example project, we will create two storage buckets named “test-upload” and “test-resized”. The “test-upload” bucket will be used for uploading images and storing the uploaded image before it is processed and resized. Once resized, the image will be saved into the “test-resized” bucket, and the raw uploaded image removed.

S3 Upload Permissions

By default, S3 Permissions are restrictive and will not allow external users or even non-administrative users to read, write, update, or delete any permissions or objects on the bucket. In order to change this, we will need to be logged in as a user with the rights to manage AWS bucket permissions.

Assuming we are on the AWS console, we can view the permissions for our upload bucket by selecting the bucket by name, clicking on the “Properties” button in the top-right of the screen, and opening the collapsed “Permissions” section.

In order to allow anonymous users to upload to this bucket, we will need to edit the bucket policy to grant the specific permission that allows uploads. This is accomplished through a JSON-based configuration policy. These kinds of JSON policies are used widely throughout AWS in conjunction with the IAM service. Upon clicking on the “Edit Bucket Policy” button, simply paste the following text and click “Save” to allow public image uploads:

{
    "Version": "2008-10-17",
    "Id": "Policy1346097257207",
    "Statement": [
        {
            "Sid": "Allow anonymous upload to /",
            "Effect": "Allow",
            "Principal": {
                "AWS": "*"
            },
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::test-upload/*"
        }
    ]
}

After doing this, we can verify the bucket policy is correct by attempting to upload an image to the bucket. The following cURL command will do the trick:

curl https://test-upload.s3.amazonaws.com -F 'key=test.jpeg' -F 'file=@test.jpeg'

If a 200-range response is returned, we will know that the configuration for the upload bucket has been successfully applied. Our S3 buckets should now be (mostly) configured. We will return later to this service in the console in order to connect our image upload events to the invocation of our resize function.

IAM Permissions for Lambda

Lambda functions all run within a permission context, in this case a “role” defined by the IAM service. This role defines any and all permissions that the Lambda function has during its invocation. For the purposes of this example project, we will create a generic role that will be shared by both of the Lambda functions. However, in a production scenario finer granularity in permission definitions is recommended to ensure that any security exploits are confined to the permission context that was defined.

The IAM service can be found within the “Security & Identity” sub-section of the “Services” drop-down. The IAM service is a very powerful tool for managing access across AWS services, and the interface provided may be a bit overwhelming at first if you are not familiar with similar tools.

Once on the IAM dashboard page, the “Roles” sub-section can be found on the left-hand side of the page. From here we can use the “Create New Role” button to bring up a multi-step wizard to define the permissions of the role. Let’s use “lambda_role” as the name of our generic permission. After continuing from the name definition page, you will be presented with the option to select a role type. As we only require S3 access, click on “AWS Service Roles” and within the selection box select “AWS Lambda”. You will be presented with a page of policies that can be attached to this role. Select the “AmazonS3FullAccess” policy and continue to the next step to confirm the role to be created.

It is important to note the name and the ARN (Amazon Resource Name) of the created role. This will be used when creating a new Lambda function to identify the role that is to be used for function invocation.

Note: AWS Lambda will automatically log all output from function invocations in AWS CloudWatch, a logging service. If this functionality is desired, which is recommended for a production environment, permission to write to a CloudWatch log stream must be added to the policies for this role.

The Code!

Overview

Now we are ready to start coding. We will assume at this point you have set up the “awscli” command. If you have not, you can follow the instructions at https://aws.amazon.com/cli/ to set up awscli on your computer.

Note: the code used in these examples is made shorter for ease of screen-viewing. For a more complete version visit the repository at https://github.com/gxx/aws-lambda-python/.

Read the full article on Toptal 

How to Create a Simple Python WebSocket Server Using Tornado

With the increase in popularity of real-time web applications, WebSockets have become a key technology in their implementation. The days when you had to constantly press the reload button to receive updates from the server are long gone. Web applications that want to provide real-time updates no longer have to poll the server for changes. Instead, servers push changes down the stream as they happen. Robust web frameworks have begun supporting WebSockets out of the box. Ruby on Rails 5, for example, took it even further and added support for Action Cable.

In the world of Python, many popular web frameworks exist. Frameworks such as Django provide nearly everything necessary to build web applications, and anything that it lacks can be made up with one of the thousands of plugins available for Django. However, due to the way Python or most of its web frameworks work, handling long-lived connections can quickly become a nightmare. The threaded model and global interpreter lock are often considered to be the Achilles’ heel of Python.

But all of that has started to change. With certain new features of Python 3 and frameworks that already exist for Python, such as Tornado, handling long-lived connections is a challenge no more. Tornado provides web server capabilities in Python that are specifically useful in handling long-lived connections.

In this article, we will take a look at how a simple WebSocket server can be built in Python using Tornado. The demo application will allow us to upload a tab-separated values (TSV) file, parse it and make its contents available at a unique URL.

Tornado and WebSockets

Tornado is an asynchronous network library that specializes in event-driven networking. Since it can naturally hold tens of thousands of open connections concurrently, a server can take advantage of this and handle a lot of WebSocket connections within a single node. WebSocket is a protocol that provides full-duplex communication channels over a single TCP connection. As it is an open socket, this technique makes a web connection stateful and facilitates real-time data transfer to and from the server. The server, keeping the states of the clients, makes it easy to implement real-time chat applications or web games based on WebSockets.

WebSockets are designed to be implemented in web browsers and servers, and are currently supported in all of the major web browsers. A connection is opened once and messages can travel back and forth multiple times before the connection is closed.

Installing Tornado is rather simple. It is listed in PyPI and can be installed using pip or easy_install:

pip install tornado

Tornado comes with its own implementation of WebSockets. For the purposes of this article, this is pretty much all we will need.

WebSockets in Action

One of the advantages of using WebSocket is its stateful property. This changes the way we typically think of client-server communication. One particular use case of this is where the server is required to perform long slow processes and gradually stream results back to the client.

In our example application, the user will be able to upload a file through WebSocket. For the entire lifetime of the connection, the server will retain the parsed file in-memory. Upon requests, the server can then send back parts of the file to the front-end. Furthermore, the file will be made available at a URL which can then be viewed by multiple users. If another file is uploaded at the same URL, everyone looking at it will be able to see the new file immediately.

For the front-end, we will use AngularJS. This framework and its libraries will allow us to easily handle file uploads and pagination. For everything related to WebSockets, however, we will use standard JavaScript functions.

This simple application will be broken down into three separate files:

  • parser.py: where our Tornado server with the request handlers is implemented
  • templates/index.html: front-end HTML template
  • static/parser.js: front-end JavaScript

Opening a WebSocket

From the front-end, a WebSocket connection can be established by instantiating a WebSocket object:

new WebSocket(WEBSOCKET_URL);

This is something we will have to do on page load. Once a WebSocket object is instantiated, handlers must be attached to handle three important events:

  • open: fired when a connection is established
  • message: fired when a message is received from the server
  • close: fired when a connection is closed

$scope.init = function() {
    $scope.ws = new WebSocket('ws://' + location.host + '/parser/ws');
    $scope.ws.binaryType = 'arraybuffer';

    $scope.ws.onopen = function() {
        console.log('Connected.');
    };

    $scope.ws.onmessage = function(evt) {
        $scope.$apply(function () {
            message = JSON.parse(evt.data);
            $scope.currentPage = parseInt(message['page_no']);
            $scope.totalRows = parseInt(message['total_number']);
            $scope.rows = message['data'];
        });
    };

    $scope.ws.onclose = function() {
        console.log('Connection is closed...');
    };
}

$scope.init();

Since these event handlers will not automatically trigger AngularJS’s $scope lifecycle, the contents of any handler function that updates the scope need to be wrapped in $apply. In case you are interested, AngularJS-specific packages exist that make it easier to integrate WebSocket in AngularJS applications.

It’s worth mentioning that dropped WebSocket connections are not automatically reestablished, and will require the application to attempt reconnects when the close event handler is triggered. This is a bit beyond the scope of this article.

Selecting a File to Upload

Since we are building a single-page application using AngularJS, attempting to submit forms with files the age-old way will not work. To make things easier, we will use Danial Farid’s ng-file-upload library. With it, all we need to do to allow a user to upload a file is add a button to our front-end template with specific AngularJS directives:

<button class="btn btn-default" type="file" ngf-select="uploadFile($file, $invalidFiles)"
accept=".tsv" ngf-max-size="10MB">Select File</button>

The library, among many things, allows us to set the acceptable file extension and size. Clicking on this button, just like any <input type="file"> element, will open the standard file picker.

Uploading the File

When you want to transfer binary data, you can choose between an array buffer and a blob. If it is just raw data like an image file, choose a blob and handle it properly on the server. An array buffer is a fixed-length binary buffer, and a text file like TSV can be transferred as a byte string. This code snippet shows how to upload a file in array buffer format.

$scope.uploadFile = function(file, errFiles) {
    ws = $scope.ws;
    $scope.f = file;
    $scope.errFile = errFiles && errFiles[0];
    if (file) {
        reader = new FileReader();
        rawData = new ArrayBuffer();

        reader.onload = function(evt) {
            rawData = evt.target.result;
            ws.send(rawData);
        }

        reader.readAsArrayBuffer(file);
    }
}

The ng-file-upload directive on the button calls our uploadFile function with the selected file. Here you can transform the file into an array buffer using a FileReader, and send it through the WebSocket.

Note that sending large files over WebSocket by reading them into array buffers may not be the optimal way to upload them, as it can quickly occupy too much memory, resulting in a poor experience.

Receive the File on the Server

Tornado determines the message type using the 4-bit opcode, and returns str for binary data and unicode for text.

if opcode == 0x1:
    # UTF-8 data
    self._message_bytes_in += len(data)
    try:
        decoded = data.decode("utf-8")
    except UnicodeDecodeError:
        self._abort()
        return
    self._run_callback(self.handler.on_message, decoded)
elif opcode == 0x2:
    # Binary data
    self._message_bytes_in += len(data)
    self._run_callback(self.handler.on_message, data)

In the Tornado web server, an array buffer is received as type str.

In this example the type of content we expect is TSV, so the file is parsed and transformed into a dictionary. Of course, in real applications, there are saner ways of dealing with arbitrary uploads.

def make_message(self, page_no=1):
    page_size = 100
    return {
        "page_no": page_no,
        "total_number": len(self.rows),
        "data": self.rows[page_size * (page_no - 1):page_size * page_no]
    }

def on_message(self, message):
    if isinstance(message, str):
        self.rows = [csv.reader([line], delimiter="\t").next()
                     for line in (x.strip() for x in message.splitlines()) if line]
        self.write_message(self.make_message())
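The parsing and paging logic can also be tried outside of Tornado. The sketch below is a standalone Python 3 rendering of the same idea, not the article's handler: parse_tsv and the module-level make_message are hypothetical free functions, and the text is assumed to be already decoded to str.

```python
import csv

PAGE_SIZE = 100


def parse_tsv(text):
    """Split tab-separated text into rows, skipping blank lines."""
    lines = (line.strip() for line in text.splitlines())
    return list(csv.reader((line for line in lines if line), delimiter="\t"))


def make_message(rows, page_no=1, page_size=PAGE_SIZE):
    """Build the payload for one page; page numbers start at 1."""
    start = page_size * (page_no - 1)
    return {
        "page_no": page_no,
        "total_number": len(rows),
        "data": rows[start:start + page_size],
    }
```

Requesting a page past the end simply returns an empty "data" list, because slicing a list beyond its length does not raise an error.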

Request a Page

Since our goal is to show uploaded TSV data in chunks of small pages, we need a means of requesting a particular page. To keep things simple, we will simply use the same WebSocket connection to send the page number to our server.

$scope.pageChanged = function() {
    ws = $scope.ws;
    ws.send($scope.currentPage);
}

The server will receive this message as unicode:

def on_message(self, message):
    if isinstance(message, unicode):
        page_no = int(message)
        self.write_message(self.make_message(page_no))

Attempting to respond with a dict from a Tornado WebSocket server will automatically encode it in JSON format. So it’s completely okay to just send a dict which contains 100 rows of content.

Sharing Access with Others

To be able to share access to the same upload with multiple users, we need to be able to uniquely identify the uploads. Whenever a user connects to the server over WebSocket, a random UUID will be generated and assigned to their connection.

def open(self, doc_uuid=None):
    if doc_uuid is None:
        self.uuid = str(uuid.uuid4())

uuid.uuid4() generates a random UUID and str() converts a UUID to a string of hex digits in standard form.

If another user with a UUID connects to the server, the corresponding instance of FileHandler is added to a dictionary with the UUID as the key and is removed when the connection is closed.

@classmethod
@tornado.gen.coroutine
def add_clients(cls, doc_uuid, client):
    with (yield lock.acquire()):
        if doc_uuid in cls.clients:
            clients_with_uuid = FileHandler.clients[doc_uuid]
            clients_with_uuid.append(client)
        else:
            FileHandler.clients[doc_uuid] = [client]

@classmethod
@tornado.gen.coroutine
def remove_clients(cls, doc_uuid, client):
    with (yield lock.acquire()):
        if doc_uuid in cls.clients:
            clients_with_uuid = FileHandler.clients[doc_uuid]
            clients_with_uuid.remove(client)
            if len(clients_with_uuid) == 0:
                del cls.clients[doc_uuid]

The clients dictionary may throw a KeyError when clients are added or removed simultaneously. As Tornado is an asynchronous networking library, it provides locking mechanisms for synchronization. A simple lock acquired inside a coroutine is enough to guard access to the clients dictionary.
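The same lock-guarded registry can be sketched with the standard library alone. Note the swap: this version uses asyncio.Lock and async/await rather than Tornado's lock and gen.coroutine, purely so that it runs without Tornado installed. ClientRegistry, demo and the client names are hypothetical.

```python
import asyncio


class ClientRegistry:
    """Maps a document UUID to the list of clients currently viewing it."""

    def __init__(self):
        self.clients = {}
        self._lock = asyncio.Lock()

    async def add(self, doc_uuid, client):
        async with self._lock:
            self.clients.setdefault(doc_uuid, []).append(client)

    async def remove(self, doc_uuid, client):
        async with self._lock:
            members = self.clients.get(doc_uuid)
            if members is None:
                return
            if client in members:
                members.remove(client)
            if not members:
                # Drop the entry once the last viewer has disconnected.
                del self.clients[doc_uuid]


async def demo():
    # The registry is created inside the running event loop.
    registry = ClientRegistry()
    await asyncio.gather(registry.add("doc-1", "alice"), registry.add("doc-1", "bob"))
    after_add = list(registry.clients["doc-1"])
    await asyncio.gather(registry.remove("doc-1", "alice"), registry.remove("doc-1", "bob"))
    return after_add, dict(registry.clients)
```

Running asyncio.run(demo()) registers two clients under one UUID and then removes both, leaving the registry empty.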

If any user uploads a file or moves between pages, all the users with the same UUID view the same page.

@classmethod
def send_messages(cls, doc_uuid):
    clients_with_uuid = cls.clients[doc_uuid]
    message = cls.make_message(doc_uuid)
    for client in clients_with_uuid:
        try:
            client.write_message(message)
        except:
            logging.error("Error sending message", exc_info=True)

Running Behind Nginx

Implementing WebSockets is very simple, but there are some tricky things to consider when using it in production environments. Tornado is a web server, so it can get users’ requests directly, but deploying it behind Nginx may be a better choice for many reasons. However, it takes ever so slightly more effort to be able to use WebSockets through Nginx:

http {
    upstream parser {
        server 127.0.0.1:8080;
    }

    server {
        location ^~ /parser/ws {
            proxy_pass http://parser;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
        }
    }
}

The two proxy_set_header directives make Nginx pass to the back-end servers the headers that are necessary for upgrading the connection to WebSocket.

What’s Next?

In this article, we implemented a simple Python web application that uses WebSockets to maintain persistent connections between the server and each of the clients. With modern asynchronous networking frameworks like Tornado, holding tens of thousands of open connections concurrently in Python is entirely feasible.

Although certain implementation aspects of this demo application could have been done differently, I hope it still helped demonstrate the usage of WebSockets in the Tornado framework. Source code of the demo application is available on GitHub.

Originally appeared in Toptal Engineering blog

Why Are There So Many Pythons? A Python Implementation Comparison

Python is amazing.

Surprisingly, that’s a fairly ambiguous statement. What do I mean by ‘Python’? Do I mean Python the abstract interface? Do I mean CPython, the common Python implementation (and not to be confused with the similarly named Cython)? Or do I mean something else entirely? Maybe I’m obliquely referring to Jython, or IronPython, or PyPy. Or maybe I’ve really gone off the deep end and I’m talking about RPython or RubyPython (which are very, very different things).

While the technologies mentioned above are commonly-named and commonly-referenced, some of them serve completely different purposes (or, at least, operate in completely different ways).

Throughout my time working with the Python interfaces, I’ve run across tons of these .*ython tools. But not until recently did I take the time to understand what they are, how they work, and why they’re necessary (in their own ways).

In this tutorial, I’ll start from scratch and move through the various Python implementations, concluding with a thorough introduction to PyPy, which I believe is the future of the language.

It all starts with an understanding of what ‘Python’ actually is.

If you have a good understanding for machine code, virtual machines, and the like, feel free to skip ahead.

“Is Python interpreted or compiled?”

This is a common point of confusion for Python beginners.

The first thing to realize when making a comparison is that ‘Python’ is an interface. There’s a specification of what Python should do and how it should behave (as with any interface). And there are multiple implementations (as with any interface).

The second thing to realize is that ‘interpreted’ and ‘compiled’ are properties of an implementation, not an interface.

So the question itself isn’t really well-formed.

That said, for the most common Python implementation (CPython: written in C, often referred to as simply ‘Python’, and surely what you’re using if you have no idea what I’m talking about), the answer is: interpreted, with some compilation. CPython compiles* Python source code to bytecode, and then interprets this bytecode, executing it as it goes.

Note: this isn’t ‘compilation’ in the traditional sense of the word. Typically, we’d say that ‘compilation’ is taking a high-level language and converting it to machine code. But it is a ‘compilation’ of sorts.

Let’s look at that answer more closely, as it will help us understand some of the concepts that come up later in the post.

Bytecode vs. Machine Code

It’s very important to understand the difference between bytecode and machine code (aka native code), perhaps best illustrated by example:

  • C compiles to machine code, which is then run directly on your processor. Each instruction instructs your CPU to move stuff around.
  • Java compiles to bytecode, which is then run on the Java Virtual Machine (JVM), an abstraction of a computer that executes programs. Each instruction is then handled by the JVM, which interacts with your computer.

In very brief terms: machine code is much faster, but bytecode is more portable and secure.

Machine code looks different depending on your machine, but bytecode looks the same on all machines. One might say that machine code is optimized to your setup.

Returning to the CPython implementation, the toolchain process is as follows:

  1. CPython compiles your Python source code into bytecode.
  2. That bytecode is then executed on the CPython Virtual Machine.

Beginners often assume Python is compiled because of .pyc files. There's some truth to that: the .pyc file is the compiled bytecode, which is then interpreted. So if you've run your Python code before and have the .pyc file handy, it will run faster the second time, as it doesn't have to re-compile the bytecode.
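The compile-then-interpret pipeline can be observed from within CPython itself using the built-in compile function and the standard dis module. This is a small sketch; the add function and the "<example>" filename are arbitrary choices, and the exact instruction names vary between CPython versions.

```python
import dis

source = "def add(a, b):\n    return a + b\n"

# Step 1: compile the source text into a code object holding bytecode.
code = compile(source, "<example>", "exec")

# Step 2: have the CPython virtual machine execute that bytecode.
namespace = {}
exec(code, namespace)

# Inspect the bytecode instructions generated for the function body.
instructions = [ins.opname for ins in dis.get_instructions(namespace["add"])]
```

Calling dis.dis(namespace["add"]) prints the same instructions as a human-readable listing.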

Alternative VMs: Jython, IronPython, and More

As I mentioned earlier, Python has several implementations. Again, as mentioned earlier, the most common is CPython, a Python implementation written in C and considered the ‘default’ implementation, but there are others that should be mentioned for the sake of this comparison guide.

But what about the alternative Python implementations? One of the more prominent is Jython, a Python implementation written in Java that utilizes the JVM. While CPython produces bytecode to run on the CPython VM, Jython produces Java bytecode to run on the JVM (this is the same stuff that’s produced when you compile a Java program).

Jython's use of Java bytecode is depicted in this Python implementation diagram.

“Why would you ever use an alternative implementation?”, you might ask. Well, for one, these different Python implementations play nicely with different technology stacks.

CPython makes it very easy to write C-extensions for your Python code because in the end it is executed by an interpreter written in C. Jython, on the other hand, makes it very easy to work with other Java programs: you can import any Java classes with no additional effort, summoning up and utilizing your Java classes from within your Jython programs. (Aside: if you haven’t thought about it closely, this is actually nuts. We’re at the point where you can mix and mash different languages and compile them all down to the same substance. As mentioned by Rostin, programs that mix Fortran and C code have been around for a while, so of course this isn’t necessarily new. But it’s still cool.)

As an example, this is valid Jython code:

[Java HotSpot(TM) 64-Bit Server VM (Apple Inc.)] on java1.6.0_51
>>> from java.util import HashSet
>>> s = HashSet(5)
>>> s.add("Foo")
>>> s.add("Bar")
>>> s
[Foo, Bar]

IronPython is another popular Python implementation, written entirely in C# and targeting the .NET stack. In particular, it runs on what you might call the .NET Virtual Machine, Microsoft’s Common Language Runtime (CLR), comparable to the JVM.

You might say that Jython : Java :: IronPython : C#. They run on the same respective VMs, you can import C# classes from your IronPython code and Java classes from your Jython code, etc.

It’s totally possible to survive without ever touching a non-CPython Python implementation. But there are advantages to be had from switching, most of which are dependent on your technology stack. Using a lot of JVM-based languages? Jython might be for you. All about the .NET stack? Maybe you should try IronPython (and maybe you already have).

This Python comparison chart demonstrates the differences between Python implementations.

By the way: while this wouldn’t be a reason to use a different implementation, note that these implementations do actually differ in behavior beyond how they treat your Python source code. However, these differences are typically minor, and dissolve or emerge over time as these implementations are under active development. For example, IronPython uses Unicode strings by default; CPython, however, defaults to ASCII for versions 2.x (failing with a UnicodeEncodeError for non-ASCII characters), but does support Unicode strings by default for 3.x.

Just-in-Time Compilation: PyPy, and the Future

So we have a Python implementation written in C, one in Java, and one in C#. The next logical step: a Python implementation written in… Python. (The educated reader will note that this is slightly misleading.)

Here’s where things might get confusing. First, let’s discuss just-in-time (JIT) compilation.

JIT: The Why and How

Recall that native machine code is much faster than bytecode. Well, what if we could compile some of our bytecode and then run it as native code? We’d have to pay some price to compile the bytecode (i.e., time), but if the end result was faster, that’d be great! This is the motivation of JIT compilation, a hybrid technique that mixes the benefits of interpreters and compilers. In basic terms, JIT wants to utilize compilation to speed up an interpreted system.

For example, a common approach taken by JITs:

  1. Identify bytecode that is executed frequently.
  2. Compile it down to native machine code.
  3. Cache the result.
  4. Whenever the same bytecode is set to be run, instead grab the pre-compiled machine code and reap the benefits (i.e., speed boosts).
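The four steps above can be mimicked in a few lines of plain Python. This is only a toy analogy, not how any real JIT works: here "compiling" means calling the built-in compile() once, which yields a bytecode code object rather than machine code. The class name ToyJIT and the constant HOT_THRESHOLD are hypothetical names invented for this sketch.

```python
HOT_THRESHOLD = 3  # hypothetical cutoff for calling an expression "hot"


class ToyJIT(object):
    """Toy model of the count / compile / cache / reuse loop (not a real JIT)."""

    def __init__(self):
        self.counts = {}    # expression source -> times seen on the slow path
        self.compiled = {}  # expression source -> cached code object

    def run(self, source, env):
        # Step 4: reuse the cached, pre-compiled form when we have one.
        if source in self.compiled:
            return eval(self.compiled[source], {}, env)
        # Step 1: count executions to spot frequently used code.
        self.counts[source] = self.counts.get(source, 0) + 1
        if self.counts[source] >= HOT_THRESHOLD:
            # Steps 2 and 3: compile once and cache the result.
            self.compiled[source] = compile(source, "<hot>", "eval")
        # Slow path: eval() re-parses and re-compiles the string each time.
        return eval(source, {}, env)


jit = ToyJIT()
results = [jit.run("x * x + 1", {"x": n}) for n in range(5)]
print(results)  # [1, 2, 5, 10, 17]
```

The first three calls take the slow path; from the fourth call on, the cached code object is reused. A real JIT makes the same trade, paying a one-time compilation cost for code that runs often enough to repay it.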

This is what the PyPy implementation is all about: bringing JIT to Python (see the Appendix for previous efforts). There are, of course, other goals: PyPy aims to be cross-platform, memory-light, and stackless-supportive. But JIT is really its selling point. As an average over a bunch of time tests, it’s said to improve performance by a factor of 6.27. For a breakdown, see this chart from the PyPy Speed Center:

Bringing JIT to Python interface using PyPy implementation pays off in performance improvements.

PyPy is Hard to Understand

PyPy has huge potential, and at this point it’s highly compatible with CPython (so it can run Flask, Django, etc.).

But there’s a lot of confusion around PyPy (see, for example, this nonsensical proposal to create a PyPyPy…). In my opinion, that’s primarily because PyPy is actually two things:

  1. A Python interpreter written in RPython (not Python; I lied before). RPython is a subset of Python with static typing. In Python, it’s “mostly impossible” to reason rigorously about types. Why is it so hard? Well, consider the fact that:

        x = random.choice([1, "foo"])

    would be valid Python code (credit to Ademan). What is the type of x? How can we reason about types of variables when the types aren’t even strictly enforced? With RPython, you sacrifice some flexibility, but instead make it much, much easier to reason about memory management and whatnot, which allows for optimizations.

  2. A compiler that compiles RPython code for various targets and adds in JIT. The default platform is C, i.e., an RPython-to-C compiler, but you could also target the JVM and others.

Solely for clarity in this Python comparison guide, I’ll refer to these as PyPy (1) and PyPy (2).

Why would you need these two things, and why under the same roof? Think of it this way: PyPy (1) is an interpreter written in RPython. So it takes in the user’s Python code and compiles it down to bytecode. But the interpreter itself (written in RPython) must be interpreted by another Python implementation in order to run, right?

Well, we could just use CPython to run the interpreter. But that wouldn’t be very fast.

Instead, the idea is that we use PyPy (2) (referred to as the RPython Toolchain) to compile PyPy’s interpreter down to code for another platform (e.g., C, JVM, or CLI) to run on our machine, adding in JIT as well. It’s magical: PyPy dynamically adds JIT to an interpreter, generating its own compiler! (Again, this is nuts: we’re compiling an interpreter, adding in another separate, standalone compiler.)

In the end, the result is a standalone executable that interprets Python source code and exploits JIT optimizations. Which is just what we wanted! It’s a mouthful, but maybe this diagram will help:

This diagram illustrates the beauty of the PyPy implementation, including an interpreter, compiler, and an executable with JIT.

To reiterate, the real beauty of PyPy is that we could write ourselves a bunch of different Python interpreters in RPython without worrying about JIT. PyPy would then implement JIT for us using the RPython Toolchain/PyPy (2).

In fact, if we get even more abstract, you could theoretically write an interpreter for any language, feed it to PyPy, and get a JIT for that language. This is because PyPy focuses on optimizing the actual interpreter, rather than the details of the language it’s interpreting.

As a brief digression, I’d like to mention that the JIT itself is absolutely fascinating. It uses a technique called tracing, which executes as follows:

  1. Run the interpreter and interpret everything (adding in no JIT).
  2. Do some light profiling of the interpreted code.
  3. Identify operations you’ve performed before.
  4. Compile these bits of code down to machine code.

For more, this paper is highly accessible and very interesting.

To wrap up: we use PyPy’s RPython-to-C (or other target platform) compiler to compile PyPy’s RPython-implemented interpreter.

Wrapping Up

After a lengthy comparison of Python implementations, I have to ask myself: Why is this so great? Why is this crazy idea worth pursuing? I think Alex Gaynor put it well on his blog: “[PyPy is the future] because [it] offers better speed, more flexibility, and is a better platform for Python’s growth.”

In short:

  • It’s fast because it compiles source code to native code (using JIT).
  • It’s flexible because it adds the JIT to your interpreter with very little additional work.
  • It’s flexible (again) because you can write your interpreters in RPython, which is easier to extend than, say, C (in fact, it’s so easy that there’s a tutorial for writing your own interpreters).

Appendix: Other Python Names You May Have Heard

  • Python 3000 (Py3k): an alternative naming for Python 3.0, a major, backwards-incompatible Python release that hit the stage in 2008. The Py3k team predicted that it would take about five years for this new version to be fully adopted. And while most (warning: anecdotal claim) Python developers continue to use Python 2.x, people are increasingly conscious of Py3k.

  • Cython: a superset of Python that includes bindings to call C functions.
    • Goal: allow you to write C extensions for your Python code.
    • Also lets you add static typing to your existing Python code, allowing it to be compiled and reach C-like performance.
    • This is similar to PyPy, but not the same. In this case, you’re enforcing typing in the user’s code before passing it to a compiler. With PyPy, you write plain old Python, and the compiler handles any optimizations.

  • Numba: a “just-in-time specializing compiler” that adds JIT to annotated Python code. In the most basic terms, you give it some hints, and it speeds up portions of your code. Numba comes as part of the Anaconda distribution, a set of packages for data analysis and management.

  • IPython: very different from anything else discussed. A computing environment for Python. Interactive with support for GUI toolkits and browser experience, etc.

  • Psyco: a Python extension module, and one of the early Python JIT efforts. However, it’s since been marked as “unmaintained and dead”. In fact, the lead developer of Psyco, Armin Rigo, now works on PyPy.

Python Language Bindings

  • RubyPython: a bridge between the Ruby and Python VMs. Allows you to embed Python code into your Ruby code. You define where the Python starts and stops, and RubyPython marshals the data between the VMs.

  • PyObjC: language bindings between Python and Objective-C, acting as a bridge between them. Practically, that means you can utilize Objective-C libraries (including everything you need to create OS X applications) from your Python code, and Python modules from your Objective-C code. In this case, it’s convenient that CPython is written in C, which is a subset of Objective-C.

  • PyQt: while PyObjC gives you bindings for the OS X GUI components, PyQt does the same for the Qt application framework, letting you create rich graphic interfaces, access SQL databases, etc. Another tool aimed at bringing Python’s simplicity to other frameworks.

JavaScript Frameworks

  • pyjs (Pyjamas): a framework for creating web and desktop applications in Python. Includes a Python-to-JavaScript compiler, a widget set, and some more tools.

  • Brython: a Python VM written in JavaScript to allow for Py3k code to be executed in the browser.

This article was written by Charles Marsh, Toptal's Head of Community.

Python Class Attributes

I had a programming interview recently, a phone-screen in which we used a collaborative text editor.

I was asked to implement a certain API, and chose to do so in Python. Abstracting away the problem statement, let’s say I needed a class whose instances stored some data and some other_data.

I took a deep breath and started typing. After a few lines, I had something like this:

class Service(object):
    data = []

    def __init__(self, other_data):
        self.other_data = other_data
    ...

My interviewer stopped me:

  • Interviewer: “That line: data = []. I don’t think that’s valid Python?”
  • Me: “I’m pretty sure it is. It’s just setting a default value for the instance attribute.”
  • Interviewer: “When does that code get executed?”
  • Me: “I’m not really sure. I’ll just fix it up to avoid confusion.”

For reference, and to give you an idea of what I was going for, here’s how I amended the code:

class Service(object):
    def __init__(self, other_data):
        self.data = []
        self.other_data = other_data
    ...

As it turns out, we were both wrong. The real answer lay in understanding the distinction between class and instance attributes.

Python class attributes vs. Python instance attributes

Note: if you have an expert handle on class attributes, you can skip ahead to use cases.

Class Attributes

My interviewer was wrong in that the above code is syntactically valid.

I too was wrong in that it isn’t setting a “default value” for the instance attribute. Instead, it’s defining data as a class attribute with value [].

In my experience, class attributes are a topic that many people know something about, but few understand completely.

What’s the difference?

A class attribute is an attribute of the class (circular, I know), rather than an attribute of an instance of a class.

Let’s use an example to illustrate the difference. Here, class_var is a class attribute, and i_var is an instance attribute:

class MyClass(object):
    class_var = 1

    def __init__(self, i_var):
        self.i_var = i_var

Note that all instances of the class have access to class_var, and that it can also be accessed as a property of the class itself:

foo = MyClass(2)
bar = MyClass(3)
foo.class_var, foo.i_var
## 1, 2
bar.class_var, bar.i_var
## 1, 3
MyClass.class_var ## <— This is key
## 1

For Java or C++ programmers, the class attribute is similar—but not identical—to the static member. We’ll see how they differ below.

Class vs. instance namespaces

To understand what’s happening here, let’s talk briefly about Python namespaces.

A namespace is a mapping from names to objects, with the property that there is zero relation between names in different namespaces. They’re usually implemented as Python dictionaries, although this is abstracted away.

Depending on the context, you may need to access a namespace using dot syntax (e.g., object.name_from_objects_namespace) or as a local variable (e.g., object_from_namespace). As a concrete example:

class MyClass(object):
    ## No need for dot syntax
    class_var = 1

    def __init__(self, i_var):
        self.i_var = i_var

## Need dot syntax as we've left scope of class namespace
MyClass.class_var
## 1

Python classes and instances of classes each have their own distinct namespaces represented by pre-defined attributes MyClass.__dict__ and instance_of_MyClass.__dict__, respectively.
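As a quick illustration, the two namespaces can be inspected directly. This is a minimal sketch that reuses the MyClass definition from above; the variable names holding the results are introduced here purely for demonstration.

```python
class MyClass(object):
    class_var = 1

    def __init__(self, i_var):
        self.i_var = i_var


foo = MyClass(2)

# The instance namespace holds only what was assigned on the instance.
instance_namespace = dict(foo.__dict__)
print(instance_namespace)  # {'i_var': 2}

# The class namespace holds class_var (plus __init__ and other entries).
in_class = 'class_var' in MyClass.__dict__
in_instance = 'class_var' in foo.__dict__
print(in_class, in_instance)  # True False
```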

When you try to access an attribute from an instance of a class, it first looks at its instance namespace. If it finds the attribute, it returns the associated value. If not, it then looks in the class namespace and returns the attribute (if it’s present, throwing an error otherwise). For example:

foo = MyClass(2)
## Finds i_var in foo's instance namespace
foo.i_var
## 2
## Doesn't find class_var in instance namespace…
## So looks in class namespace (MyClass.__dict__)
foo.class_var
## 1

The instance namespace takes precedence over the class namespace: if there is an attribute with the same name in both, the instance namespace will be checked first and its value returned. Here’s a simplified version of the code (source) for attribute lookup:

def instlookup(inst, name):
    ## simplified algorithm...
    if name in inst.__dict__:
        return inst.__dict__[name]
    else:
        return inst.__class__.__dict__[name]
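The simplified version stops at the instance's own class. A slightly fuller sketch, still an approximation, also walks the base classes in method resolution order. It deliberately ignores descriptors, __getattr__, and __slots__; the function name lookup_via_mro and the classes Base and Child are hypothetical and exist only for this example.

```python
def lookup_via_mro(inst, name):
    """Approximate attribute lookup: instance first, then each class in the MRO."""
    if name in vars(inst):
        return vars(inst)[name]
    for klass in type(inst).__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    raise AttributeError(name)


class Base(object):
    inherited = "from Base"


class Child(Base):
    class_var = 1

    def __init__(self):
        self.i_var = 2


c = Child()
print(lookup_via_mro(c, "i_var"))      # 2, found in the instance namespace
print(lookup_via_mro(c, "class_var"))  # 1, found in Child.__dict__
print(lookup_via_mro(c, "inherited"))  # from Base, found in Base.__dict__
```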

And, in visual form:

attribute lookup in visual form

Handling assignment

With this in mind, we can make sense of how class attributes handle assignment:

  • If a class attribute is set by accessing the class, it will override the value for all instances. For example:

    	foo = MyClass(2)
    	foo.class_var
    	## 1
    	MyClass.class_var = 2
    	foo.class_var
    	## 2
    	
    	

    At the namespace level… we’re setting MyClass.__dict__['class_var'] = 2. (Note: this isn’t the exact code, which would be setattr(MyClass, 'class_var', 2), as __dict__ returns a dictproxy (a mappingproxy in Python 3), an immutable wrapper that prevents direct assignment, but it helps for demonstration’s sake.) Then, when we access foo.class_var, class_var has a new value in the class namespace and thus 2 is returned.

  • If a class variable is set by accessing an instance, it will override the value only for that instance. This essentially overrides the class variable and turns it into an instance variable available, intuitively, only for that instance. For example:

    	foo = MyClass(2)
    	foo.class_var
    	## 1
    	foo.class_var = 2
    	foo.class_var
    	## 2
    	MyClass.class_var
    	## 1
    	
    	

    At the namespace level… we’re adding the class_var attribute to foo.__dict__, so when we look up foo.class_var, we return 2. Meanwhile, other instances of MyClass will not have class_var in their instance namespaces, so they continue to find class_var in MyClass.__dict__ and thus return 1.
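Both cases can be checked in one short session. The sketch below assumes Python 3, where the class __dict__ is a read-only mappingproxy, and uses a stripped-down MyClass; the variable names that capture intermediate values are invented for the example.

```python
class MyClass(object):
    class_var = 1


foo = MyClass()
bar = MyClass()

# Direct assignment into the class __dict__ is rejected by the proxy.
error = None
try:
    MyClass.__dict__['class_var'] = 2
except TypeError as exc:
    error = exc

# Setting through the class works and is visible from every instance.
MyClass.class_var = 2  # equivalent to setattr(MyClass, 'class_var', 2)
seen_by_bar = bar.class_var

# Setting through an instance only adds an entry to that instance's namespace.
foo.class_var = 99
shadowed = foo.class_var
in_instance = 'class_var' in foo.__dict__

# Deleting the instance entry uncovers the class attribute again.
del foo.class_var
restored = foo.class_var
```

The last two lines show that the class attribute was never changed by the instance assignment. It was only hidden behind an instance attribute of the same name.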

Mutability

Quiz question: What if your class attribute has a mutable type? You can manipulate (mutilate?) the class attribute by accessing it through a particular instance and, in turn, end up manipulating the referenced object that all instances are accessing (as pointed out by Timothy Wiseman).

This is best demonstrated by example. Let’s go back to the Service I defined earlier and see how my use of a class variable could have led to problems down the road.

class Service(object):
    data = []

    def __init__(self, other_data):
        self.other_data = other_data
    ...

My goal was to have the empty list ([]) as the default value for data, and for each instance of Service to have its own data that would be altered over time on an instance-by-instance basis. But in this case, we get the following behavior (recall that Service takes some argument other_data, which is arbitrary in this example):

s1 = Service(['a', 'b'])
s2 = Service(['c', 'd'])
s1.data.append(1)
s1.data
## [1]
s2.data
## [1]
s2.data.append(2)
s1.data
## [1, 2]
s2.data
## [1, 2]

This is no good—altering the class variable via one instance alters it for all the others!

At the namespace level… all instances of Service are accessing and modifying the same list in Service.__dict__ without making their own data attributes in their instance namespaces.
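A small identity check makes the sharing explicit. This is a minimal sketch built on the same Service class; the names same_object and has_own_data are introduced only for this illustration.

```python
class Service(object):
    data = []

    def __init__(self, other_data):
        self.other_data = other_data


s1 = Service(['a', 'b'])
s2 = Service(['c', 'd'])
s1.data.append(1)

# Every access resolves to the one list stored on the class.
same_object = s1.data is s2.data is Service.data
print(same_object)  # True

# Neither instance ever gained its own 'data' entry.
has_own_data = 'data' in s1.__dict__
print(has_own_data)  # False
```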

We could get around this using assignment; that is, instead of exploiting the list’s mutability, we could assign our Service objects to have their own lists, as follows:

s1 = Service(['a', 'b'])
s2 = Service(['c', 'd'])
s1.data = [1]
s2.data = [2]
s1.data
## [1]
s2.data
## [2]

In this case, we’re adding s1.__dict__['data'] = [1], so the original Service.__dict__['data'] remains unchanged.

Unfortunately, this requires that Service users have intimate knowledge of its variables, and is certainly prone to mistakes. In a sense, we’d be addressing the symptoms rather than the cause. We’d prefer something that was correct by construction.

My personal solution: if you’re just using a class variable to assign a default value to a would-be instance variable, don’t use mutable values. In this case, every instance of Service was going to override Service.data with its own instance attribute eventually, so using an empty list as the default led to a tiny bug that was easily overlooked. Instead of the above, we could’ve either:

  1. Stuck to instance attributes entirely, as demonstrated in the introduction.
  2. Avoided using the empty list (a mutable value) as our “default”:

        class Service(object):
            data = None

            def __init__(self, other_data):
                self.other_data = other_data
            ...
    	
    	

    Of course, we’d have to handle the None case appropriately, but that’s a small price to pay.
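One way to handle the None case, sketched under the assumption that data is only ever extended through a helper method, is to create the list lazily on first use. The add method is hypothetical and not part of the original Service class.

```python
class Service(object):
    data = None  # immutable placeholder shared by all instances

    def __init__(self, other_data):
        self.other_data = other_data

    def add(self, item):
        # Hypothetical helper: the first call creates a per-instance list,
        # which then shadows the class-level None.
        if self.data is None:
            self.data = []
        self.data.append(item)


s1 = Service(['a', 'b'])
s2 = Service(['c', 'd'])
s1.add(1)
s2.add(2)
print(s1.data)       # [1]
print(s2.data)       # [2]
print(Service.data)  # None
```

Because the assignment self.data = [] goes through the instance, each Service ends up with its own list and the class attribute is left untouched.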

So when would you use them?

Class attributes are tricky, but let’s look at a few cases when they would come in handy:

  1. Storing constants. As class attributes can be accessed as attributes of the class itself, it’s often nice to use them for storing class-wide, class-specific constants. For example:

        class Circle(object):
            pi = 3.14159

            def __init__(self, radius):
                self.radius = radius

            def area(self):
                return Circle.pi * self.radius * self.radius

        Circle.pi
        ## 3.14159
        c = Circle(10)
        c.pi
        ## 3.14159
        c.area()
        ## 314.159
    	
    	
  2. Defining default values. As a trivial example, we might create a bounded list (i.e., a list that can only hold a certain number of elements or fewer) and choose to have a default cap of 10 items:

        class MyClass(object):
            limit = 10

            def __init__(self):
                self.data = []

            def item(self, i):
                return self.data[i]

            def add(self, e):
                if len(self.data) >= self.limit:
                    raise Exception("Too many elements")
                self.data.append(e)

        MyClass.limit
        ## 10
    	
    	

    We could then create instances with their own specific limits, too, by assigning to the instance’s limit attribute.

    	foo = MyClass()
    	foo.limit = 50
    	## foo can now hold 50 elements—other instances can hold 10
    	
    	

    This only makes sense if you will want your typical instance of MyClass to hold just 10 elements or fewer—if you’re giving all of your instances different limits, then limit should be an instance variable. (Remember, though: take care when using mutable values as your defaults.)

  3. Tracking all data across all instances of a given class. This is sort of specific, but I could see a scenario in which you might want to access a piece of data related to every existing instance of a given class.

    To make the scenario more concrete, let’s say we have a Person class, and every person has a name. We want to keep track of all the names that have been used. One approach might be to iterate over the garbage collector’s list of objects, but it’s simpler to use class variables.

    Note that, in this case, names will only be accessed as a class variable, so the mutable default is acceptable.
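    The code for this scenario is not reproduced here, so the following is a minimal sketch of what it might look like. The attribute name all_names is an assumption; the key point is that the list is always reached through the class, never through an instance.

```python
class Person(object):
    all_names = []  # assumed attribute name; shared on purpose

    def __init__(self, name):
        self.name = name
        # Append through the class so no instance ever shadows the list.
        Person.all_names.append(name)


joe = Person('Joe')
bob = Person('Bob')
print(Person.all_names)  # ['Joe', 'Bob']
```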

    	
Posted by ipapuc (Python)

Computational Geometry in Python: From Theory to Application

When people think computational geometry, in my experience, they typically think one of two things:

  1. Wow, that sounds complicated.
  2. Oh yeah, convex hull.

In this post, I’d like to shed some light on computational geometry, starting with a brief overview of the subject before moving into some practical advice based on my own experiences (skip ahead if you have a good handle on the subject).

What’s all the fuss about?

While convex hull computational geometry algorithms are typically included in an introductory algorithms course, computational geometry is a far richer subject that rarely gets sufficient attention from the average developer/computer scientist (unless you’re making games or something).

Theoretically intriguing…

From a theoretical standpoint, the questions in computational geometry are often exceedingly interesting; the answers, compelling; and the paths by which they’re reached, varied. These qualities alone make it a field worth studying, in my opinion.

For example, consider the Art Gallery Problem: We own an art gallery and want to install security cameras to guard our artwork. But we’re under a tight budget, so we want to use as few cameras as possible. How many cameras do we need?

When we translate this to computational geometric notation, the ‘floor plan’ of the gallery is just a simple polygon. And with some elbow grease, we can prove that ⌊n/3⌋ cameras are always sufficient for a polygon on n vertices, no matter how messy it is. The proof itself uses dual graphs, some graph theory, triangulations, and more.

Here, we see a clever proof technique and a result that is curious enough to be appreciated on its own. But if theoretical relevance isn’t enough for you…

And important in practice

As I mentioned earlier, game development relies heavily on the application of computational geometry (for example, collision detection often relies on computing the convex hull of a set of objects); as do geographic information systems (GIS), which are used for storing and performing computations on geographical data; and robotics, too (e.g., for visibility and planning problems).

Why’s it so tough?

Let’s take a fairly straightforward computational geometry problem: given a point and a polygon, does the point lie inside of the polygon? (This is called the point-in-polygon, or PIP problem.)

PIP does a great job of demonstrating why computational geometry can be (deceptively) tough. To the human eye, this isn’t a hard question. We see the following diagram and it’s immediately obvious to us that the point is in the polygon:

This point-in-polygon problem is a good example of computational geometry in one of its many applications.

Even for relatively complicated polygons, the answer doesn’t elude us for more than a second or two. But when we feed this problem to a computer, it might see the following:

poly = Polygon([Point(0, 5), Point(1, 1), Point(3, 0),
                Point(7, 2), Point(7, 6), Point(2, 7)])
point = Point(5.5, 2.5)
poly.contains(point)

What is intuitive to the human brain does not translate so easily to computer language.

More abstractly (and ignoring the need to represent these things in code), the problems we see in this discipline are very hard to rigorize (‘make rigorous’) in a computational geometry algorithm. How would we describe the point-in-polygon scenario without using such tautological language as ‘A point is inside a polygon if it is inside the polygon’? Many of these properties are so fundamental and so basic that it is difficult to define them concretely.

Difficult, but not impossible. For example, you could rigorize point-in-polygon with the following definitions:

  • A point is inside a polygon if any infinite ray beginning at the point intersects with an odd number of polygon edges (known as the even-odd rule).
  • A point is inside a polygon if it has a non-zero winding number (defined as the number of times that the curve defining the polygon travels around the point).
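The even-odd rule translates fairly directly into code. The sketch below casts a horizontal ray to the right of the point and counts edge crossings. It is one common way to implement the rule, not necessarily the approach taken in the author's library, and it does not treat points lying exactly on an edge specially. The function name point_in_polygon and the use of plain (x, y) tuples are assumptions made for this example.

```python
def point_in_polygon(point, vertices):
    """Even-odd rule: count crossings of a rightward ray from the point."""
    x, y = point
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside  # each crossing flips the parity
    return inside


# The polygon and query point from the example above, as plain tuples.
polygon = [(0, 5), (1, 1), (3, 0), (7, 2), (7, 6), (2, 7)]
print(point_in_polygon((5.5, 2.5), polygon))  # True
print(point_in_polygon((8, 3), polygon))      # False
```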

Unless you’ve had some experience with computational geometry, these definitions probably won’t be a part of your existing vocabulary. And perhaps that’s emblematic of how computational geometry can push you to think differently.

Introducing CCW

Now that we have a sense for the importance and difficulty of computational geometry problems, it’s time to get our hands dirty.

At the backbone of the subject is a deceptively powerful primitive operation: counterclockwise, or ‘CCW’ for short. (I’ll warn you now: CCW will pop up again and again.)

CCW takes three points A, B, and C as arguments and asks: do these three points compose a counterclockwise turn (vs. a clockwise turn)? In other words, is A -> B -> C a counterclockwise angle?

For example, the green points are CCW, while the red points are not:

This computational geometry problem requires points both clockwise and counterclockwise.

Why CCW Matters

CCW gives us a primitive operation on which we can build. It gives us a place to start rigorizing and solving computational geometry problems.

To give you a sense for its power, let’s consider two examples.

Determining Convexity

The first: given a polygon, can you determine if it’s convex? Convexity is an invaluable property: knowing that your polygons are convex often lets you improve performance by orders of magnitude. As a concrete example: there’s a fairly straightforward PIP algorithm that runs in O(log n) time for convex polygons, but fails for many concave polygons.

Intuitively, this gap makes sense: convex shapes are ‘nice’, while concave shapes can have sharp edges jutting in and out—they just don’t follow the same rules.

A simple (but non-obvious) computational geometry algorithm for determining convexity is to check that every triplet of consecutive vertices is CCW. This takes just a few lines of Python geometry code (assuming that the points are provided in counterclockwise order—if points is in clockwise order, you’ll want all triplets to be clockwise):

class Polygon(object):
    ...
    def isConvex(self):
        for i in range(self.n):
            # Check every triplet of points
            A = self.points[i % self.n]
            B = self.points[(i + 1) % self.n]
            C = self.points[(i + 2) % self.n]
            if not ccw(A, B, C):
                return False
        return True

Try this on paper with a few examples. You can even use this result to define convexity. (To make things more intuitive, note that a CCW curve from A -> B -> C corresponds to an angle of less than 180º, which is a widely taught way to define convexity.)
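For readers who would rather try it in an interpreter than on paper, here is a self-contained sketch of the same check. The Point type (a namedtuple) and the standalone function is_convex are stand-ins invented for this example; the ccw test is the one given later in this post. Vertices are assumed to be listed in counterclockwise order.

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])  # stand-in for the post's Point class


def ccw(A, B, C):
    """Tests whether the turn formed by A, B, and C is counterclockwise."""
    return (B.x - A.x) * (C.y - A.y) > (B.y - A.y) * (C.x - A.x)


def is_convex(points):
    """True if every consecutive triplet of vertices makes a CCW turn."""
    n = len(points)
    return all(
        ccw(points[i], points[(i + 1) % n], points[(i + 2) % n])
        for i in range(n)
    )


square = [Point(0, 0), Point(1, 0), Point(1, 1), Point(0, 1)]
dented = [Point(0, 0), Point(2, 0), Point(2, 2), Point(1, 0.5), Point(0, 2)]
print(is_convex(square))  # True
print(is_convex(dented))  # False
```

The second polygon has a vertex pushed inward at (1, 0.5), which produces a clockwise turn and fails the check.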

Line Intersection

As a second example, consider line segment intersection, which can also be solved using CCW alone:

def intersect(a1, b1, a2, b2):
    """Returns True if line segments a1b1 and a2b2 intersect."""
    return ccw(a1, b1, a2) != ccw(a1, b1, b2) and ccw(a2, b2, a1) != ccw(a2, b2, b1)

Why is this the case? Line segment intersection can also be phrased as: given a segment with endpoints A and B, do the endpoints C and D of another segment lie on the same side of AB? In other words, if the turns from A -> B -> C and A -> B -> D are in the same direction, the segments can’t intersect. When we use this type of language, it becomes clear that such a problem is CCW’s bread and butter.
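A runnable sketch of the same test on two small cases follows. The Point namedtuple is a stand-in for the post's Point class, and the sample segments are made up. Note that this simple form does not handle collinear or touching segments; those degenerate cases would need extra checks.

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])  # stand-in for the post's Point class


def ccw(A, B, C):
    """Tests whether the turn formed by A, B, and C is counterclockwise."""
    return (B.x - A.x) * (C.y - A.y) > (B.y - A.y) * (C.x - A.x)


def intersect(a1, b1, a2, b2):
    """Returns True if line segments a1b1 and a2b2 intersect."""
    return (ccw(a1, b1, a2) != ccw(a1, b1, b2)
            and ccw(a2, b2, a1) != ccw(a2, b2, b1))


# Two diagonals of a square cross each other.
crossing = intersect(Point(0, 0), Point(2, 2), Point(0, 2), Point(2, 0))
# Two parallel horizontal segments never meet.
parallel = intersect(Point(0, 0), Point(1, 0), Point(0, 1), Point(1, 1))
print(crossing, parallel)  # True False
```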

A Rigorous Definition

Now that we have a taste for the importance of CCW, let’s see how it’s computed. Given points A, B, and C:

def ccw(A, B, C):
    """Tests whether the turn formed by A, B, and C is ccw"""
    return (B.x - A.x) * (C.y - A.y) > (B.y - A.y) * (C.x - A.x)

To understand where this definition comes from, consider the vectors AB and BC. If we take their cross product, AB x BC, this will be a vector along the z-axis. But in which direction (i.e, +z or -z)? As it turns out, if the cross product is positive, the turn is counterclockwise; otherwise, it’s clockwise.
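The code compares products built from AB and AC, while the explanation talks about AB x BC. The two agree: because BC = AC - AB and AB x AB = 0, the cross product AB x BC equals AB x AC, whose z-component is exactly the difference of the two products compared in ccw. The sketch below checks this numerically on a few sample points; the helper names cross_z and ccw_via_cross are hypothetical, and points are plain (x, y) tuples.

```python
def cross_z(u, v):
    """z-component of the cross product of two 2D vectors."""
    return u[0] * v[1] - u[1] * v[0]


def ccw(A, B, C):
    """The formula from the post, written for (x, y) tuples."""
    return (B[0] - A[0]) * (C[1] - A[1]) > (B[1] - A[1]) * (C[0] - A[0])


def ccw_via_cross(A, B, C):
    """Same test, computed literally as the sign of AB x BC."""
    AB = (B[0] - A[0], B[1] - A[1])
    BC = (C[0] - B[0], C[1] - B[1])
    return cross_z(AB, BC) > 0


samples = [
    ((0, 0), (1, 0), (1, 1)),   # left turn
    ((0, 0), (1, 1), (1, 0)),   # right turn
    ((2, 3), (5, 1), (4, 7)),
    ((0, 0), (1, 1), (2, 2)),   # collinear: neither form reports CCW
]
agree = all(ccw(*triple) == ccw_via_cross(*triple) for triple in samples)
print(agree)  # True
```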

This definition will seem unintuitive unless you have a really good understanding of linear algebra, the right-hand rule, etc. But that’s why we have abstraction—when you think CCW, just think of its intuitive definition rather than its computation. The value will be immediately clear.

My Dive Into Computational Geometry and Programming Using Python

Over the past month, I’ve been working on implementing several computational geometry algorithms in Python. As I’ll be drawing on them throughout the next few sections, I’ll take a second to describe my computational geometry applications, which can be found on GitHub.

Note: My experience is admittedly limited. As I’ve been working on this stuff for months rather than years, take my advice with a grain of salt. That said, I learned much in those few months, so I hope these tips prove useful.

Read the full article in the Toptal Engineering blog

To Python 3 and Back Again: Is It Worth the Switch?

Python 3 has been in existence for 7 years now, yet some still prefer to use Python 2 instead of the newer version. This is a problem especially for neophytes who are approaching Python for the first time. I realized this at my previous workplace with colleagues in the exact same situation. Not only were they unaware of the differences between the two versions, they were not even aware of the version that they had installed.

Inevitably, different colleagues had installed different versions of the interpreter. That would have been a recipe for disaster had they then tried to blindly share scripts with one another.

This wasn’t really their fault. On the contrary, a greater effort at documenting and raising awareness is needed to dispel the veil of FUD (fear, uncertainty and doubt) that sometimes affects our choices. This post is thus intended for them, or for those who already use Python 2 but aren’t sure about moving to the next version, maybe because they tried version 3 only at the beginning, when it was less refined and library support was worse.

Read the full article in the Toptal Engineering blog