How to Create a Simple Python WebSocket Server Using Tornado

With the increase in popularity of real-time web applications, WebSockets have become a key technology in their implementation. The days where you had to constantly press the reload button to receive updates from the server are long gone. Web applications that want to provide real-time updates no longer have to poll the server for changes - instead, servers push changes down the stream as they happen. Robust web frameworks have begun supporting WebSockets out of the box. Ruby on Rails 5, for example, took it even further and added support for action cables.

In the world of Python, many popular web frameworks exist. Frameworks such as Django provide nearly everything necessary to build web applications, and anything that it lacks can be made up with one of the thousands of plugins available for Django. However, due to the way Python or most of its web frameworks work, handling long lived connections can quickly become a nightmare. The threaded model and global interpreter lock are often considered to be the achilles heel of Python.

But all of that has started to change. With certain new features of Python 3 and frameworks that already exist for Python, such as Tornado, handling long lived connections is a challenge no more. Tornado provides web server capabilities in Python that is specifically useful in handling long-lived connections.

How to Create a Simple Python WebSocket Server using Tornado

In this article, we will take a look at how a simple WebSocket server can be built in Python using Tornado. The demo application will allow us to upload a tab-separated values (TSV) file, parse it and make its contents available at a unique URL.

Tornado and WebSockets

Tornado is an asynchronous network library and specializes in dealing with event driven networking. Since it can naturally hold tens of thousands of open connections concurrently, a server can take advantage of this and handle a lot of WebSocket connections within a single node. WebSocket is a protocol that provides full-duplex communication channels over a single TCP connection. As it is an open socket, this technique makes a web connection stateful and facilitates real-time data transfer to and from the server. The server, keeping the states of the clients, makes it easy to implement real-time chat applications or web games based on WebSockets.

WebSockets are designed to be implemented in web browsers and servers, and is currently supported in all of the major web browsers. A connection is opened once and messages can travel back and forth multiple times before the connection is closed.

Installing Tornado is rather simple. It is listed in PyPI and can be installed using pip or easy_install:

pip install tornado

Tornado comes with its own implementation of WebSockets. For the purposes of this article, this is pretty much all we will need.

WebSockets in Action

One of the advantages of using WebSocket is its stateful property. This changes the way we typically think of client-server communication. One particular use case of this is where the server is required to perform long slow processes and gradually stream results back to the client.

In our example application, the user will be able to upload a file through WebSocket. For the entire lifetime of the connection, the server will retain the parsed file in-memory. Upon requests, the server can then send back parts of the file to the front-end. Furthermore, the file will be made available at a URL which can then be viewed by multiple users. If another file is uploaded at the same URL, everyone looking at it will be able to see the new file immediately.

For front-end, we will use AngularJS. This framework and libraries will allow us to easily handle file uploads and pagination. For everything related to WebSockets, however, we will use standard JavaScript functions.

This simple application will be broken down into three separate files:

  • parser.py: where our Tornado server with the request handlers is implemented
  • templates/index.html: front-end HTML template
  • static/parser.js: For our front-end JavaScript

Opening a WebSocket

From the front-end, a WebSocket connection can be established by instantiating a WebSocket object:

new WebSocket(WEBSOCKET_URL);

This is something we will have to do on page load. Once a WebSocket object is instantiated, handlers must be attached to handle three important events:

  • open: fired when a connection is established
  • message: fired when a message is received from the server
  • close: fired when a connection is closed
$scope.init = function() {
$scope.ws = new WebSocket('ws://' + location.host + '/parser/ws');
$scope.ws.binaryType = 'arraybuffer';
$scope.ws.onopen = function() {
console.log('Connected.')
};
$scope.ws.onmessage = function(evt) {
$scope.$apply(function () {
message = JSON.parse(evt.data);
$scope.currentPage = parseInt(message['page_no']);
$scope.totalRows = parseInt(message['total_number']);
$scope.rows = message['data'];
});
};
$scope.ws.onclose = function() {
console.log('Connection is closed...');
};
}
$scope.init();

Since these event handlers will not automatically trigger AngularJS’s $scope lifecycle, the contents of the handler function needs to be wrapped in $apply. In case you are interested, AngularJS specific packages exist that make it easier to integrate WebSocket in AngularJS applications.

It’s worth mentioning that dropped WebSocket connections are not automatically reestablished, and will require the application to attempt reconnects when the close event handler is triggered. This is a bit beyond the scope of this article.

Selecting a File to Upload

Since we are building a single-page application using AngularJS, attempting to submit forms with files the age-old way will not work. To make things easier, we will use Danial Farid’s ng-file-upload library. Using which, all we need to do to allow a user to upload a file is add a button to our front-end template with specific AngularJS directives:

<button class="btn btn-default" type="file" ngf-select="uploadFile($file, $invalidFiles)"
accept=".tsv" ngf-max-size="10MB">Select File</button>

The library, among many things, allows us to set acceptable file extension and size. Clicking on this button, just like any <input type=”file”> element, will open the standard file picker.

Uploading the File

When you want to transfer binary data, you can choose among array buffer and blob. If it is just raw data like an image file, choose blob and handle it properly in server. Array buffer is for fixed-length binary buffer and a text file like TSV can be transferred in the format of byte string. This code snippet shows how to upload a file in array buffer format.

$scope.uploadFile = function(file, errFiles) {
ws = $scope.ws;
$scope.f = file;
$scope.errFile = errFiles && errFiles[0];
if (file) {
reader = new FileReader();
rawData = new ArrayBuffer();
reader.onload = function(evt) {
rawData = evt.target.result;
ws.send(rawData);
}
reader.readAsArrayBuffer(file);
}
}

The ng-file-upload directive provides an uploadFile function. Here you can transform the file into an array buffer using a FileReader, and send it through the WebSocket.

Note that sending large files over WebSocket by reading them into array buffers may not be the most optimum way to upload them as it can quickly occupy to much memory resulting in a poor experience.

Read the full article inToptal Engineering blog 

Data Mining for Predictive Social Network Analysis

Social networks, in one form or another, have existed since people first began to interact. Indeed, put two or more people together and you have the foundation of a social network. It is therefore no surprise that, in today’s Internet-everywhere world, online social networks have become entirely ubiquitous.

Within this world of online social networks, a particularly fascinating phenomenon of the past decade has been the explosive growth of Twitter, often described as “the SMS of the Internet”. Launched in 2006, Twitter rapidly gained global popularity and has become one of the ten most visited websites in the world. As of May 2015, Twitter boasts 302 million active users who are collectively producing 500 million Tweets per day. And these numbers are continually growing.

Read the full article in Toptal Engineering blog

To Python 3 and Back Again: Is It Worth the Switch?

Python 3 has been in existence for 7 years now, yet some still prefer to use Python 2 instead of the newer version. This is a problem especially for neophytes that are approaching Python for the first time. I realized this at my previous workplace with colleagues in the exact same situation. Not only were they unaware of the differences between the two versions, they were not even aware of the version that they had installed.

Inevitably, different colleagues had installed different versions of the interpreter. That was a recipe for disaster if they would’ve then tried to blindly share the scripts between them.

This wasn’t quite their fault, on the contrary. A greater effort for documenting and raising awareness is needed to dispel that veil of FUD (fear, uncertainty and doubt) that sometimes affects our choices. This post is thus thought for them, or for those who already use Python 2 but aren’t sure about moving to the next version, maybe because they tried version 3 only at the beginning when it was less refined and support for libraries was worse.

Read the full article in the Toptal Engineering blog

An Introduction to Mocking in Python

How to Run Unit Tests Without Testing Your Patience

More often than not, the software we write directly interacts with what we would label as “dirty” services. In layman’s terms: services that are crucial to our application, but whose interactions have intended but undesired side-effects—that is, undesired in the context of an autonomous test run.

For example: perhaps we’re writing a social app and want to test out our new ‘Post to Facebook feature’, but don’t want to actually post to Facebook every time we run our test suite.

The Python unittest library includes a subpackage named unittest.mock—or if you declare it as a dependency, simply mock—which provides extremely powerful and useful means by which to mock and stub out these undesired side-effects.

mocking and unit tests in python

Note: mock is newly included in the standard library as of Python 3.3; prior distributions will have to use the Mock library downloadable via PyPI.

Fear System Calls

To give you another example, and one that we’ll run with for the rest of the article, consider system calls. It’s not difficult to see that these are prime candidates for mocking: whether you’re writing a script to eject a CD drive, a web server which removes antiquated cache files from /tmp, or a socket server which binds to a TCP port, these calls all feature undesired side-effects in the context of your unit-tests.

As a developer, you care more that your library successfully called the system function for ejecting a CD as opposed to experiencing your CD tray open every time a test is run.

As a developer, you care more that your library successfully called the system function for ejecting a CD (with the correct arguments, etc.) as opposed to actually experiencing your CD tray open every time a test is run. (Or worse, multiple times, as multiple tests reference the eject code during a single unit-test run!)

Likewise, keeping your unit-tests efficient and performant means keeping as much “slow code” out of the automated test runs, namely filesystem and network access.

For our first example, we’ll refactor a standard Python test case from original form to one using mock. We’ll demonstrate how writing a test case with mocks will make our tests smarter, faster, and able to reveal more about how the software works.

Read the full article in Toptal Engineering blog