A Tutorial for Reverse Engineering Your Software's Private API: Hacking Your Couch

Traveling is my passion, and I’m a huge fan of Couchsurfing. Couchsurfing is a global community of travelers, where you can find a place to stay or share your own home with other travelers. On top of that, Couchsurfing helps you enjoy a genuine traveling experience while interacting with locals. I’ve been involved with the Couchsurfing community for over 3 years. I attended meetups at first, and then I was finally able to host people. What an amazing journey it was! I’ve met so many incredible people from all over the world and made lots of friends. This whole experience truly changed my life.

Reverse engineering software is fun with a tutorial and a good project idea.

I’ve hosted a lot of travelers myself, far more often than I’ve actually surfed. While living in one of the major tourist destinations on the French Riviera, I received an enormous number of couch requests (up to 10 a day during high season). As a freelance back-end developer, I immediately noticed that the couchsurfing.com website doesn’t really handle such “high-load” cases properly. There is no information about the availability of your couch: when you receive a new couch request, you can’t be sure whether you are already hosting someone at that time. There should be a visual representation of your accepted and pending requests so you can manage them better. Also, if you could make your couch availability public, you could avoid unnecessary couch requests. To better understand what I have in mind, take a look at the Airbnb calendar.

Lots of companies are notorious for not listening to their users, and knowing the history of Couchsurfing, I couldn’t count on them to implement this feature anytime soon. Ever since the website became a for-profit company, the community has deteriorated. To better understand what I’m talking about, I suggest reading these two articles:

I knew that a lot of community members would be happy to have this functionality, so I decided to make an app to solve the problem. It turns out there is no public Couchsurfing API available. Here is the response I received from their support team:

“Unfortunately we have to inform you that our API is not actually public and there are no plans at the moment to make it public.”

Breaking Into My Couch

It was time to use some of my favorite software reverse engineering techniques to break into Couchsurfing.com. I assumed that their mobile apps must use some sort of API to query the backend, so I had to intercept the HTTP requests coming from a mobile app to the backend. For that purpose, I set up a proxy on the local network and connected my iPhone to it to intercept the HTTP requests. This way, I was able to find the access points of their private API and figure out their JSON payload format.
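
For anyone who wants to reproduce the approach, here is a minimal sketch. It assumes mitmproxy (one of several suitable proxy tools) is running on your laptop via "mitmproxy --listen-port 8080" and that the phone’s Wi-Fi proxy settings point at your laptop; the endpoint URL below is a placeholder, not the real Couchsurfing API:

import requests

# Replay a captured request through the intercepting proxy so the
# JSON payload can be studied. The endpoint here is a placeholder.
proxies = {'https': 'http://localhost:8080'}
resp = requests.get(
    'https://api.example.com/some/endpoint',
    proxies=proxies,
    verify=False,  # the proxy presents its own TLS certificate
)
print(resp.status_code, resp.json())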

Finally, I created a website that helps people manage their couch requests and shows surfers a couch availability calendar. I published a link to it on the community forums (which are also quite fragmented, in my opinion, making it difficult to find information there). The reception was mostly positive, although some people didn’t like the idea that the website required couchsurfing.com credentials, which was really a matter of trust.

The website worked like this: you log in with your couchsurfing.com credentials, and after a few clicks you get HTML code that you can embed into your couchsurfing.com profile. Voilà: you have an automatically updated calendar in your profile. Below is a screenshot of the calendar, and here are the articles on how I made it:

I’d created a useful feature for Couchsurfing, and I naturally assumed that they would appreciate my work, perhaps even offer me a position on their development team. I sent an email to jobs(at)couchsurfing.com with a link to the website, my resume, and a reference: a thank-you note left by one of my Couchsurfing guests.

A few days later, they followed up on my reverse engineering efforts. From the reply, it was clear that the only thing they were concerned about was their own security: they asked me to take down the blog posts I had written about the API, and eventually the website itself. I took down the posts immediately, as my intention was never to violate the terms of use or fish for user credentials, but to help the Couchsurfing community. Still, I had the impression of being treated like a criminal, with the company focusing solely on the fact that my website required user credentials.

 Read the full article at Toptal Engineering blog



Python Design Patterns: For Sleek And Fashionable Code

Python is a dynamic and flexible language, and Python design patterns are a great way of harnessing its vast potential. Python’s philosophy is built on top of the idea of well-thought-out best practices. Because Python is a dynamic language (did I already say that?), it already implements, or makes it easy to implement, a number of popular design patterns in a few lines of code. Some design patterns are built into Python, so we use them even without knowing it. Other patterns are not needed because of the nature of the language.

For example, Factory is a creational design pattern aimed at creating new objects while hiding the instantiation logic from the user. But creation of objects in Python is dynamic by design, so additions like Factory are usually not necessary. Of course, you are free to implement it if you want to. There might be cases where it would be really useful, but they’re an exception, not the norm.
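
If you do want one, a minimal sketch shows how little machinery Python needs, because classes are themselves objects that can be passed around and called (the animal classes here are invented purely for illustration):

class Dog:
    def speak(self):
        return 'Woof!'

class Cat:
    def speak(self):
        return 'Meow!'

def animal_factory(kind):
    # Classes are first-class objects, so a dict lookup is the whole "factory".
    return {'dog': Dog, 'cat': Cat}[kind]()

print(animal_factory('cat').speak())  # Meow!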

What is so good about Python’s philosophy? Let’s start with this (explore it in the Python terminal):

>>> import this
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

These might not be patterns in the traditional sense, but they are rules that define the “Pythonic” approach to programming in the most elegant and useful fashion.

We also have the PEP-8 code guidelines, which help structure our code. Following them is a must for me, with some appropriate exceptions, of course. By the way, those exceptions are encouraged by PEP-8 itself:

“But most importantly: know when to be inconsistent – sometimes the style guide just doesn’t apply. When in doubt, use your best judgment. Look at other examples and decide what looks best. And don’t hesitate to ask!”

Combine PEP-8 with The Zen of Python (also a PEP: PEP-20), and you’ll have a perfect foundation to create readable and maintainable code. Add design patterns, and you are ready to create every kind of software system with consistency and evolvability.

Python Design Patterns

What Is A Design Pattern?

Everything starts with the Gang of Four (GOF). Do a quick online search if you are not familiar with the GOF.

Design patterns are a common way of solving well-known problems. Two main principles lie at the base of the design patterns defined by the GOF:

  • Program to an interface not an implementation.
  • Favor object composition over inheritance.

Let’s take a closer look at these two principles from the perspective of Python programmers.

Program to an interface not an implementation

Think about Duck Typing. In Python, we don’t like to define interfaces and program classes according to those interfaces, do we? But, listen to me! This doesn’t mean we don’t think about interfaces; in fact, with Duck Typing we do that all the time.

Let’s say some words about the infamous Duck Typing approach to see how it fits in this paradigm: program to an interface.

If it looks like a duck and quacks like a duck, it's a duck!

We don’t bother with the nature of the object; we don’t have to care what the object is. We just want to know whether it’s able to do what we need (we are only interested in the interface of the object).

Can the object quack? So, let it quack!

try:
    bird.quack()
except AttributeError:
    self.lol()

Did we define an interface for our duck? No! Did we program to the interface instead of the implementation? Yes! And, I find this so nice.

As Alex Martelli points out in his well known presentation about Design Patterns in Python, “Teaching the ducks to type takes a while, but saves you a lot of work afterwards!”

Favor object composition over inheritance

Now, that’s what I call a Pythonic principle! I create far fewer classes and subclasses this way: instead of inheriting, I wrap one class (or, more often, several classes) in another class.

Instead of doing this:

class User(DbObject):
    pass

We can do something like this:

class User:
    _persist_methods = ['get', 'save', 'delete']

    def __init__(self, persister):
        self._persister = persister

    def __getattr__(self, attribute):
        if attribute in self._persist_methods:
            return getattr(self._persister, attribute)
        # anything we don't delegate should fail loudly, as usual
        raise AttributeError(attribute)

The advantages are obvious. We can restrict which methods of the wrapped class are exposed. We can inject the persister instance at runtime! For example, today it’s a relational database, but tomorrow it could be whatever else exposes the interface we need (again, those pesky ducks).

Composition is elegant and natural to Python.
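
To make the injection concrete, here is a minimal sketch; MemoryPersister is an invented stand-in for a real database layer, but anything exposing get/save/delete would do:

class MemoryPersister:
    """A duck-typed persister with the interface User expects."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def save(self, key, value):
        self._store[key] = value

    def delete(self, key):
        self._store.pop(key, None)


user = User(MemoryPersister())
user.save('name', 'Alice')  # delegated via __getattr__ to the persister
print(user.get('name'))     # Alice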

Behavioral Patterns

Behavioral patterns involve communication between objects: how objects interact and how they fulfill a given task. According to GOF principles, there are a total of 11 behavioral patterns in Python: Chain of Responsibility, Command, Interpreter, Iterator, Mediator, Memento, Observer, State, Strategy, Template, and Visitor.

Behavioral patterns deal with inter-object communication, controlling how various objects interact and perform different tasks.

Read the whole article here by Andrei Boyanov, Toptal Freelance Developer



WSGI: The Server-Application Interface for Python

In 1993, the web was still in its infancy, with about 14 million users and a hundred websites. Pages were static but there was already a need to produce dynamic content, such as up-to-date news and data. Responding to this, Rob McCool and other contributors implemented the Common Gateway Interface (CGI) in the National Center for Supercomputing Applications (NCSA) HTTPd web server (the forerunner of Apache). This was the first web server that could serve content generated by a separate application.

Since then, the number of users on the Internet has exploded, and dynamic websites have become ubiquitous. When first learning a new language, or even first learning to code, developers soon enough want to know how to hook their code into the web.
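
To give a taste of how small the server-application contract is, here is a minimal sketch of a WSGI application served with the standard library’s wsgiref module (the greeting and port are arbitrary):

from wsgiref.simple_server import make_server

def application(environ, start_response):
    # environ is a dict of request data; start_response sets status and headers
    start_response('200 OK', [('Content-Type', 'text/plain; charset=utf-8')])
    return [b'Hello, WSGI!']

if __name__ == '__main__':
    server = make_server('localhost', 8000, application)
    server.serve_forever()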

Source: Toptal 



8 Essential Python Interview Questions

Source: Toptal 

 

What will be the output of the code below? Explain your answer.

def extendList(val, list=[]):
    list.append(val)
    return list

list1 = extendList(10)
list2 = extendList(123, [])
list3 = extendList('a')

print "list1 = %s" % list1
print "list2 = %s" % list2
print "list3 = %s" % list3

How would you modify the definition of extendList to produce the presumably desired behavior?

What will be the output of the code below? Explain your answer.

def multipliers():
    return [lambda x : i * x for i in range(4)]

print [m(2) for m in multipliers()]

How would you modify the definition of multipliers to produce the presumably desired behavior?

What will be the output of the code below? Explain your answer.

class Parent(object):
    x = 1

class Child1(Parent):
    pass

class Child2(Parent):
    pass

print Parent.x, Child1.x, Child2.x
Child1.x = 2
print Parent.x, Child1.x, Child2.x
Parent.x = 3
print Parent.x, Child1.x, Child2.x


 

What will be the output of the code below in Python 2? Explain your answer.

def div1(x, y):
    print "%s/%s = %s" % (x, y, x/y)

def div2(x, y):
    print "%s//%s = %s" % (x, y, x//y)

div1(5, 2)
div1(5., 2)
div2(5, 2)
div2(5., 2.)

Also, how would the answer differ in Python 3 (assuming, of course, that the above print statements were converted to Python 3 syntax)?

What will be the output of the code below?

list = ['a', 'b', 'c', 'd', 'e']
print list[10:]

Consider the following code snippet:

1. list = [ [ ] ] * 5
2. list  # output?
3. list[0].append(10)
4. list  # output?
5. list[1].append(20)
6. list  # output?
7. list.append(30)
8. list  # output?

What will be the output of lines 2, 4, 6, and 8? Explain your answer.

Given a list of N numbers, use a single list comprehension to produce a new list that only contains those values that are:
(a) even numbers, and
(b) from elements in the original list that had even indices

For example, if list[2] contains a value that is even, that value should be included in the new list, since it is also at an even index (i.e., 2) in the original list. However, if list[3] contains an even number, that number should not be included in the new list since it is at an odd index (i.e., 3) in the original list.

Given the following subclass of dictionary:

class DefaultDict(dict):
    def __missing__(self, key):
        return []

Will the code below work? Why or why not?

d = DefaultDict()
d['florp'] = 127
 
 


Python Multithreading Tutorial: Concurrency and Parallelism

Discussions criticizing Python often talk about how it is difficult to use Python for multithreaded work, pointing fingers at what is known as the global interpreter lock (affectionately referred to as the “GIL”), which prevents multiple threads of Python code from running simultaneously. Because of this, the threading module doesn’t quite behave the way you would expect if you’re not a Python developer and are coming from other languages such as C++ or Java. It must be made clear that one can still write code in Python that runs concurrently or in parallel, and make a stark difference in resulting performance, as long as certain things are taken into consideration. If you haven’t read it yet, I suggest you take a look at Eqbal Quran’s article on concurrency and parallelism in Ruby, here on the Toptal blog.

In this Python concurrency tutorial, we will write a small Python script to download the top popular images from Imgur. We will start with a version that downloads images sequentially, or one at a time. As a prerequisite, you will have to register an application on Imgur. If you do not have an Imgur account already, please create one first.

The scripts in this tutorial have been tested with Python 3.4.2. With some changes, they should also run with Python 2; urllib is what has changed the most between these two versions of Python.

Getting Started with Multithreading in Python

Let us start by creating a Python module, named “download.py”. This file will contain all the functions necessary to fetch the list of images and download them. We will split these functionalities into three separate functions:

  • get_links
  • download_link
  • setup_download_dir

The third function, “setup_download_dir”, will be used to create a download destination directory if it doesn’t already exist.

Imgur’s API requires HTTP requests to bear the “Authorization” header with the client ID. You can find this client ID in the dashboard of the application you registered on Imgur. The API’s responses are JSON encoded, so we can use Python’s standard json library to decode them. Downloading the image is an even simpler task, as all you have to do is fetch the image by its URL and write it to a file.


This is what the script looks like:

import json
import logging
import os
from pathlib import Path
from urllib.request import urlopen, Request

logger = logging.getLogger(__name__)


def get_links(client_id):
    headers = {'Authorization': 'Client-ID {}'.format(client_id)}
    req = Request('https://api.imgur.com/3/gallery/', headers=headers, method='GET')
    with urlopen(req) as resp:
        # read() works across Python 3 versions
        data = json.loads(resp.read().decode('utf-8'))
    return map(lambda item: item['link'], data['data'])


def download_link(directory, link):
    logger.info('Downloading %s', link)
    download_path = directory / os.path.basename(link)
    with urlopen(link) as image, download_path.open('wb') as f:
        f.write(image.read())


def setup_download_dir():
    download_dir = Path('images')
    if not download_dir.exists():
        download_dir.mkdir()
    return download_dir

Next, we will need to write a module that uses these functions to download the images one by one. We will name it “single.py”. This will contain the main function of our first, naive version of the Imgur image downloader. The module will retrieve the Imgur client ID from the environment variable “IMGUR_CLIENT_ID”. It will invoke “setup_download_dir” to create the download destination directory. Finally, it will fetch a list of images using the get_links function, filter out all GIF and album URLs, and then use “download_link” to download and save each of those images to disk. Here is what “single.py” looks like:

import logging
import os
from time import time

from download import setup_download_dir, get_links, download_link

logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logging.getLogger('requests').setLevel(logging.CRITICAL)
logger = logging.getLogger(__name__)


def main():
    ts = time()
    client_id = os.getenv('IMGUR_CLIENT_ID')
    if not client_id:
        raise Exception("Couldn't find IMGUR_CLIENT_ID environment variable!")
    download_dir = setup_download_dir()
    links = [l for l in get_links(client_id) if l.endswith('.jpg')]
    for link in links:
        download_link(download_dir, link)
    print('Took {}s'.format(time() - ts))


if __name__ == '__main__':
    main()

On my laptop, this script took 19.4 seconds to download 91 images. Please do note that these numbers may vary based on the network you are on. 19.4 seconds isn’t terribly long, but what if we wanted to download more pictures? Perhaps 900 images instead of 90. With an average of 0.2 seconds per picture, 900 images would take approximately 3 minutes, and 9,000 pictures would take 30 minutes. The good news is that by introducing concurrency or parallelism, we can speed this up dramatically.

All subsequent code examples will only show import statements that are new and specific to those examples. For convenience, all of these Python scripts can be found in this GitHub repository.

Using Threads for Concurrency and Parallelism

Threading is one of the most well known approaches to attaining Python concurrency and parallelism. Threading is a feature usually provided by the operating system. Threads are lighter than processes, and share the same memory space.


In our Python thread tutorial, we will write a new module to replace “single.py”. This module will create a pool of 8 threads, making a total of 9 threads including the main thread. I chose 8 worker threads because my computer has 8 CPU cores, and one worker thread per core seemed a good number for how many threads to run at once. In practice, this number is chosen much more carefully, based on other factors such as other applications and services running on the same machine.

This is almost the same as the previous one, with the exception that we now have a new class, DownloadWorker, which is a descendant of the Thread class. The run method has been overridden to run an infinite loop. On every iteration, it calls “self.queue.get()” to try to fetch a URL from a thread-safe queue. It blocks until there is an item in the queue for the worker to process. Once the worker receives an item from the queue, it calls the same “download_link” method that was used in the previous script to download the image to the images directory. After the download is finished, the worker signals the queue that the task is done. This is very important, because the Queue keeps track of how many tasks were enqueued. The call to “queue.join()” would block the main thread forever if the workers did not signal that they completed a task.

from queue import Queue
from threading import Thread


class DownloadWorker(Thread):
    def __init__(self, queue):
        Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            # Get the work from the queue and expand the tuple
            directory, link = self.queue.get()
            download_link(directory, link)
            self.queue.task_done()


def main():
    ts = time()
    client_id = os.getenv('IMGUR_CLIENT_ID')
    if not client_id:
        raise Exception("Couldn't find IMGUR_CLIENT_ID environment variable!")
    download_dir = setup_download_dir()
    links = [l for l in get_links(client_id) if l.endswith('.jpg')]
    # Create a queue to communicate with the worker threads
    queue = Queue()
    # Create 8 worker threads
    for x in range(8):
        worker = DownloadWorker(queue)
        # Setting daemon to True will let the main thread exit even though the workers are blocking
        worker.daemon = True
        worker.start()
    # Put the tasks into the queue as a tuple
    for link in links:
        logger.info('Queueing {}'.format(link))
        queue.put((download_dir, link))
    # Causes the main thread to wait for the queue to finish processing all the tasks
    queue.join()
    print('Took {}'.format(time() - ts))

Running this script on the same machine used earlier results in a download time of 4.1 seconds! That’s 4.7 times faster than the previous example. While this is much faster, it is worth mentioning that only one thread was executing at a time throughout this process, due to the GIL. Therefore, this code is concurrent but not parallel. The reason it is still faster is that this is an IO bound task. The processor is hardly breaking a sweat while downloading these images; the majority of the time is spent waiting for the network. This is why threading can provide a large speed increase: the processor can switch between the threads whenever one of them is ready to do some work. However, using the threading module in Python or any other interpreted language with a GIL can actually result in reduced performance. If your code is performing a CPU bound task, such as decompressing gzip files, using the threading module will result in a slower execution time. For CPU bound tasks and truly parallel execution, we can use the multiprocessing module.

While the de facto reference Python implementation, CPython, has a GIL, this is not true of all Python implementations. For example, IronPython, a Python implementation using the .NET framework, does not have a GIL, and neither does Jython, the Java-based implementation. You can find a list of working Python implementations here.

Spawning Multiple Processes

The multiprocessing module is easier to drop in than the threading module, as we don’t need to add a class as we did in the threading example. The only changes we need to make are in the main function.


To use multiple processes we create a multiprocessing Pool. With the map method it provides, we will pass the list of URLs to the pool, which in turn will spawn 8 new processes and use each one to download the images in parallel. This is true parallelism, but it comes with a cost. The entire memory of the script is copied into each subprocess that is spawned. In this simple example it isn’t a big deal, but it can easily become serious overhead for non-trivial programs.

from functools import partial
from multiprocessing.pool import Pool


def main():
    ts = time()
    client_id = os.getenv('IMGUR_CLIENT_ID')
    if not client_id:
        raise Exception("Couldn't find IMGUR_CLIENT_ID environment variable!")
    download_dir = setup_download_dir()
    links = [l for l in get_links(client_id) if l.endswith('.jpg')]
    download = partial(download_link, download_dir)
    with Pool(8) as p:
        p.map(download, links)
    print('Took {}s'.format(time() - ts))


Distributing to Multiple Workers

While the threading and multiprocessing modules are great for scripts that run on your personal computer, what should you do if you want the work to be done on a different machine, or you need to scale up to more than what the CPU of one machine can handle? A great use case for this is long-running back-end tasks for web applications. If you have some long-running tasks, you don’t want to spin up a bunch of subprocesses or threads on the same machine that needs to be running the rest of your application code, as this will degrade the performance of your application for all of your users. What would be great is to be able to run these jobs on another machine, or on many other machines.

A great Python library for this task is RQ, a very simple yet powerful library. You first enqueue a function and its arguments using the library. This pickles the function call representation, which is then appended to a Redis list. Enqueueing the job is the first step, but it will not do anything yet; we also need at least one worker listening on that job queue.


The first step is to install and run a Redis server on your computer, or have access to a running Redis server. After that, there are only a few small changes made to the existing code. We first create an instance of an RQ Queue and pass it an instance of a Redis server from the redis-py library. Then, instead of just calling our “download_link” method, we call “q.enqueue(download_link, download_dir, link)”. The enqueue method takes a function as its first argument, then any other arguments or keyword arguments are passed along to that function when the job is actually executed.

The last step is to start up some workers. RQ provides a handy script to run workers on the default queue: just run “rqworker” in a terminal window, and it will start a worker listening on the default queue. Please make sure your current working directory is the same one the scripts reside in. If you want to listen to a different queue, you can run “rqworker queue_name”, and it will listen to that named queue. The great thing about RQ is that as long as you can connect to Redis, you can run as many workers as you like on as many different machines as you like, so it is very easy to scale up as your application grows. Here is the source for the RQ version:

from redis import Redis
from rq import Queue


def main():
    client_id = os.getenv('IMGUR_CLIENT_ID')
    if not client_id:
        raise Exception("Couldn't find IMGUR_CLIENT_ID environment variable!")
    download_dir = setup_download_dir()
    links = [l for l in get_links(client_id) if l.endswith('.jpg')]
    q = Queue(connection=Redis(host='localhost', port=6379))
    for link in links:
        q.enqueue(download_link, download_dir, link)

However, RQ is not the only Python job queue solution. RQ is easy to use and covers simple use cases extremely well, but if more advanced options are required, other job queue solutions (such as Celery) can be used.
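
For comparison, here is a minimal sketch of the same job expressed as a Celery task. The module layout, task name, and Redis broker URL are assumptions for illustration, not part of this tutorial’s repository:

from pathlib import Path

from celery import Celery

from download import download_link

app = Celery('image_downloads', broker='redis://localhost:6379/0')

@app.task
def download(directory, link):
    # Task arguments must be serializable, so the directory travels as a string.
    download_link(Path(directory), link)

The producer side then calls download.delay(str(download_dir), link) for each link, and "celery -A tasks worker" starts a worker process, assuming the module above is saved as tasks.py.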



  • Avoid the 10 Most Common Mistakes That Python Programmers Make

    About Python

    Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its high-level built in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development, as well as for use as a scripting or glue language to connect existing components or services. Python supports modules and packages, thereby encouraging program modularity and code reuse.

    About this article

    Python’s simple, easy-to-learn syntax can mislead Python developers – especially those who are newer to the language – into missing some of its subtleties and underestimating the power of the diverse Python language.

    With that in mind, this article presents a “top 10” list of somewhat subtle, harder-to-catch mistakes that can bite even some more advanced Python developers in the rear.


    (Note: This article is intended for a more advanced audience than Common Mistakes of Python Programmers, which is geared more toward those who are newer to the language.)

    Common Mistake #1: Misusing expressions as defaults for function arguments

    Python allows you to specify that a function argument is optional by providing a default value for it. While this is a great feature of the language, it can lead to some confusion when the default value is mutable. For example, consider this Python function definition:

    >>> def foo(bar=[]):        # bar is optional and defaults to [] if not specified
    ...    bar.append("baz")    # but this line could be problematic, as we'll see...
    ...    return bar
    
    

    A common mistake is to think that the optional argument will be set to the specified default expression each time the function is called without supplying a value for the optional argument. In the above code, for example, one might expect that calling foo() repeatedly (i.e., without specifying a bar argument) would always return ["baz"], since the assumption would be that each time foo() is called (without a bar argument specified) bar is set to [] (i.e., a new empty list).

    But let’s look at what actually happens when you do this:

    >>> foo()
    ["baz"]
    >>> foo()
    ["baz", "baz"]
    >>> foo()
    ["baz", "baz", "baz"]
    
    

    Huh? Why did it keep appending the default value of "baz" to an existing list each time foo() was called, rather than creating a new list each time?

    The more advanced Python programming answer is that the default value for a function argument is only evaluated once, at the time that the function is defined. Thus, the bar argument is initialized to its default (i.e., an empty list) only when foo() is first defined, but then calls to foo() (i.e., without a bar argument specified) will continue to use the same list to which bar was originally initialized.

    FYI, a common workaround for this is as follows:

    >>> def foo(bar=None):
    ...    if bar is None:		# or if not bar:
    ...        bar = []
    ...    bar.append("baz")
    ...    return bar
    ...
    >>> foo()
    ["baz"]
    >>> foo()
    ["baz"]
    >>> foo()
    ["baz"]
    
    

    Common Mistake #2: Using class variables incorrectly

    Consider the following example:

    >>> class A(object):
    ...     x = 1
    ...
    >>> class B(A):
    ...     pass
    ...
    >>> class C(A):
    ...     pass
    ...
    >>> print A.x, B.x, C.x
    1 1 1
    
    

    Makes sense.

    >>> B.x = 2
    >>> print A.x, B.x, C.x
    1 2 1
    
    

    Yup, again as expected.

    >>> A.x = 3
    >>> print A.x, B.x, C.x
    3 2 3
    
    

    What the $%#!&?? We only changed A.x. Why did C.x change too?

    In Python, class variables are internally handled as dictionaries and follow what is often referred to as Method Resolution Order (MRO). So in the above code, since the attribute x is not found in class C, it will be looked up in its base classes (only A in the above example, although Python supports multiple inheritance). In other words, C doesn’t have its own x property, independent of A. Thus, references to C.x are in fact references to A.x. This causes a Python problem unless it’s handled properly. Learn more about class attributes in Python.
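
    A quick way to convince yourself: give C its own x, and the lookup no longer falls through to A (a sketch continuing the interpreter session above):

    >>> C.x = 4      # C now has its own class attribute...
    >>> print A.x, B.x, C.x
    3 2 4
    >>> A.x = 5      # ...so changing A.x no longer affects C
    >>> print A.x, B.x, C.x
    5 2 4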

    Common Mistake #3: Specifying parameters incorrectly for an exception block

    Suppose you have the following code:

    >>> try:
    ...     l = ["a", "b"]
    ...     int(l[2])
    ... except ValueError, IndexError:  # To catch both exceptions, right?
    ...     pass
    ...
    Traceback (most recent call last):
    File "<stdin>", line 3, in <module>
    IndexError: list index out of range
    
    

    The problem here is that the except statement does not take a list of exceptions specified in this manner. Rather, in Python 2.x, the syntax except Exception, e is used to bind the exception to the optional second parameter specified (in this case e), in order to make it available for further inspection. As a result, in the above code, the IndexError exception is not being caught by the except statement; rather, the exception ends up being bound to a parameter named IndexError.

    The proper way to catch multiple exceptions in an except statement is to specify the first parameter as a tuple containing all exceptions to be caught. Also, for maximum portability, use the as keyword, since that syntax is supported by both Python 2 and Python 3:

    >>> try:
    ...     l = ["a", "b"]
    ...     int(l[2])
    ... except (ValueError, IndexError) as e:  
    ...     pass
    ...
    >>>
    
    

    Common Mistake #4: Misunderstanding Python scope rules

    Python scope resolution is based on what is known as the LEGB rule, shorthand for Local, Enclosing, Global, Built-in. Seems straightforward enough, right? Well, actually, there are some subtleties to the way this works in Python, which brings us to the common, more advanced Python programming problem below. Consider the following:

    >>> x = 10
    >>> def foo():
    ...     x += 1
    ...     print x
    ...
    >>> foo()
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "<stdin>", line 2, in foo
    UnboundLocalError: local variable 'x' referenced before assignment
    
    

    What’s the problem?

    The above error occurs because, when you make an assignment to a variable in a scope, that variable is automatically considered by Python to be local to that scope and shadows any similarly named variable in any outer scope.

    Many are thereby surprised to get an UnboundLocalError in previously working code when it is modified by adding an assignment statement somewhere in the body of a function. (You can read more about this here.)

    It is particularly common for this to trip up developers when using lists. Consider the following example:

    >>> lst = [1, 2, 3]
    >>> def foo1():
    ...     lst.append(5)   # This works ok...
    ...
    >>> foo1()
    >>> lst
    [1, 2, 3, 5]
    >>> lst = [1, 2, 3]
    >>> def foo2():
    ...     lst += [5]      # ... but this bombs!
    ...
    >>> foo2()
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "<stdin>", line 2, in foo
    UnboundLocalError: local variable 'lst' referenced before assignment
    
    

    Huh? Why did foo2 bomb while foo1 ran fine?

    The answer is the same as in the prior example problem, but is admittedly more subtle.  foo1 is not making an assignment to lst, whereas foo2 is. Remembering that lst += [5] is really just shorthand for lst = lst + [5], we see that we are attempting to assign a value to lst (therefore presumed by Python to be in the local scope). However, the value we are looking to assign to lst is based on lst itself (again, now presumed to be in the local scope), which has not yet been defined. Boom.
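
    If foo2 really is meant to modify the module-level list, one standard fix (a sketch of one option, not the only one) is to declare the name global:

    >>> lst = [1, 2, 3]
    >>> def foo3():
    ...     global lst      # lst now refers to the module-level name
    ...     lst += [5]
    ...
    >>> foo3()
    >>> lst
    [1, 2, 3, 5]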

    Common Mistake #5: Modifying a list while iterating over it

    The problem with the following code should be fairly obvious:

    >>> odd = lambda x : bool(x % 2)
    >>> numbers = [n for n in range(10)]
    >>> for i in range(len(numbers)):
    ...     if odd(numbers[i]):
    ...         del numbers[i]  # BAD: Deleting item from a list while iterating over it
    ...
    Traceback (most recent call last):
    File "<stdin>", line 2, in <module>
    IndexError: list index out of range
    
    

    Deleting an item from a list or array while iterating over it is a Python problem that is well known to any experienced software developer. But while the example above may be fairly obvious, even advanced developers can be unintentionally bitten by this in code that is much more complex.

    Fortunately, Python incorporates a number of elegant programming paradigms which, when used properly, can result in significantly simplified and streamlined code. A side benefit of this is that simpler code is less likely to be bitten by the accidental-deletion-of-a-list-item-while-iterating-over-it bug. One such paradigm is that of list comprehensions. Moreover, list comprehensions are particularly useful for avoiding this specific problem, as shown by this alternate implementation of the above code which works perfectly:

    >>> odd = lambda x : bool(x % 2)
    >>> numbers = [n for n in range(10)]
    >>> numbers[:] = [n for n in numbers if not odd(n)]  # ahh, the beauty of it all
    >>> numbers
    [0, 2, 4, 6, 8]
    
    

    Common Mistake #6: Confusing how Python binds variables in closures

    Consider the following example:

    >>> def create_multipliers():
    ...     return [lambda x : i * x for i in range(5)]
    >>> for multiplier in create_multipliers():
    ...     print multiplier(2)
    ...

    You might expect the following output:

    0
    2
    4
    6
    8

    But you actually get:

    8
    8
    8
    8
    8

    Surprise!

    This happens due to Python’s late binding behavior which says that the values of variables used in closures are looked up at the time the inner function is called. So in the above code, whenever any of the returned functions are called, the value of i is looked up in the surrounding scope at the time it is called (and by then, the loop has completed, so i has already been assigned its final value of 4).

    The solution to this common Python problem is a bit of a hack:

    >>> def create_multipliers():
    ...     return [lambda x, i=i : i * x for i in range(5)]
    ...
    >>> for multiplier in create_multipliers():
    ...     print multiplier(2)
    ...
    0
    2
    4
    6
    8

    Voilà! We are taking advantage of default arguments here to generate anonymous functions in order to achieve the desired behavior. Some would call this elegant. Some would call it subtle. Some hate it. But if you’re a Python developer, it’s important to understand in any case.

    Common Mistake #7: Creating circular module dependencies

    Let’s say you have two files, a.py and b.py, each of which imports the other, as follows:

    In a.py:

    import b

    def f():
        return b.x

    print f()

    And in b.py:

    import a

    x = 1

    def g():
        print a.f()

    First, let’s try importing a.py:

    >>> import a
    1

    Worked just fine. Perhaps that surprises you. After all, we do have a circular import here which presumably should be a problem, shouldn’t it?

    The answer is that the mere presence of a circular import is not in and of itself a problem in Python. If a module has already been imported, Python is smart enough not to try to re-import it. However, depending on the point at which each module is attempting to access functions or variables defined in the other, you may indeed run into problems.

    So returning to our example, when we imported a.py, it had no problem importing b.py, since b.py does not require anything from a.py to be defined at the time it is imported. The only reference in b.py to a is the call to a.f(), and that call is inside g(), which nothing in a.py or b.py ever invokes.



  • An Introduction to Mocking in Python

    How to Run Unit Tests Without Testing Your Patience

    More often than not, the software we write directly interacts with what we would label as “dirty” services. In layman’s terms: services that are crucial to our application, but whose interactions have intended but undesired side-effects—that is, undesired in the context of an autonomous test run.

    For example: perhaps we’re writing a social app and want to test out our new ‘Post to Facebook feature’, but don’t want to actually post to Facebook every time we run our test suite.

    The Python unittest library includes a subpackage named unittest.mock—or if you declare it as a dependency, simply mock—which provides extremely powerful and useful means by which to mock and stub out these undesired side-effects.

     

    Note: mock is newly included in the standard library as of Python 3.3; prior distributions will have to use the Mock library downloadable via PyPI.

    Fear System Calls

    To give you another example, and one that we’ll run with for the rest of the article, consider system calls. It’s not difficult to see that these are prime candidates for mocking: whether you’re writing a script to eject a CD drive, a web server which removes antiquated cache files from /tmp, or a socket server which binds to a TCP port, these calls all feature undesired side-effects in the context of your unit-tests.

    As a developer, you care more that your library successfully called the system function for ejecting a CD (with the correct arguments, etc.) as opposed to actually experiencing your CD tray open every time a test is run. (Or worse, multiple times, as multiple tests reference the eject code during a single unit-test run!)

    Likewise, keeping your unit-tests efficient and performant means keeping as much “slow code” as possible out of the automated test runs, namely filesystem and network access.

    For our first example, we’ll refactor a standard Python test case from original form to one using mock. We’ll demonstrate how writing a test case with mocks will make our tests smarter, faster, and able to reveal more about how the software works.

    A Simple Delete Function

    We all need to delete files from our filesystem from time to time, so let’s write a function in Python which will make it a bit easier for our scripts to do so.

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    import os

    def rm(filename):
        os.remove(filename)
    
    

    Obviously, our rm method at this point in time doesn’t provide much more than the underlying os.remove method, but our codebase will improve, allowing us to add more functionality here.

    Let’s write a traditional test case, i.e., without mocks:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    from mymodule import rm

    import os.path
    import tempfile
    import unittest

    class RmTestCase(unittest.TestCase):

        tmpfilepath = os.path.join(tempfile.gettempdir(), "tmp-testfile")

        def setUp(self):
            with open(self.tmpfilepath, "wb") as f:
                f.write("Delete me!")

        def test_rm(self):
            # remove the file
            rm(self.tmpfilepath)
            # test that it was actually removed
            self.assertFalse(os.path.isfile(self.tmpfilepath), "Failed to remove the file.")
    
    

    Our test case is pretty simple, but every time it is run, a temporary file is created and then deleted. Additionally, we have no way of testing whether our rm method properly passes the argument down to the os.remove call. We can assume that it does based on the test above, but much is left to be desired.

    Refactoring with Mocks

    Let’s refactor our test case using mock:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    from mymodule import rm

    import mock
    import unittest

    class RmTestCase(unittest.TestCase):

        @mock.patch('mymodule.os')
        def test_rm(self, mock_os):
            rm("any path")
            # test that rm called os.remove with the right parameters
            mock_os.remove.assert_called_with("any path")
    
    

    With these refactors, we have fundamentally changed the way that the test operates. Now, we have an insider, an object we can use to verify the functionality of another.

    Potential Pitfalls

    One of the first things that should stick out is that we’re using the mock.patch method decorator to mock an object located at mymodule.os, and injecting that mock into our test case method. Wouldn’t it make more sense to just mock os itself, rather than the reference to it at mymodule.os?

    Well, Python is somewhat of a sneaky snake when it comes to imports and managing modules. At runtime, the mymodule module has its own os which is imported into its own local scope in the module. Thus, if we mock os, we won’t see the effects of the mock in the mymodule module.

    The mantra to keep repeating is this:

    Mock an item where it is used, not where it came from.

    If you need to mock the tempfile module for myproject.app.MyElaborateClass, you probably need to apply the mock to myproject.app.tempfile, as each module keeps its own imports.
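
    Translated into code, that advice looks like the following minimal sketch (myproject.app and MyElaborateClass are the hypothetical names from the paragraph above):

    import mock
    import unittest

    class MyElaborateClassTestCase(unittest.TestCase):

        @mock.patch('myproject.app.tempfile')
        def test_uses_temp_dir(self, mock_tempfile):
            # code in myproject.app that calls tempfile.gettempdir() will
            # now receive this fake path instead of touching the real OS
            mock_tempfile.gettempdir.return_value = '/tmp/fake-dir'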

    With that pitfall out of the way, let’s keep mocking.

    Adding Validation to ‘rm’

    The rm method defined earlier is quite oversimplified. We’d like to have it validate that a path exists and is a file before just blindly attempting to remove it. Let’s refactor rm to be a bit smarter:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    import os
    import os.path

    def rm(filename):
        if os.path.isfile(filename):
            os.remove(filename)
    
    

    Great. Now, let’s adjust our test case to keep coverage up.

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    from mymodule import rm

    import mock
    import unittest

    class RmTestCase(unittest.TestCase):

        @mock.patch('mymodule.os.path')
        @mock.patch('mymodule.os')
        def test_rm(self, mock_os, mock_path):
            # set up the mock
            mock_path.isfile.return_value = False
            rm("any path")
            # test that the remove call was NOT called
            self.assertFalse(mock_os.remove.called, "Failed to not remove the file if not present.")
            # make the file 'exist'
            mock_path.isfile.return_value = True
            rm("any path")
            mock_os.remove.assert_called_with("any path")
    
    

    Our testing paradigm has completely changed. We now can verify and validate internal functionality of methods without any side-effects.

    File-Removal as a Service

    So far, we’ve only been working with supplying mocks for functions, but not for methods on objects or cases where mocking is necessary for sending parameters. Let’s cover object methods first.

    We’ll begin with a refactor of the rm method into a service class. There really isn’t a justifiable need, per se, to encapsulate such a simple function into an object, but it will at the very least help us demonstrate key concepts in mock. Let’s refactor:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    import os
    import os.path

    class RemovalService(object):
        """A service for removing objects from the filesystem."""

        def rm(self, filename):  # rm is an instance method, so it takes self
            if os.path.isfile(filename):
                os.remove(filename)
    
    

    You’ll notice that not much has changed in our test case:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    from mymodule import RemovalService

    import mock
    import unittest

    class RemovalServiceTestCase(unittest.TestCase):

        @mock.patch('mymodule.os.path')
        @mock.patch('mymodule.os')
        def test_rm(self, mock_os, mock_path):
            # instantiate our service
            reference = RemovalService()
            # set up the mock
            mock_path.isfile.return_value = False
            reference.rm("any path")
            # test that the remove call was NOT called
            self.assertFalse(mock_os.remove.called, "Failed to not remove the file if not present.")
            # make the file 'exist'
            mock_path.isfile.return_value = True
            reference.rm("any path")
            mock_os.remove.assert_called_with("any path")
    
    

    Great, so we now know that the RemovalService works as planned. Let’s create another service which declares it as a dependency:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    import os
    import os.path

    class RemovalService(object):
        """A service for removing objects from the filesystem."""

        def rm(self, filename):
            if os.path.isfile(filename):
                os.remove(filename)

    class UploadService(object):

        def __init__(self, removal_service):
            self.removal_service = removal_service

        def upload_complete(self, filename):
            self.removal_service.rm(filename)
    
    

    Since we already have test coverage on the RemovalService, we’re not going to validate internal functionality of the rm method in our tests of UploadService. Rather, we’ll simply test (without side-effects, of course) that UploadService calls the RemovalService.rm method, which we know “just works™” from our previous test case.

    There are two ways to go about this:

    1. Mock out the RemovalService.rm method itself.
    2. Supply a mocked instance in the constructor of UploadService.

    As both methods are often important in unit-testing, we’ll review both.

    Option 1: Mocking Instance Methods

    The mock library has a special method decorator for mocking object instance methods and properties, the @mock.patch.object decorator:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    from mymodule import RemovalService, UploadService

    import mock
    import unittest

    class RemovalServiceTestCase(unittest.TestCase):

        @mock.patch('mymodule.os.path')
        @mock.patch('mymodule.os')
        def test_rm(self, mock_os, mock_path):
            # instantiate our service
            reference = RemovalService()
            # set up the mock
            mock_path.isfile.return_value = False
            reference.rm("any path")
            # test that the remove call was NOT called
            self.assertFalse(mock_os.remove.called, "Failed to not remove the file if not present.")
            # make the file 'exist'
            mock_path.isfile.return_value = True
            reference.rm("any path")
            mock_os.remove.assert_called_with("any path")

    class UploadServiceTestCase(unittest.TestCase):

        @mock.patch.object(RemovalService, 'rm')
        def test_upload_complete(self, mock_rm):
            # build our dependencies
            removal_service = RemovalService()
            reference = UploadService(removal_service)
            # call upload_complete, which should, in turn, call `rm`:
            reference.upload_complete("my uploaded file")
            # check that it called the rm method of any RemovalService
            mock_rm.assert_called_with("my uploaded file")
            # check that it called the rm method of _our_ removal_service
            removal_service.rm.assert_called_with("my uploaded file")
    
    

    Great! We’ve validated that the UploadService successfully calls our instance’s rm method. Notice anything interesting in there? The patching mechanism actually replaced the rm method of all RemovalService instances in our test method. That means that we can actually inspect the instances themselves. If you want to see more, try dropping in a breakpoint in your mocking code to get a good feel for how the patching mechanism works.

    Pitfall: Decorator Order

    When using multiple decorators on your test methods, order is important, and it’s kind of confusing. Basically, when mapping decorators to method parameters, work backwards. Consider this example:

    @mock.patch('mymodule.sys')
    @mock.patch('mymodule.os')
    @mock.patch('mymodule.os.path')
    def test_something(self, mock_os_path, mock_os, mock_sys):
        pass
    
    

    Notice how our parameters are matched to the reverse order of the decorators? That’s partly because of the way that Python works. With multiple method decorators, here’s the order of execution in pseudocode:

    patch_sys(patch_os(patch_os_path(test_something)))
    
    

    Since the patch to sys is the outermost patch, it will be executed last, making it the last parameter in the actual test method arguments. Take note of this well and use a debugger when running your tests to make sure that the right parameters are being injected in the right order.

    Option 2: Creating Mock Instances

    Instead of mocking the specific instance method, we could instead just supply a mocked instance to UploadService with its constructor. I prefer option 1 above, as it’s a lot more precise, but there are many cases where option 2 might be efficient or necessary. Let’s refactor our test again:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    from mymodule import RemovalService, UploadService

    import mock
    import unittest

    class RemovalServiceTestCase(unittest.TestCase):

        @mock.patch('mymodule.os.path')
        @mock.patch('mymodule.os')
        def test_rm(self, mock_os, mock_path):
            # instantiate our service
            reference = RemovalService()
            # set up the mock
            mock_path.isfile.return_value = False
            reference.rm("any path")
            # test that the remove call was NOT called
            self.assertFalse(mock_os.remove.called, "Failed to not remove the file if not present.")
            # make the file 'exist'
            mock_path.isfile.return_value = True
            reference.rm("any path")
            mock_os.remove.assert_called_with("any path")

    class UploadServiceTestCase(unittest.TestCase):

        def test_upload_complete(self):
            # build our dependencies
            mock_removal_service = mock.create_autospec(RemovalService)
            reference = UploadService(mock_removal_service)
            # call upload_complete, which should, in turn, call `rm`:
            reference.upload_complete("my uploaded file")
            # test that it called the rm method
            mock_removal_service.rm.assert_called_with("my uploaded file")
    
    

    In this example, we haven’t even had to patch any functionality; we simply create an auto-spec for the RemovalService class, and then inject this instance into our UploadService to validate the functionality.

    The mock.create_autospec method creates a functionally equivalent instance to the provided class. What this means, practically speaking, is that when the returned instance is interacted with, it will raise exceptions if used in illegal ways. More specifically, if a method is called with the wrong number of arguments, an exception will be raised. This is extremely important as refactors happen. As a library changes, tests break and that is expected. Without using an auto-spec, our tests will still pass even though the underlying implementation is broken.

    Pitfall: The mock.Mock and mock.MagicMock Classes

    The mock library also includes two important classes upon which most of the internal functionality is built: mock.Mock and mock.MagicMock. When given a choice between a mock.Mock instance, a mock.MagicMock instance, or an auto-spec, always favor using the auto-spec, as it helps keep your tests sane for future changes. This is because mock.Mock and mock.MagicMock accept all method calls and property assignments regardless of the underlying API. Consider the following use case:

    class Target(object):
        def apply(self, value):
            return value

    def method(target, value):
        return target.apply(value)
    
    

    We can test this with a mock.Mock instance like this:

    class MethodTestCase(unittest.TestCase):

        def test_method(self):
            target = mock.Mock()
            method(target, "value")
            target.apply.assert_called_with("value")
    
    

    This logic seems sane, but let’s modify the Target.apply method to take more parameters:

    class Target(object):
        def apply(self, value, are_you_sure):
            if are_you_sure:
                return value
            else:
                return None
    
    

    Re-run your test, and you’ll find that it still passes. That’s because it isn’t built against your actual API. This is why you should always use the create_autospec method and the autospec parameter with the @patch and @patch.object decorators.
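
    To see the difference, here is the same test rewritten with an auto-spec (a sketch reusing the Target and method definitions above). Because the mock now enforces Target.apply’s real signature, the single-argument call made inside method() raises a TypeError, and the test fails exactly when the API changes:

    class MethodTestCase(unittest.TestCase):

        def test_method(self):
            # the auto-spec checks arguments against the real Target.apply
            target = mock.create_autospec(Target)
            method(target, "value")
            target.apply.assert_called_with("value")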



