Thursday, June 02, 2016

Python Class Attributes

I had a programming interview recently, a phone-screen in which we used a collaborative text editor.

I was asked to implement a certain API, and chose to do so in Python. Abstracting away the problem statement, let’s say I needed a class whose instances stored some data and some other_data.

I took a deep breath and started typing. After a few lines, I had something like this:

class Service(object):
data = []
def __init__(self, other_data):
self.other_data = other_data
...

My interviewer stopped me:

  • Interviewer: “That line: data = []. I don’t think that’s valid Python?”
  • Me: “I’m pretty sure it is. It’s just setting a default value for the instance attribute.”
  • Interviewer: “When does that code get executed?”
  • Me: “I’m not really sure. I’ll just fix it up to avoid confusion.”

For reference, and to give you an idea of what I was going for, here’s how I amended the code:

class Service(object):
def __init__(self, other_data):
self.data = []
self.other_data = other_data
...

As it turns out, we were both wrong. The real answer lay in understanding the distinction between class and instance attributes.

Python class attributes vs. Python instance attributes

Note: if you have an expert handle on class attributes, you can skip ahead to use cases.

Class Attributes

My interviewer was wrong in that the above code is syntactically valid.

I too was wrong in that it isn’t setting a “default value” for the instance attribute. Instead, it’s defining data as a class attribute with value [].

In my experience, class attributes are a topic that many people know something about, but few understand completely.

What’s the difference?

A class attribute is an attribute of the class (circular, I know), rather than an attribute of an instance of a class.

Let’s use an example to illustrate the difference. Here, class_var is a class attribute, and i_var is an instance attribute:

class MyClass(object):
class_var = 1
def __init__(self, i_var):
self.i_var = i_var

Note that all instances of the class have access to class_var, and that it can also be accessed as a property of the class itself:

foo = MyClass(2)
bar = MyClass(3)
foo.class_var, foo.i_var
## 1, 2
bar.class_var, bar.i_var
## 1, 3
MyClass.class_var ## <— This is key
## 1

For Java or C++ programmers, the class attribute is similar—but not identical—to the static member. We’ll see how they differ below.

Class vs. instance namespaces

To understand what’s happening here, let’s talk briefly about Python namespaces.

namespace is a mapping from names to objects, with the property that there is zero relation between names in different namespaces. They’re usually implemented as Python dictionaries, although this is abstracted away.

Depending on the context, you may need to access a namespace using dot syntax (e.g., object.name_from_objects_namespace) or as a local variable (e.g., object_from_namespace). As a concrete example:

class MyClass(object):
## No need for dot syntax
class_var = 1
def __init__(self, i_var):
self.i_var = i_var
## Need dot syntax as we've left scope of class namespace
MyClass.class_var
## 1

Python classes and instances of classes each have their own distinct namespaces represented by pre-defined attributes MyClass.__dict__ and instance_of_MyClass.__dict__, respectively.

When you try to access an attribute from an instance of a class, it first looks at its instance namespace. If it finds the attribute, it returns the associated value. If not, it then looks in the class namespace and returns the attribute (if it’s present, throwing an error otherwise). For example:

foo = MyClass(2)
## Finds i_var in foo's instance namespace
foo.i_var
## 2
## Doesn't find class_var in instance namespace…
## So look's in class namespace (MyClass.__dict__)
foo.class_var
## 1

The instance namespace takes supremacy over the class namespace: if there is an attribute with the same name in both, the instance namespace will be checked first and its value returned. Here’s a simplified version of the code (source) for attribute lookup:

def instlookup(inst, name):
## simplified algorithm...
if inst.__dict__.has_key(name):
return inst.__dict__[name]
else:
return inst.__class__.__dict__[name]

And, in visual form:

attribute lookup in visual form

Handling assignment

With this in mind, we can make sense of how class attributes handle assignment:

  • If a class attribute is set by accessing the class, it will override the value for all instances. For example:

    	foo = MyClass(2)
    	foo.class_var
    	## 1
    	MyClass.class_var = 2
    	foo.class_var
    	## 2
    	
    	

    At the namespace level… we’re setting MyClass.__dict__['class_var'] = 2. (Note: this isn’t the exact code(which would be setattr(MyClass, 'class_var', 2)) as __dict__ returns a dictproxy, an immutable wrapper that prevents direct assignment, but it helps for demonstration’s sake). Then, when we access foo.class_varclass_var has a new value in the class namespace and thus 2 is returned.

  • If a class variable is set by accessing an instance, it will override the value only for that instance. This essentially overrides the class variable and turns it into an instance variable available, intuitively, only for that instance. For example:

    	foo = MyClass(2)
    	foo.class_var
    	## 1
    	foo.class_var = 2
    	foo.class_var
    	## 2
    	MyClass.class_var
    	## 1
    	
    	

    At the namespace level… we’re adding the class_var attribute to foo.__dict__, so when we lookup foo.class_var, we return 2. Meanwhile, other instances of MyClass will not have class_var in their instance namespaces, so they continue to find class_var in MyClass.__dict__ and thus return 1.

Mutability

Quiz question: What if your class attribute has a mutable type? You can manipulate (mutilate?) the class attribute by accessing it through a particular instance and, in turn, end up manipulating the referenced object that all instances are accessing (as pointed out by Timothy Wiseman).

This is best demonstrated by example. Let’s go back to the Service I defined earlier and see how my use of a class variable could have led to problems down the road.

class Service(object):
data = []
def __init__(self, other_data):
self.other_data = other_data
...

My goal was to have the empty list ([]) as the default value for data, and for each instance of Service to have its own data that would be altered over time on an instance-by-instance basis. But in this case, we get the following behavior (recall that Service takes some argument other_data, which is arbitrary in this example):

s1 = Service(['a', 'b'])
s2 = Service(['c', 'd'])
s1.data.append(1)
s1.data
## [1]
s2.data
## [1]
s2.data.append(2)
s1.data
## [1, 2]
s2.data
## [1, 2]

This is no good—altering the class variable via one instance alters it for all the others!

At the namespace level… all instances of Service are accessing and modifying the same list in Service.__dict__ without making their own data attributes in their instance namespaces.

We could get around this using assignment; that is, instead of exploiting the list’s mutability, we could assign our Service objects to have their own lists, as follows:

s1 = Service(['a', 'b'])
s2 = Service(['c', 'd'])
s1.data = [1]
s2.data = [2]
s1.data
## [1]
s2.data
## [2]

In this case, we’re adding s1.__dict__['data'] = [1], so the original Service.__dict__['data'] remains unchanged.

Unfortunately, this requires that Service users have intimate knowledge of its variables, and is certainly prone to mistakes. In a sense, we’d be addressing the symptoms rather than the cause. We’d prefer something that was correct by construction.

My personal solution: if you’re just using a class variable to assign a default value to a would-be instance variable, don’t use mutable values. In this case, every instance of Service was going to override Service.data with its own instance attribute eventually, so using an empty list as the default led to a tiny bug that was easily overlooked. Instead of the above, we could’ve either:

  1. Stuck to instance attributes entirely, as demonstrated in the introduction.
  2. Avoided using the empty list (a mutable value) as our “default”:

    	class Service(object):
    	data = None
    	def __init__(self, other_data):
    	self.other_data = other_data
    	...
    	
    	

    Of course, we’d have to handle the None case appropriately, but that’s a small price to pay.

Like what you're reading?
Get the latest updates first.
No spam. Just great engineering and design posts.

So when would you use them?

Class attributes are tricky, but let’s look at a few cases when they would come in handy:

  1. Storing constants. As class attributes can be accessed as attributes of the class itself, it’s often nice to use them for storing Class-wide, Class-specific constants. For example:

    	class Circle(object):
    	pi = 3.14159
    	def __init__(self, radius):
    	self.radius = radius
    	def area(self):
    	return Circle.pi * self.radius * self.radius
    	Circle.pi
    	## 3.14159
    	c = Circle(10)
    	c.pi
    	## 3.14159
    	c.area()
    	## 314.159
    	
    	
  2. Defining default values. As a trivial example, we might create a bounded list (i.e., a list that can only hold a certain number of elements or fewer) and choose to have a default cap of 10 items:

    	class MyClass(object):
    	limit = 10
    	def __init__(self):
    	self.data = []
    	def item(self, i):
    	return self.data[i]
    	def add(self, e):
    	if len(self.data) >= self.limit:
    	raise Exception("Too many elements")
    	self.data.append(e)
    	MyClass.limit
    	## 10
    	
    	

    We could then create instances with their own specific limits, too, by assigning to the instance’s limitattribute.

    	foo = MyClass()
    	foo.limit = 50
    	## foo can now hold 50 elements—other instances can hold 10
    	
    	

    This only makes sense if you will want your typical instance of MyClass to hold just 10 elements or fewer—if you’re giving all of your instances different limits, then limit should be an instance variable. (Remember, though: take care when using mutable values as your defaults.)

  3. Tracking all data across all instances of a given class. This is sort of specific, but I could see a scenario in which you might want to access a piece of data related to every existing instance of a given class.

    To make the scenario more concrete, let’s say we have a Person class, and every person has a name. We want to keep track of all the names that have been used. One approach might be to iterate over the garbage collector’s list of objects, but it’s simpler to use class variables.

    Note that, in this case, names will only be accessed as a class variable, so the mutable default is acceptable.

    	
     Posted by 
     ipapuc
    
     (    Python  )
      
     :: 
    Comments
    (1) ::
    
       Permalink ::
    
       Trackbacks (0)
    
    

[Reply]

Please fix indentation.

Comment by nick (10/27/2016 02:38)

Add comment

Add comment

authimage