Underscore in Python

By: Cam Wohlfeil
Published: 2018-09-27 1130 EDT
Modified: 2018-11-01 1130 EDT
Category: Programming
Tags: python

The underscore in Python has implicit (by convention) and explicit (enforced by the interpreter) meanings. While there might be more uses that I don't know of, I'll explain the five I do know.

Implicit
- Single Leading Underscore: _var
- Single Trailing Underscore: var_
- Single Underscore: _
Explicit
- Double Leading Underscore: __var
- Double Leading and Trailing Underscore: __var__

Single Leading Underscore

This is a purely implicit convention, defined by PEP 8: a variable or method prefixed with a single underscore is meant for internal use only. Python does not separate public from private members the way many other object-oriented languages do, so the language does not enforce this.

Note that there is one exception: wildcard imports (e.g. from package import *) skip names that start with an underscore, unless the module lists them in __all__. Wildcard imports are already bad practice, though.
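Here is a minimal sketch of both behaviours. It builds a throwaway in-memory module so the example is self-contained; demo_mod, public_name and _internal_name are made-up names for illustration only.

```python
import sys
import types

# Build a throwaway module in memory ("demo_mod" is a hypothetical name).
demo_mod = types.ModuleType("demo_mod")
demo_mod.public_name = "visible"
demo_mod._internal_name = "internal"
sys.modules["demo_mod"] = demo_mod

# Run the wildcard import in its own namespace so we can inspect the result.
namespace = {}
exec("from demo_mod import *", namespace)

print("public_name" in namespace)     # True
print("_internal_name" in namespace)  # False, skipped by the wildcard import

# Nothing stops direct access; the underscore is only a hint to the reader.
print(demo_mod._internal_name)        # internal
```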

Single Trailing Underscore

As with the leading underscore, this is just an implicit convention, also defined in PEP 8. It is used when the most appropriate name is already taken by a keyword, such as class.
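A small sketch of the idea, using a hypothetical make_tag helper that wants a parameter called class:

```python
def make_tag(name, class_):
    # "class" is a reserved keyword, so the parameter is spelled class_
    return f'<{name} class="{class_}">'


print(make_tag("div", class_="note"))  # <div class="note">
```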

Single Underscore

Single underscore is another implicit convention used for a variable that is temporary or insignificant. I commonly use it as the loop variable when the loop body doesn't need the value:

for _ in range(3):
    print('hello')

It can also be used when unpacking expressions to throw away the variables you don't care about.

animal = ('1', 'cat', 5, 'assorted')
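# Avoid the names id and type here, since they would shadow the builtins
animal_id, animal_type, _, _ = animal

print(animal_id)   # 1
print(animal_type) # cat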
print(_)    # assorted, the last value assigned to _

In many Python REPLs (including the standard interpreter and Jupyter/IPython notebooks), the underscore is a special variable that holds the result of the last expression. This can be useful for interacting with objects without assigning them a name first.

>>> list()
[]
>>> _.append(1)
>>> _.append(2)
>>> _.append(3)
>>> _
[1, 2, 3]

Double Leading Underscore

Double underscores, also known as dunders, are explicit in Python: class attributes starting with a double underscore are rewritten by the interpreter (this is called name mangling) to prevent name conflicts and to keep the name from being accidentally overridden in subclasses. Some examples:

class Test:
    def __init__(self):
        '''Basic class to test dunder variables and methods.'''
        self.foo = 1
        self._bar = 2
        self.__baz = 3


t = Test()
dir(t)
'''
['_Test__baz', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_bar', 'foo']
'''

If you didn't already know, calling dir on an object lists all of its attributes. In addition to the ones we created, there are a bunch of dunder methods with both leading and trailing underscores that we didn't specify. All objects in Python have these methods by default, which will be important for the next section. What these do and why they are included is beyond the scope of this post.

Ignoring the dunder methods, we can see that foo and _bar are left alone, but __baz is mangled with the class name.

class ExtendedTest(Test):
    def __init__(self):
        '''Extend Test to see what happens to the names'''
        super().__init__()
        self.foo = 'overridden'
        self._bar = 'overridden'
        self.__baz = 'overridden'


t2 = ExtendedTest()
dir(t2)
'''
['_ExtendedTest__baz', '_Test__baz', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_bar', 'foo']
'''
# Same as before, except both versions of __baz are here
print(t2.foo)       # 'overridden'
print(t2._bar)      # 'overridden'
print(t2.__baz)              # AttributeError: "'ExtendedTest' object has no attribute '__baz'"
print(t2._ExtendedTest__baz) # 'overridden'
print(t2._Test__baz)         # 3, the value set in Test.__init__

Inside the class itself, name mangling is transparent: references through self are rewritten in the same way, so methods can keep using the short name.

class ManglingTest:
    def __init__(self):
        self.__mangled = 'hello'

    def get_mangled(self):
        return self.__mangled

print(ManglingTest().get_mangled()) # 'hello'
print(ManglingTest().__mangled)     # AttributeError: "'ManglingTest' object has no attribute '__mangled'"


class MangledMethod:
    def __method(self):
        '''Works with methods too.'''
        return 42

    def call_it(self):
        return self.__method()


print(MangledMethod().__method())   # AttributeError: "'MangledMethod' object has no attribute '__method'"
print(MangledMethod().call_it())    # 42

Here is a surprising degenerate case, in which the mangled name refers to a global:

_MangledGlobal__mangled = 23

class MangledGlobal:
    def test(self):
        return __mangled


print(MangledGlobal().test())       # 23

This is possible because name mangling isn't tied to class attributes. It applies to any identifier with a double leading underscore that appears in a class context.

Double Leading and Trailing Underscore

If a name starts and ends with double underscores, name mangling is not applied. This makes sense because their primary use is for the default dunder methods all classes have, which subclasses should be able to override. To avoid naming conflicts with names Python itself defines, you should not create your own names with both sets of underscores.
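A short sketch of the difference, using hypothetical Base and Child classes. The __secret__ attribute exists only to show that such a name is stored as written; as noted above, you shouldn't invent names like it in real code.

```python
class Base:
    def __init__(self):
        self.__secret__ = 1  # leading and trailing: stored as written
        self.__secret = 2    # leading only: mangled to _Base__secret

    def __repr__(self):
        return 'Base()'


class Child(Base):
    def __repr__(self):
        # __repr__ is not mangled, so this cleanly overrides Base.__repr__
        return 'Child()'


obj = Child()
print(repr(obj))          # Child()
print(sorted(vars(obj)))  # ['_Base__secret', '__secret__']
```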
