def double(x):
return 2*x
# also assign the function to the `twotimes` variable
= double
twotimes type(twotimes)
function
The goal of this module is to gain some useful approaches to software design and modularity that will help with building scalable, portable, and reusable code. We will cover several key aspects of software design for concurrency:
DRY: Don’t Repeat Yourself
By creating small functions that handle only one logical task and do it well, we quickly gain:
When writing functions that are to be used in concurrent programming, it is best to keep them short and focused on a single, well-defined task. This enables you test the function thoroughly, and reuse it in multiple contexts. It also enables you to more easily debug what is happening in the function when trying to understands parallel execution.
Functions are first-class objects in Python (and many other languages). This has some real benefits for and implications for parallel programming. Because a function is an object, it means that it can be 1) stored in a variable, and 2) passed as an argument to another function. We saw that in the module on pleasingly parallel codes when we used ThreadPoolExecutor.map()
, which takes a function and an iterable object as arguments. Let’s check out how you can use and manipulate functions as objects. First, let’s define a simple function, assign it to another variable, and then use both:
def double(x):
return 2*x
# also assign the function to the `twotimes` variable
= double
twotimes type(twotimes)
function
Note that when we print it to screen, we see that prod
is of type function
, and when we use the two instances, we get identical results:
print(double(7))
print(twotimes(7))
print(double(5) == twotimes(5))
14
14
True
This representation of a function as an object comes in handy when we want to invoke a function in multiple different contexts, such as in a parallel execution environment via a map()
function.
list(map(twotimes, [2,3,4]))
[4, 6, 8]
This works because the function twotimes
can be passed to map
and executed from within map
. When you execute a function that is passed in via an argument, it is called function composition. We can easily illustrate this by creating some function and passing it to a wrapper function to be executed:
def some_function():
print("Ran some_function")
def wrapper(func_to_run):
print("Ran wrapper")
func_to_run()print("Finished wrapper")
wrapper(some_function)
Ran wrapper
Ran some_function
Finished wrapper
When executing a function, the variables that are in scope to that function are local, meaning that the presence of another variable with the same name in another scope will not affect a calculation. For example:
def do_task():
= 10
x
= 5
x
do_task()print(x)
5
However, if that same variable is declared as global inside the function, then the assignment will have global impact on the value of the parent scope:
def do_task():
global x
= 10
x
= 5
x
do_task()print(x)
10
So, you can see that writing a function that uses global variables can have effects outside of the scope of the function call. This can have drastic consequences on concurrent code, as the order in which function calls are made when operating concurrently are not deterministic, and so the impact of global variables will also not be deterministic.
A related issue arises when code in a function depends on its enclosing namespace, such as when a function is defined inside of another function. When resolving a variable, python first looks in the Local namespace, and then in the Enclosing namespace, Global namespace, and Built-in namespace. So, even if a variable is not defined locally, it might still be resolved by one of the other namespaces in surprising ways.
= 3
a def do_stuff(b):
return a*b
6) do_stuff(
18
A pure function is a function that depends only on its input arguments, and it has no side effects. In other words, a pure function returns the same value if called repeatedly with the same arguments. Pure functions are particularly amenable to concurrency. For example, the double(x)
function above is a pure function, because in all cases calling double(2)
will always return 4
.
In contrast, a non-pure function is a function in which the return value may change if the function is called repeatedly, typically because it depends on some particular state that affects the outcome but is not part of the input arguments. For example, the time.time()
function returns different values based on the current state of the system clock.
Using a global variable in a function creates a side-effect that makes it an impure function, as would other operations that modify an external state variable.
Task dependencies occur when one task in the code depends on the results of another task or computation in the code.
Locks are a mechanism to manage access to a resource so that multiple threads can access the resource. By adding locks to an otherwise parallel process, we introduce a degree of serial execution to the locked portion of the process. Basically, each thread can only access the resource when it has the lock, and only one lock is given out at a time. Take this example of what happens without locking:
from concurrent.futures import ProcessPoolExecutor
import time
def hello(i):
print(i, 'Hello')
print(i, 'world')
= ProcessPoolExecutor()
executor = [executor.submit(hello, i) for i in range(3)]
futures for future in futures:
future.result()
1
2
0
Hello
Hello
Hello
2
0
1
world
world
world
You can see that the results come back in a semi-random order, and the call to sleep
creates a delay between printing the two words, which means that the three messages get jumbled when printed. To fix this, we can introduce a lock from the multiprocessing
package.
from concurrent.futures import ProcessPoolExecutor
import time
import multiprocessing
def hello(i, lock):
with lock:
print(i, 'Hello')
print(i, 'world')
= multiprocessing.Manager().Lock()
lock = ProcessPoolExecutor()
executor = [executor.submit(hello, i, lock) for i in range(3)]
futures for future in futures:
future.result()
0
Hello
0
world
1
Hello
1
world
2
Hello
2
world
The lock is generated and then passed to each invocation of the hello function. Using with
triggers the use of the context manager for the lock, which allows the manager to synchronize use of the lock. This ensures that only one process can be printing at the same time, ensuring that the outputs are properly ordered.
Race conditions occur when two tasks execute in parallel, but produce different results based on which task finishes first. Ensuring that results are correct under different timing situations requires careful testing.
Deadlocks occur when two concurrent tasks block on the output of the other. Deadlocks cause parallel programs to lock up indefinitely, can be difficult to track down, and will often require the program to be killed.