Python 101 – Creating Multiple Processes
In this article, you will learn about the pros and cons of using processes, creating processes with multiprocessing, subclassing Process, and more.
Most CPU manufacturers are creating multi-core CPUs now. Even cell phones come with multiple cores! Python threads can’t use those cores because of the Global Interpreter Lock. Starting in Python 2.6, the multiprocessing module was added, which lets you take full advantage of all the cores on your machine.
In this article, you will learn about the following topics:
- Pros of Using Processes
- Cons of Using Processes
- Creating Processes with multiprocessing
- Subclassing Process
- Creating a Process Pool
This article is not a comprehensive overview of multiprocessing. The topic of multiprocessing and concurrency in general would be better suited to a book of its own. You can always check out the documentation for the multiprocessing module if you need more detail.
Now, let’s get started!
Pros of Using Processes
There are several pros to using processes:
- Processes use separate memory space
- Code can be more straightforward compared to threads
- Uses multiple CPUs / cores
- Avoids the Global Interpreter Lock (GIL)
- Child processes can be killed, unlike threads (see the sketch after this list)
- The multiprocessing module has an interface similar to threading.Thread
- Good for CPU-bound processing (encryption, binary search, matrix multiplication)
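As a quick illustration of the point about killing child processes, here is a minimal sketch (not from the original article) that starts a stuck worker and then kills it from the parent with terminate(), something you cannot do with a thread:
import multiprocessing
import time


def endless_loop() -> None:
    # Pretend this worker got stuck and will never return on its own
    while True:
        time.sleep(1)


if __name__ == '__main__':
    process = multiprocessing.Process(target=endless_loop)
    process.start()

    time.sleep(3)          # let the child run for a few seconds
    process.terminate()    # kill the child process
    process.join()         # wait for the termination to finish
    print(f'Child exit code: {process.exitcode}')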
Now let’s look at some of the cons of processes!
Cons of Using Processes
There are also a couple of cons to using processes:
- Interprocess communication is more complicated (see the sketch after this list)
- Memory footprint is larger than threads
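To give you a feel for the first point, here is a minimal sketch (the square() worker is made up for this example, not from the article) that passes a result back from a child process through a multiprocessing.Queue, which already takes more ceremony than sharing a variable between threads:
import multiprocessing


def square(number, result_queue):
    # Send the result back to the parent through the queue
    result_queue.put(number * number)


if __name__ == '__main__':
    result_queue = multiprocessing.Queue()
    process = multiprocessing.Process(target=square, args=(5, result_queue))
    process.start()
    print(f'Result from the child process: {result_queue.get()}')
    process.join()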
Now let’s learn how to create a process with Python!
Creating Processes With multiprocessing
The multiprocessing module was designed to mimic how the threading.Thread class worked. Here is an example of using the multiprocessing module:
import multiprocessing
import random
import time


def worker(name: str) -> None:
    print(f'Started worker {name}')
    worker_time = random.choice(range(1, 5))
    time.sleep(worker_time)
    print(f'{name} worker finished in {worker_time} seconds')


if __name__ == '__main__':
    processes = []
    for i in range(5):
        process = multiprocessing.Process(target=worker,
                                          args=(f'computer_{i}',))
        processes.append(process)
        process.start()

    for proc in processes:
        proc.join()
The first step is to import the multiprocessing module. The other two imports are for the random and time modules, respectively.
Then you have the silly worker() function that pretends to do some work. It takes in a name and returns nothing. Inside the worker() function, it will print out the name of the worker, then it will use time.sleep() to simulate doing some long-running process. Finally, it will print out that it has finished.
The last part of the code snippet is where you create 5 worker processes. You use multiprocessing.Process(), which works pretty much the same way as threading.Thread() did. You tell Process what target function to use and what arguments to pass to it. The main difference is that this time you are creating a list of processes. For each process, you call its start() method to start the process. Then at the end, you loop over the list of processes and call the join() method on each one, which tells Python to wait for the process to terminate.
When you run this code, you will see output that is similar to the following:
Started worker computer_2
computer_2 worker finished in 2 seconds
Started worker computer_1
computer_1 worker finished in 3 seconds
Started worker computer_3
computer_3 worker finished in 3 seconds
Started worker computer_0
computer_0 worker finished in 4 seconds
Started worker computer_4
computer_4 worker finished in 4 seconds
Each time you run your script, the output will be a little different because of the random module. Give it a try and see for yourself!
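As a side note, join() also accepts an optional timeout, and you can check whether a process is still running with is_alive(). Here is a minimal sketch (the napper() function is made up for this example) that waits at most two seconds before checking on the worker:
import multiprocessing
import time


def napper() -> None:
    time.sleep(3)


if __name__ == '__main__':
    process = multiprocessing.Process(target=napper)
    process.start()

    process.join(timeout=2)   # wait at most 2 seconds
    if process.is_alive():
        print('Worker is still running after the timeout')
        process.join()        # now wait for it to finish
    print('Worker is done')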
Subclassing Process
The Process class from the multiprocessing module can also be subclassed. It works in much the same way as the threading.Thread class does. Let’s take a look:
# worker_thread_subclass.py
import random
import multiprocessing
import time


class WorkerProcess(multiprocessing.Process):

    def __init__(self, name):
        multiprocessing.Process.__init__(self)
        self.name = name

    def run(self):
        """
        Run the process
        """
        worker(self.name)


def worker(name: str) -> None:
    print(f'Started worker {name}')
    worker_time = random.choice(range(1, 5))
    time.sleep(worker_time)
    print(f'{name} worker finished in {worker_time} seconds')


if __name__ == '__main__':
    processes = []
    for i in range(5):
        process = WorkerProcess(name=f'computer_{i}')
        processes.append(process)
        process.start()

    for process in processes:
        process.join()
Here you subclass multiprocessing.Process and override its run() method. Next, you create the processes in a loop at the end of the code and add each one to a list of processes. Then, to get the processes to work properly, you need to loop over the list of processes and call join() on each of them. This works exactly as it did in the previous process example from the last section.
The output from this class should also be quite similar to the output from the previous section.
Creating a Process Pool
If you have a lot of processes to run, sometimes you will want to limit the number of processes that can run at once. For example, let’s say you need to run 20 processes but you have a processor with only 4 cores. You can use the multiprocessing module to create a process pool that will limit the number of processes running to only 4 at a time. Here’s how you can do it:
import random
import time

from multiprocessing import Pool


def worker(name: str) -> None:
    print(f'Started worker {name}')
    worker_time = random.choice(range(1, 5))
    time.sleep(worker_time)
    print(f'{name} worker finished in {worker_time} seconds')


if __name__ == '__main__':
    process_names = [f'computer_{i}' for i in range(15)]
    pool = Pool(processes=5)
    pool.map(worker, process_names)
    pool.terminate()
In this example, you have the same worker() function. The real meat of the code is at the end, where you create 15 process names using a list comprehension. Then you create a Pool and set the total number of processes to run at once to 5. To use the pool, you need to call the map() method and pass it the function you wish to call along with the arguments to pass to the function. Python will now run 5 processes (or fewer) at a time until all the processes have finished. You need to call terminate() on the pool at the end, or you will see a message like this:
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/resource_tracker.py:216:
UserWarning: resource_tracker: There appear to be 6 leaked semaphore objects to clean up at shutdown
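As an aside, a common alternative (not shown in the original snippet) is to use the Pool as a context manager, which terminates the pool for you when the with block exits:
import random
import time

from multiprocessing import Pool


def worker(name: str) -> None:
    print(f'Started worker {name}')
    worker_time = random.choice(range(1, 5))
    time.sleep(worker_time)
    print(f'{name} worker finished in {worker_time} seconds')


if __name__ == '__main__':
    process_names = [f'computer_{i}' for i in range(15)]
    # The with block calls terminate() on the pool automatically
    with Pool(processes=5) as pool:
        pool.map(worker, process_names)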
Now you know how to create a process Pool with Python!
Wrapping Up
You have now learned the basics of using the multiprocessing module. You have learned the following:
- Pros of Using Processes
- Cons of Using Processes
- Creating Processes with multiprocessing
- Subclassing Process
- Creating a Process Pool
There is much more to multiprocessing than what is covered here. You could learn how to use the multiprocessing module’s Queue class to get output from processes. There is the topic of interprocess communication. And there’s much more too. However, the objective was to learn how to create processes, not to learn every nuance of the multiprocessing module. Concurrency is a large topic that would need much more in-depth coverage than what can be covered in this article.