A Quick Primer on Python Concurrency

Python's threading module allows us to spin up native operating system threads to execute multiple tasks concurrently.

Python is often thought of as a single-threaded language, but there are several avenues for executing tasks concurrently.

The threading module allows us to spin up native operating system threads to execute multiple tasks concurrently. The threading API provides methods for creating thread objects, then starting and joining the underlying threads.

import threading

# define the function to execute in a thread
def do_some_work(val):
    print("doing some work in thread")
    print("echo: {}".format(val))

val = "text"
# create a thread object, passing the target function and optional args to the constructor
t = threading.Thread(target=do_some_work, args=(val,))
# start the thread
t.start()
# block execution of the main thread until thread t is completed
t.join()

The threading module also provides several synchronization and inter-thread communication mechanisms for when threads need to communicate and coordinate with each other, or when multiple threads are mutating the same area of memory. Locks and queues are the most common of these mechanisms, but Python also provides RLocks, semaphores, conditions, events, and barriers in the threading API.

import threading
from queue import Queue

lock = threading.Lock()
### assume the code below runs in multiple threads ###
lock.acquire()  # acquire the lock, preventing other threads from doing so
# access the shared resource
lock.release()  # release the lock so that other blocked threads can now run

queue = Queue()
### assume the code below runs in a separate thread t1 ###
def producer(queue):
    item = make_an_item()
    queue.put(item)  # put the item on the queue for another thread to consume

### assume the code below runs in a separate thread t2 ###
def consumer(queue):
    item = queue.get()  # gets an item put on the queue by another thread; blocks if no item is there yet
    queue.task_done()  # marks the last item retrieved as done
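Putting the pieces together, here is a minimal runnable sketch of two threads coordinating through a Queue (the item count and the doubling step are just for illustration):

```python
import threading
from queue import Queue

def producer(queue):
    for i in range(5):
        queue.put(i)  # hand each item to the consumer thread

def consumer(queue, results):
    for _ in range(5):
        item = queue.get()    # blocks until an item is available
        results.append(item * 2)
        queue.task_done()     # mark the item as processed

queue = Queue()
results = []
t1 = threading.Thread(target=producer, args=(queue,))
t2 = threading.Thread(target=consumer, args=(queue, results))
t1.start(); t2.start()
queue.join()   # blocks until every put item has been marked done
t1.join(); t2.join()
print(results)  # [0, 2, 4, 6, 8]
```

Because Queue handles its own locking internally, the two threads never touch a lock directly.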

However, the reference implementation of Python (CPython) has a global interpreter lock (GIL) to make the interpreter easier to implement and faster for single-threaded programs. Because the GIL only allows one thread to execute Python bytecode at a time, threading is not suitable for CPU-bound tasks (tasks in which most of the time is spent performing a computation instead of waiting on IO). For those, we have the multiprocessing package, which uses processes instead of threads as the actors of parallel execution; each process gets its own interpreter and its own GIL. The multiprocessing API mimics the threading API as much as possible, to reduce the dissonance between the two and to make switching easier.

import hashlib
import multiprocessing

# define the function to execute in a new process
def generate_hash(text):
    return hashlib.sha384(text).hexdigest()

text = b"some long text here..."
if __name__ == '__main__':
    # create a process object, passing the target function and optional args to the constructor
    p = multiprocessing.Process(target=generate_hash, args=(text,))
    # start and join the process
    p.start()
    p.join()

One of the major areas where there is a difference between the threading and multiprocessing APIs is in the implementation of shared state. Threads automatically share memory with each other, but processes don't. So, special accommodations must be made to allow processes to communicate and share state. Processes can either allocate and use OS-shared memory areas or they can communicate with a server process which maintains shared data.

The concurrent.futures module provides a layer of abstraction over both concurrency mechanisms (threads and processes).

It also introduced futures into Python. A future represents a pending result, and it lets us manage the execution of the computation that produces that result. Future API methods include result(), cancel(), and add_done_callback(fn):

import hashlib
from concurrent.futures import ProcessPoolExecutor

# define the function to execute in a new process
def generate_hash(text):
    return hashlib.sha384(text).hexdigest()

text = b"some long text here..."
executor = ProcessPoolExecutor()  # can be replaced with ThreadPoolExecutor()
future_result = executor.submit(generate_hash, text)  # submit a job to the pool; immediately returns a future
print(future_result.result())  # blocks until the result is available

Finally, the most recent addition to the Python concurrency family is the asyncio module. asyncio brings single-threaded asynchronous programming to Python. It provides an event loop that runs specialized functions called coroutines. A coroutine has the ability to pause itself and yield control back to the event loop when it needs to wait for IO or some other long-running task. The event loop can then go on and execute other coroutines and resume the prior coroutine when an event occurs that indicates that the IO or long-running task is complete. As a result, we have multiple tasks running on the same thread and yielding to one another instead of blocking.

import asyncio

# a coroutine function, as denoted by the async keyword
async def delayed_hello():
    print("Hello ")
    # the coroutine will pause here and yield back to the event loop
    await asyncio.sleep(1)

# get the event loop
loop = asyncio.get_event_loop()
# pass the coroutine to the event loop for execution
loop.run_until_complete(delayed_hello())
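To make the cooperative scheduling concrete, the sketch below (the coroutine names and delays are just illustrative) runs three coroutines on one thread; while one is paused in await asyncio.sleep(), the event loop resumes the others, so the total run time is roughly the longest delay rather than the sum (asyncio.run() requires Python 3.7+):

```python
import asyncio

async def worker(name, delay):
    # pausing here hands control back to the event loop,
    # which runs the other workers in the meantime
    await asyncio.sleep(delay)
    return name

async def main():
    # schedule three coroutines concurrently on the same thread
    return await asyncio.gather(
        worker("a", 0.3), worker("b", 0.2), worker("c", 0.1)
    )

results = asyncio.run(main())
print(results)  # ['a', 'b', 'c'] (gather preserves submission order)
```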

There are several resources that provide an in-depth look into Python concurrency, like the Python Module of the Week blog and the Python Parallel Programming Cookbook. If you are a Pluralsight user, you can also check out my Pluralsight course, Python Concurrency: Getting Started.

Published at DZone with permission of the author.
