Python Multi-processing with example

Varun Tomar
4 min read · Feb 5, 2022


There are some really interesting posts on multithreading and multiprocessing available online, but when I started implementing it, a simple example was hard to find.

In Python, the multiprocessing module includes a simple and intuitive API for dividing work between multiple processes. The multiprocessing package supports spawning processes using an API similar to the threading module. It offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock (GIL) by using subprocesses instead of threads. Because of this, the multiprocessing module lets you take full advantage of multiple processors on a machine.

Using multiprocessing module

There are multiple classes available in the multiprocessing module. If you are just starting with multiprocessing, I would recommend the Pool class. A Pool manages a fixed set of worker processes and can accept far more jobs than it has workers: it queues the jobs, so each process runs several jobs one after another.
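As a rough illustration of that queueing behavior, the sketch below sends six jobs to a pool of two workers. The names tag_with_pid and run_queued_jobs, and the pool size, are assumptions made up for this example.

```python
import os
from multiprocessing import Pool


def tag_with_pid(n):
    """Return the input together with the ID of the worker process that handled it."""
    return n, os.getpid()


def run_queued_jobs(num_jobs=6, num_workers=2):
    # More jobs than workers: the pool queues the jobs and the workers
    # pick them up one after another.
    with Pool(processes=num_workers) as pool:
        return pool.map(tag_with_pid, range(num_jobs))


if __name__ == "__main__":
    results = run_queued_jobs()
    print("inputs:", [n for n, _ in results])
    print("distinct worker processes used:", len({pid for _, pid in results}))
```

All six inputs come back even though at most two processes did the work.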

Six commonly used methods of the Pool class:

              | Multi-args  Concurrency  Blocking  Ordered-results
--------------------------------------------------------------------
map           | no          yes          yes       yes
map_async     | no          yes          no        yes
apply         | yes         no           yes       no
apply_async   | yes         yes          no        no
starmap       | yes         yes          yes       yes
starmap_async | yes         yes          no        yes

The map method is a parallel equivalent of Python's built-in map function, and apply mirrors the apply built-in from Python 2 (removed in Python 3). Depending on the use case, you can pick whichever fits your requirements. I am using apply_async for my use case.
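To make the table more concrete, here is a small sketch that calls three of these methods on the same pool. The helper functions double and add are hypothetical and exist only for this illustration.

```python
from multiprocessing import Pool


def double(x):
    return 2 * x


def add(a, b):
    return a + b


def compare_methods():
    with Pool(processes=2) as pool:
        # map: one argument per call, blocks, results come back in input order.
        mapped = pool.map(double, [1, 2, 3])
        # starmap: each tuple is unpacked into multiple arguments.
        starmapped = pool.starmap(add, [(1, 2), (3, 4)])
        # apply_async: returns immediately; get() blocks until the result is ready.
        pending = pool.apply_async(add, (10, 20))
        applied = pending.get()
    return mapped, starmapped, applied


if __name__ == "__main__":
    print(compare_methods())
```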

Multi-processing with apply_async (a variant of apply)

  • It assigns a job to the worker pool but doesn't wait around for the result. Instead, it returns a placeholder. We can use the placeholder to get the actual result by calling result_placeholder.get().
  • Takes a function as a first argument and a tuple of arguments for that function as a second argument.

Multiprocessing with apply_async with return:
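A minimal sketch of this pattern follows. It assumes a worker named hello_world (the name used later in this post) whose body is a guess: it prints two messages, sleeps for one second to simulate work, and returns a greeting. The input names are also assumptions.

```python
import os
import time
from multiprocessing import Pool


def hello_world(name):
    """Assumed worker: print, simulate one second of work, return a greeting."""
    print("IT WASN'T JUST A PUPPY")
    time.sleep(1)
    print("YOU WANTED ME BACK...")
    return f"Hello: {name}"


def main():
    names = ["john_wick_1", "john_wick_2", "john_wick_3"]  # assumed inputs
    start = time.time()

    pool = Pool(processes=os.cpu_count())
    # apply_async returns a placeholder (AsyncResult) immediately for each job.
    placeholders = [pool.apply_async(hello_world, args=(name,)) for name in names]
    pool.close()  # no more jobs will be submitted
    pool.join()   # wait for the worker processes to finish

    # get() retrieves the value returned by hello_world for each job.
    results = [placeholder.get() for placeholder in placeholders]
    print(results)
    print(f"Completed in {time.time() - start} seconds")
    return results


if __name__ == "__main__":
    main()
```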

If you run the above code:

IT WASN'T JUST A PUPPY
IT WASN'T JUST A PUPPY
IT WASN'T JUST A PUPPY
YOU WANTED ME BACK...
YOU WANTED ME BACK...
YOU WANTED ME BACK...
['Hello: john_wick_1', 'Hello: john_wick_1', 'Hello: john_wick_3']
Completed in 1.175609827041626 seconds

As you can see all three entries are processed in parallel and completed in 1.175 seconds.

Multiprocessing with apply_async with return using callback:
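A minimal sketch of the callback variant, under the same assumptions as before: hello_world is a guessed worker body, and collect_result is a hypothetical callback name. Instead of keeping the placeholders, each result is appended to a list by the callback as soon as its job finishes.

```python
import os
import time
from multiprocessing import Pool

results = []  # filled in the parent process by the callback


def hello_world(name):
    """Assumed worker: print, simulate one second of work, return a greeting."""
    print("IT WASN'T JUST A PUPPY")
    time.sleep(1)
    print("YOU WANTED ME BACK...")
    return f"Hello: {name}"


def collect_result(result):
    """Callback: called in the parent process with hello_world's return value."""
    results.append(result)


def main():
    results.clear()
    names = ["john_wick_1", "john_wick_2", "john_wick_3"]  # assumed inputs
    start = time.time()

    pool = Pool(processes=os.cpu_count())
    for name in names:
        pool.apply_async(hello_world, args=(name,), callback=collect_result)
    pool.close()  # no more jobs will be submitted
    pool.join()   # wait for the workers; callbacks have run by the time this returns

    # Results arrive in completion order, which may differ from submission order.
    print(results)
    print(f"Completed in {time.time() - start} seconds")
    return list(results)


if __name__ == "__main__":
    main()
```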

If you run the above code:

IT WASN'T JUST A PUPPY
IT WASN'T JUST A PUPPY
IT WASN'T JUST A PUPPY
YOU WANTED ME BACK...
YOU WANTED ME BACK...
YOU WANTED ME BACK...
['Hello: john_wick_1', 'Hello: john_wick_1', 'Hello: john_wick_3']
Completed in 1.1662490367889404 seconds

os.cpu_count(): Returns the number of CPUs in the system. Passing this value as the pool size uses all the available CPUs. You can also specify any other number of worker processes.

>>> import os
>>> print(f"Number of CPU available: {os.cpu_count()}")
Number of CPU available: 16

pool.close(): Prevents any more tasks from being submitted to the pool. Once all the tasks have been completed, the worker processes exit.

pool.join(): Waits for the worker processes to exit. You must call close() or terminate() before calling join().
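A short sketch of how close() and join() behave together. The function names square and close_then_join are hypothetical. It submits one job, closes the pool, and then shows that a further submission is rejected while the job already submitted still completes.

```python
from multiprocessing import Pool


def square(x):
    return x * x


def close_then_join():
    pool = Pool(processes=2)
    pending = pool.apply_async(square, (4,))
    pool.close()  # stop accepting new jobs; jobs already submitted still run
    try:
        pool.apply_async(square, (5,))
        rejected = False
    except ValueError:
        # Submitting to a pool that is no longer running raises ValueError.
        rejected = True
    pool.join()  # wait for the worker processes to exit
    return pending.get(), rejected


if __name__ == "__main__":
    print(close_then_join())
```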

What is the callback used for? Parameters to hello_world are passed using the args argument of apply_async, and the callback function is where the result of hello_world is sent. The callback runs in the parent process and receives the result as its single argument.

Multi-processing vs multi-threading

Multithreading: If your program is network bound.

Multiprocessing: If your program is CPU bound.

Note: There are other differences, but for this post I am going to focus on the two above. You can read about the differences here.

Difference between multithreading and multiprocessing?

With threading, concurrency is achieved using multiple threads, but due to the GIL only one thread can execute Python code at a time. In multiprocessing, the original process is forked into multiple child processes, bypassing the GIL. Each child process has its own copy of the entire program's memory.
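The memory difference can be seen in a small sketch. The counter variable and both function names are made up for this illustration. A thread changes the parent's global counter, while a child process only changes its own copy.

```python
import threading
from multiprocessing import Process, Queue

counter = 0  # module-level state used to show what is (and is not) shared


def increment(report=None):
    """Increment the global counter; optionally report the value seen locally."""
    global counter
    counter += 1
    if report is not None:
        report.put(counter)


def compare_thread_and_process():
    global counter
    counter = 0

    # A thread runs inside the same process, so it changes our counter.
    thread = threading.Thread(target=increment)
    thread.start()
    thread.join()
    after_thread = counter

    # A child process works on its own copy; the parent's counter is untouched.
    report = Queue()
    child = Process(target=increment, args=(report,))
    child.start()
    # The value the child sees depends on how it was started: fork copies the
    # parent's current state, spawn re-imports the module from scratch.
    child_value = report.get()
    child.join()
    after_process = counter

    return after_thread, child_value, after_process


if __name__ == "__main__":
    after_thread, child_value, after_process = compare_thread_and_process()
    print("parent counter after thread:", after_thread)
    print("counter inside child process:", child_value)
    print("parent counter after process:", after_process)
```

The parent's counter is the same before and after the child process runs, which is why results have to be sent back explicitly, for example through return values, callbacks, or a queue.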

Conclusion

In this tutorial, I tried to show how I implemented multiprocessing using apply_async in Python. By choosing a suitable method from the multiprocessing library, we can get a significant reduction in execution time: it reduced the execution time of my code from over 20 minutes to under 5 minutes.
