Basic Multiprocessing Example
This snippet showcases the multiprocessing module for parallel execution. It spawns two processes that each execute the square_numbers function, calculating the squares of the numbers from 1 to 5. Unlike threading, multiprocessing bypasses the GIL, enabling true parallel processing on multi-core systems.
Code
The multiprocessing module creates separate processes, each with its own Python interpreter. The square_numbers function calculates the square of each number. We create two processes and start them; process.join() ensures the main program waits for the child processes to complete. This approach leverages multiple CPU cores for true parallel execution, which is beneficial for CPU-bound tasks.
import multiprocessing
import time

def square_numbers(process_id):
    for i in range(1, 6):
        time.sleep(0.1)  # Simulate some work
        print(f"Process {process_id}: {i} squared = {i*i}")

if __name__ == "__main__":
    # Create two processes
    process1 = multiprocessing.Process(target=square_numbers, args=(1,))
    process2 = multiprocessing.Process(target=square_numbers, args=(2,))

    # Start the processes
    process1.start()
    process2.start()

    # Wait for the processes to finish
    process1.join()
    process2.join()

    print("All processes finished.")
Concepts Behind the Snippet
- The multiprocessing module enables true parallelism in Python by creating a separate process for each task.
- multiprocessing.Process: This class is used to create new processes. You specify the target function and any arguments to pass to that function.
- process.start(): This method starts the process's execution.
- process.join(): This method blocks the calling process (in this case, the main process) until the process whose join() method was called completes its execution.
Real-Life Use Case
Consider a scenario where you need to perform a computationally intensive task, such as image processing or a scientific simulation. You can divide the task into smaller subtasks and assign each subtask to a separate process, which lets you utilize all available CPU cores and significantly reduce the overall execution time. Machine learning model training, for example, often uses multiprocessing to speed up data preprocessing.
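The divide-and-conquer pattern described above can be sketched as follows. The prime-counting task and the range boundaries here are illustrative stand-ins for any CPU-bound subtask:

```python
import multiprocessing

def count_primes(start, end):
    """CPU-bound work: count primes in [start, end) by trial division."""
    count = 0
    for n in range(max(start, 2), end):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def worker(start, end):
    # Each process handles one independent chunk and reports its result
    print(f"[{start}, {end}): {count_primes(start, end)} primes")

if __name__ == "__main__":
    # Split the full range into four chunks, one process per chunk
    bounds = [(0, 25_000), (25_000, 50_000), (50_000, 75_000), (75_000, 100_000)]
    processes = [multiprocessing.Process(target=worker, args=b) for b in bounds]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
```

Because the chunks are independent, each process runs on its own core with no coordination needed beyond the final join().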
Best Practices
- Although multiprocessing avoids direct memory sharing, be careful with external shared resources such as files or databases. Coordinate access to these resources to prevent corruption.
- To pass data between processes, use the multiprocessing.Queue class. It's a thread-safe and process-safe queue that allows you to pass data between processes.
- Avoid multiprocessing for very short or simple tasks, as the overhead might outweigh the benefits of parallelism.
- For large numbers of tasks, use the multiprocessing.Pool class. It provides a convenient way to distribute tasks among a pool of worker processes.
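As a minimal sketch of passing results back through a multiprocessing.Queue (the squared numbers and the None sentinel are illustrative choices, not a required protocol):

```python
import multiprocessing

def producer(queue):
    # Child process: compute results and send them back via the queue
    for i in range(1, 6):
        queue.put((i, i * i))
    queue.put(None)  # Sentinel: signal that no more data is coming

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=producer, args=(queue,))
    p.start()
    # Parent process: consume results until the sentinel arrives
    while True:
        item = queue.get()
        if item is None:
            break
        n, sq = item
        print(f"{n} squared = {sq}")
    p.join()
```

The sentinel pattern avoids guessing how many items will arrive; the parent simply drains the queue until the producer says it is done.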
Interview Tip
Understand the difference between threading and multiprocessing, and when to use each. Explain how multiprocessing overcomes the GIL limitation. Discuss common IPC mechanisms and the challenges of managing communication between processes.
When to Use Multiprocessing
Multiprocessing is ideal for CPU-bound tasks that can be divided into independent subtasks. Examples include numerical computations, image processing, video encoding, and parallel data analysis. If your code is primarily waiting for I/O, then threading or asyncio might be more appropriate.
Memory Footprint
Processes have a larger memory footprint compared to threads because each process has its own memory space. This can be a concern if you are creating a large number of processes or dealing with large datasets.
Alternatives
- threading: Suitable for I/O-bound tasks where the GIL is less of a bottleneck.
- asyncio: For asynchronous programming and event-driven concurrency, especially useful for handling many concurrent I/O operations efficiently.
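As a rough sketch of why threading suits I/O-bound work: while a thread sleeps or blocks on I/O, the GIL is released, so other threads make progress. Here time.sleep stands in for network waits, and the URLs are placeholders:

```python
import threading
import time

def fetch(url):
    # Simulated I/O wait (e.g., a network request); the GIL is
    # released during the sleep, so the threads overlap in time
    time.sleep(0.2)
    print(f"fetched {url}")

if __name__ == "__main__":
    urls = [f"https://example.com/page{i}" for i in range(5)]
    threads = [threading.Thread(target=fetch, args=(u,)) for u in urls]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The five 0.2 s "requests" overlap, so total time is ~0.2 s, not ~1 s
    print(f"elapsed: {time.perf_counter() - start:.2f}s")
```

Running the same five waits sequentially would take about a second; the threaded version finishes in roughly the time of a single wait.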
Pros
- True parallelism on multi-core systems, since each process has its own interpreter and bypasses the GIL.
- Significant speedups for CPU-bound tasks that can be divided into independent subtasks.
Cons
- Larger memory footprint than threads, since each process has its own memory space.
- Process creation and IPC add overhead that can outweigh the benefits for short or simple tasks.
FAQ
- How does multiprocessing overcome the GIL limitation?
Each process created by multiprocessing has its own Python interpreter and its own memory space. This means that the GIL in one process does not affect other processes, so each process can execute Python bytecode in parallel on a different CPU core.
- How can I share data between processes?
You can share data between processes using IPC mechanisms such as queues (multiprocessing.Queue), pipes (multiprocessing.Pipe), or shared memory (multiprocessing.Value, multiprocessing.Array). Queues are the most common and convenient way to pass data between processes.
- What is a multiprocessing.Pool, and when should I use it?
A multiprocessing.Pool is a collection of worker processes that tasks can be distributed among. You should use a Pool when you have a large number of tasks to perform and want to manage the processes efficiently. The Pool class handles the creation, management, and distribution of tasks among the worker processes.
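A minimal Pool sketch; the worker count of four and the square function are illustrative choices:

```python
import multiprocessing

def square(n):
    # Work item executed by a pool worker
    return n * n

if __name__ == "__main__":
    # Pool.map splits the iterable among the worker processes and
    # reassembles the results in the original order
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(square, range(1, 6))
    print(results)  # [1, 4, 9, 16, 25]
```

Compared with managing Process objects by hand, the Pool handles starting, feeding, and joining the workers for you.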