
Parallel Stream Example: Summing a List of Numbers

This code snippet demonstrates how to use parallel streams to sum a large list of numbers. For CPU-intensive operations on large datasets, parallel streams can reduce execution time by distributing the workload across multiple CPU cores, although the speedup depends on the data source, the operation, and the hardware.

Code Example

This code creates a list of 10 million numbers and sums them with both a sequential stream and a parallel stream, measuring and printing the execution time of each approach. Note the call to `numbers.parallelStream()`, which creates the parallel stream. The last section uses `LongStream.rangeClosed(...).parallel()` as an alternative that generates the numbers directly as primitives, avoiding the boxed `List<Long>`. Each timing comes from a single run without JVM warm-up, so treat the printed numbers as rough indications rather than benchmark results.

import java.util.ArrayList;
import java.util.List;
import java.util.stream.LongStream;

public class ParallelStreamSum {

    public static void main(String[] args) {
        // Create a large list of numbers
        List<Long> numbers = new ArrayList<>();
        for (long i = 1; i <= 10_000_000; i++) {
            numbers.add(i);
        }

        // Sequential Stream
        long startTimeSequential = System.nanoTime();
        long sumSequential = numbers.stream().mapToLong(Long::longValue).sum();
        long endTimeSequential = System.nanoTime();
        long durationSequential = (endTimeSequential - startTimeSequential) / 1_000_000;

        System.out.println("Sequential Sum: " + sumSequential);
        System.out.println("Sequential Time: " + durationSequential + " ms");

        // Parallel Stream
        long startTimeParallel = System.nanoTime();
        long sumParallel = numbers.parallelStream().mapToLong(Long::longValue).sum();
        long endTimeParallel = System.nanoTime();
        long durationParallel = (endTimeParallel - startTimeParallel) / 1_000_000;

        System.out.println("Parallel Sum: " + sumParallel);
        System.out.println("Parallel Time: " + durationParallel + " ms");

        // Alternative: generate the range as a primitive LongStream (no boxing, no backing list)
        long startTimeParallelLongStream = System.nanoTime();
        long sumParallelLongStream = LongStream.rangeClosed(1, 10_000_000).parallel().sum();
        long endTimeParallelLongStream = System.nanoTime();
        long durationParallelLongStream = (endTimeParallelLongStream - startTimeParallelLongStream) / 1_000_000;

        System.out.println("Parallel LongStream Sum: " + sumParallelLongStream);
        System.out.println("Parallel LongStream Time: " + durationParallelLongStream + " ms");

    }
}

Concepts Behind the Snippet

Parallel Streams: Java's parallel streams use the Fork/Join framework (by default the shared common pool, `ForkJoinPool.commonPool()`) to split the stream's source into chunks that are processed concurrently on multiple threads. This can lead to significant performance gains for CPU-bound operations on large datasets.
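To make the threading visible, the following sketch records which threads process the elements of a parallel stream and prints the common pool's target parallelism. The class and method names (`CommonPoolInspection`, `workerThreadNames`) are made up for illustration; the thread names you see will vary by machine and JVM.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.LongStream;

public class CommonPoolInspection {

    // Hypothetical helper: returns the names of all threads that processed
    // at least one element of the parallel stream.
    static Set<String> workerThreadNames(long upperBound) {
        Set<String> names = ConcurrentHashMap.newKeySet(); // thread-safe set
        LongStream.rangeClosed(1, upperBound)
                  .parallel()
                  .forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Available processors:    "
                + Runtime.getRuntime().availableProcessors());
        System.out.println("Common pool parallelism: "
                + ForkJoinPool.getCommonPoolParallelism());
        System.out.println("Threads used:            "
                + workerThreadNames(1_000_000));
    }
}
```

On a multi-core machine the set typically contains the calling thread plus several common-pool worker threads, which shows that the caller takes part in the work as well.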

Fork/Join Framework: This framework is designed for parallel, recursive task decomposition. It divides a large task into smaller, independent subtasks, executes them in parallel, and then combines the results.
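The split/compute/combine idea can be sketched directly with a `RecursiveTask`. This is a simplified illustration of the pattern, not the code the stream library actually runs; the class names and the `THRESHOLD` value are assumptions chosen for the example.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ForkJoinSum {

    // Hypothetical task: sums the longs in [from, to] by recursive splitting.
    static class RangeSumTask extends RecursiveTask<Long> {
        // Assumed cutoff below which the range is summed directly; tune by measuring.
        private static final long THRESHOLD = 100_000;
        private final long from;
        private final long to;

        RangeSumTask(long from, long to) {
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from < THRESHOLD) {
                // Small enough: sum sequentially.
                long sum = 0;
                for (long i = from; i <= to; i++) {
                    sum += i;
                }
                return sum;
            }
            long mid = from + (to - from) / 2;
            RangeSumTask left = new RangeSumTask(from, mid);
            RangeSumTask right = new RangeSumTask(mid + 1, to);
            left.fork();                         // schedule the left half asynchronously
            long rightResult = right.compute();  // compute the right half in this thread
            return left.join() + rightResult;    // wait for the left half and combine
        }
    }

    static long sum(long from, long to) {
        return ForkJoinPool.commonPool().invoke(new RangeSumTask(from, to));
    }

    public static void main(String[] args) {
        System.out.println(sum(1, 10_000_000)); // prints 50000005000000
    }
}
```

Compared with this hand-written task, `LongStream.rangeClosed(1, n).parallel().sum()` expresses the same computation in one line and leaves the splitting strategy to the library.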

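Stream API: Provides a functional programming approach to processing sequences of elements. A parallel stream is not a separate API; it is the same Stream API running in parallel execution mode, selected with `parallelStream()` on a collection or `parallel()` on an existing stream.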

Real-Life Use Case

Parallel streams are particularly useful in scenarios involving:

  • Data Analysis: Processing large datasets for calculations, aggregations, and transformations.
  • Image Processing: Applying filters or performing complex calculations on large images.
  • Scientific Computing: Simulating complex systems or performing computationally intensive calculations.
  • Batch Processing: Performing operations on large batches of data in parallel.

Best Practices

  • Measure Performance: Always measure the performance of both sequential and parallel streams to determine if parallelization actually improves execution time. Overhead associated with parallelization can sometimes outweigh the benefits.
  • Be Aware of Shared Mutable State: Avoid using shared mutable state within stream operations. Parallel streams can lead to race conditions if multiple threads are modifying the same data concurrently. Use thread-safe data structures if shared mutable state is unavoidable.
  • Splitting Data: The efficiency of parallel streams depends on how well the data can be split into independent chunks. Sources that split cheaply and evenly (e.g., `ArrayList`, arrays, `LongStream.range`) tend to perform better than sources that must be traversed to be split (e.g., `LinkedList`, `Stream.iterate`).
  • CPU-Bound vs. I/O-Bound: Parallel streams are most effective for CPU-bound operations. Blocking I/O inside a parallel stream ties up threads of the shared common pool, so for I/O-bound operations, asynchronous programming techniques may be more appropriate.
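The shared-mutable-state advice can be illustrated with a small sketch. The class and method names are hypothetical; `unsafeSum` is included only as a counterexample and may return a different wrong value on each run.

```java
import java.util.List;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

public class SharedStateExample {

    // Unsafe (counterexample): several threads update the same array cell.
    // "total[0] += i" is a non-atomic read-modify-write, so updates can be lost.
    static long unsafeSum(long n) {
        long[] total = {0};
        LongStream.rangeClosed(1, n).parallel().forEach(i -> total[0] += i);
        return total[0];
    }

    // Preferred: let the stream perform the reduction; no shared state involved.
    static long reductionSum(long n) {
        return LongStream.rangeClosed(1, n).parallel().sum();
    }

    // If shared state is unavoidable, use a thread-safe accumulator.
    static long adderSum(long n) {
        LongAdder adder = new LongAdder();
        LongStream.rangeClosed(1, n).parallel().forEach(adder::add);
        return adder.sum();
    }

    // Collect results with collect() instead of adding to an external ArrayList.
    static List<Long> evens(long n) {
        return LongStream.rangeClosed(1, n).parallel()
                .filter(i -> i % 2 == 0)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        long n = 1_000_000;
        System.out.println("unsafe:    " + unsafeSum(n) + " (may be wrong)");
        System.out.println("reduction: " + reductionSum(n));
        System.out.println("adder:     " + adderSum(n));
        System.out.println("evens:     " + evens(10));
    }
}
```

The reduction variant is usually the better choice: it needs no synchronization at all, whereas the `LongAdder` variant is correct but still has every thread writing to a shared object.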

Interview Tip

When discussing parallel streams in an interview, be sure to mention the Fork/Join framework, potential performance benefits, and the importance of avoiding shared mutable state. Also, be prepared to discuss the trade-offs between sequential and parallel processing.

When to use them

Use Parallel Streams when:

  • You have a large dataset.
  • Your operation is CPU-bound.
  • You can avoid shared mutable state.
  • You have multiple CPU cores available.
  • You've measured and confirmed a performance improvement.
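For the last point, a single timed run can mislead because the JVM compiles hot code while it runs. The sketch below adds untimed warm-up runs and reports the best of several timed runs. The helper `bestOfMillis` and its parameters are hypothetical, and this remains a rough approach; a dedicated harness such as JMH is the more reliable option for serious benchmarking.

```java
import java.util.function.LongSupplier;
import java.util.stream.LongStream;

public class StreamTiming {

    // Written on every run so the JIT cannot discard the computation as unused.
    static volatile long sink;

    // Hypothetical helper: runs the task `warmups` times untimed, then returns
    // the fastest of `runs` timed executions in milliseconds (assumes runs >= 1).
    static long bestOfMillis(LongSupplier task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            sink = task.getAsLong();
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            sink = task.getAsLong();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best / 1_000_000;
    }

    public static void main(String[] args) {
        long n = 10_000_000;
        long sequential = bestOfMillis(
                () -> LongStream.rangeClosed(1, n).sum(), 5, 10);
        long parallel = bestOfMillis(
                () -> LongStream.rangeClosed(1, n).parallel().sum(), 5, 10);
        System.out.println("Sequential best: " + sequential + " ms");
        System.out.println("Parallel best:   " + parallel + " ms");
    }
}
```

If the parallel figure is not clearly lower on the target hardware and with realistic data, the sequential stream is the simpler and safer choice.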

Memory Footprint

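Parallel streams can increase the memory footprint of your application. Each split creates task objects, partial results must be held until they are combined, and stateful operations such as `sorted()` or `distinct()` may buffer large portions of the data. Worker threads come from the shared common pool rather than being created per stream, and splitting an array-backed source such as `ArrayList` does not copy its elements. Be mindful of memory usage, especially when dealing with extremely large datasets.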

Alternatives

Alternatives to parallel streams include:

  • ExecutorService: For more fine-grained control over thread management.
  • CompletableFuture: For asynchronous programming and composing asynchronous operations.
  • Reactive Programming (e.g., RxJava, Project Reactor): For handling asynchronous data streams with backpressure.
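As a point of comparison, here is a minimal sketch of the same summation written with an `ExecutorService` and `CompletableFuture`. The class name, the `chunks` parameter, and the chunking scheme are assumptions made for the example; the code assumes `chunks >= 1`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.LongStream;

public class ExecutorChunkedSum {

    // Sums 1..n by splitting the range into roughly `chunks` pieces,
    // each summed on a dedicated fixed-size thread pool.
    static long sum(long n, int chunks) {
        ExecutorService pool = Executors.newFixedThreadPool(chunks);
        try {
            long chunkSize = (n + chunks - 1) / chunks; // ceiling division
            List<CompletableFuture<Long>> parts = new ArrayList<>();
            for (long start = 1; start <= n; start += chunkSize) {
                final long from = start;
                final long to = Math.min(n, start + chunkSize - 1);
                // Each chunk is summed sequentially on a pool thread.
                parts.add(CompletableFuture.supplyAsync(
                        () -> LongStream.rangeClosed(from, to).sum(), pool));
            }
            long total = 0;
            for (CompletableFuture<Long> part : parts) {
                total += part.join(); // wait for each partial sum
            }
            return total;
        } finally {
            pool.shutdown(); // always release the threads
        }
    }

    public static void main(String[] args) {
        System.out.println(sum(10_000_000, 4)); // prints 50000005000000
    }
}
```

This version gives explicit control over the pool size and its lifetime, at the cost of considerably more code than the one-line parallel stream.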

Pros

  • Potential for significant performance improvement.
  • Simplified syntax for parallel processing.
  • Automatic management of threads and task distribution.

Cons

  • Overhead from splitting the source, scheduling tasks on pool threads, and combining partial results.
  • Potential for race conditions if shared mutable state is used.
  • Not always faster than sequential streams.
  • Increased memory footprint.

FAQ

  • When will a sequential stream be faster than a parallel stream?

    A sequential stream can be faster than a parallel stream when the dataset is small, the operation is I/O-bound, or the overhead of parallelization outweighs the benefits of parallel execution. Also, if there's a lot of thread contention or synchronization overhead, sequential execution might be faster.
  • How do I ensure thread safety when using parallel streams?

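    Avoid using shared mutable state within stream operations, and prefer built-in reductions and collectors (`sum()`, `reduce`, `collect`), which keep intermediate state separate per task. If shared mutable state is unavoidable, use thread-safe data structures (e.g., `ConcurrentHashMap`, `AtomicInteger`) or synchronization mechanisms (e.g., locks, semaphores) to protect the data, keeping in mind that heavy synchronization can cancel out the benefit of running in parallel.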
  • Can I use parallel streams with any collection?

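    Yes, you can create a parallel stream from any `Collection` by calling the `parallelStream()` method, which is documented as returning a "possibly parallel" stream. However, the performance benefits will vary depending on the data structure and the operation being performed; an `ArrayList` splits far more cheaply than a `LinkedList`, for example.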
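To illustrate the thread-safety answer above, the following sketch counts numbers by their last digit in two ways: with a concurrent collector, and with an explicitly shared `ConcurrentHashMap` updated through its atomic `merge` method. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ConcurrentCounting {

    // Collector-based: the library manages the concurrent map.
    static ConcurrentMap<Integer, Long> countByLastDigit(int n) {
        return IntStream.rangeClosed(1, n).parallel().boxed()
                .collect(Collectors.groupingByConcurrent(
                        i -> i % 10, Collectors.counting()));
    }

    // Explicit shared state: safe only because ConcurrentHashMap.merge is atomic.
    static Map<Integer, Long> countWithSharedMap(int n) {
        ConcurrentHashMap<Integer, Long> counts = new ConcurrentHashMap<>();
        IntStream.rangeClosed(1, n).parallel()
                .forEach(i -> counts.merge(i % 10, 1L, Long::sum));
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countByLastDigit(100));
        System.out.println(countWithSharedMap(100));
    }
}
```

For the numbers 1 to 100 both methods map every digit 0 through 9 to a count of 10. Replacing the `ConcurrentHashMap` with a plain `HashMap` in the second method would make it unsafe under parallel execution.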