Java > Java 8 Features > Streams API > Parallel Streams
Parallel Stream Example: Filtering and Collecting Data
This snippet demonstrates filtering and collecting data using a parallel stream. It creates a list of strings, filters the strings that start with a specific prefix, and collects the filtered strings into a new list using a parallel stream for improved performance.
Code Example
This code creates a list of strings. It then filters the strings that start with 'prefix_' and collects them into a new list. This is done using both sequential and parallel streams, and the execution time of each is printed to the console. The `parallelStream()` method converts the collection into a stream that can be processed in parallel, allowing multiple threads to filter and collect the data concurrently.
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
public class ParallelStreamFilterCollect {
public static void main(String[] args) {
// Create a list of strings
List<String> strings = new ArrayList<>();
for (int i = 0; i < 1000; i++) {
strings.add("prefix_" + i);
strings.add("other_" + i);
}
// Sequential Stream
long startTimeSequential = System.nanoTime();
List<String> filteredStringsSequential = strings.stream()
.filter(s -> s.startsWith("prefix_"))
.collect(Collectors.toList());
long endTimeSequential = System.nanoTime();
long durationSequential = (endTimeSequential - startTimeSequential) / 1_000_000;
System.out.println("Sequential Filtered String Count: " + filteredStringsSequential.size());
System.out.println("Sequential Time: " + durationSequential + " ms");
// Parallel Stream
long startTimeParallel = System.nanoTime();
List<String> filteredStringsParallel = strings.parallelStream()
.filter(s -> s.startsWith("prefix_"))
.collect(Collectors.toList());
long endTimeParallel = System.nanoTime();
long durationParallel = (endTimeParallel - startTimeParallel) / 1_000_000;
System.out.println("Parallel Filtered String Count: " + filteredStringsParallel.size());
System.out.println("Parallel Time: " + durationParallel + " ms");
}
}
Concepts Behind the Snippet
Parallel Streams: As in the previous example, leverages multiple cores for parallel processing of the stream.
Filtering: The `filter()` operation selectively includes elements based on a predicate (a boolean-valued function).
Collecting: The `collect()` operation accumulates the results of the stream processing into a new collection (in this case, a `List`). The `Collectors.toList()` method provides a convenient way to collect the elements into a list.
Real-Life Use Case
This pattern is useful when you need to extract specific elements from a large dataset based on certain criteria, such as:
Best Practices
Interview Tip
Be prepared to discuss the performance implications of using parallel streams for filtering and collecting data. Explain how the size of the dataset and the complexity of the filtering logic can impact the efficiency of parallelization.
When to use them
Use Parallel Streams for filtering and collecting when:
Memory Footprint
The memory footprint of this operation will primarily depend on the size of the original list and the size of the resulting filtered list. Using a parallel stream can also temporarily increase memory usage due to the creation of sub-streams and intermediate data structures.
Alternatives
Alternatives include:
Pros
Cons
FAQ
-
Why does `collect(Collectors.toList())` not need to be synchronized in this parallel example?
`Collectors.toList()` in the context of `parallelStream()` uses an internal concurrent data structure to accumulate the results from multiple threads. While the underlying data structure *is* being modified concurrently, the `Collectors.toList()` implementation handles the synchronization internally, so you don't need to add explicit synchronization code. -
Can I use a different collector with parallel streams?
Yes, you can use a variety of collectors with parallel streams, such as `Collectors.toSet()`, `Collectors.toMap()`, and `Collectors.groupingBy()`. Just ensure that the collector is thread-safe and can handle concurrent modifications. -
What if my filtering operation is very computationally expensive?
In this case, parallel streams can provide significant performance benefits. However, you should still measure the performance of both sequential and parallel streams to ensure that parallelization is actually improving execution time. Also, consider if the operation can be optimized separately.