Java tutorials > Core Java Fundamentals > Data Structures and Collections > What are the implementations of `Set` (HashSet, TreeSet, LinkedHashSet)?
What are the implementations of `Set` (HashSet, TreeSet, LinkedHashSet)?
This tutorial explains the different implementations of the `Set` interface in Java: `HashSet`, `TreeSet`, and `LinkedHashSet`. It covers their unique characteristics, performance implications, and appropriate use cases. Understanding these implementations is crucial for efficient data management and algorithm design.
Introduction to the Set Interface
The `Set` interface in Java's Collections Framework represents a collection that contains no duplicate elements. More formally, sets contain no pair of elements `e1` and `e2` such that `e1.equals(e2)`, and at most one null element. The `Set` interface models the mathematical set abstraction. The Java Collections Framework provides several implementations of the `Set` interface, each with different performance characteristics and use cases.
HashSet: Unordered and Fast
`HashSet` is a class that implements the `Set` interface, backed by a hash table (actually a `HashMap` instance). It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time. This implementation offers constant time performance for the basic operations (`add`, `remove`, `contains`, and `size`), assuming the hash function disperses the elements properly among the buckets. Iterating over a `HashSet` takes time proportional to the sum of the number of entries in the `HashSet` instance (the size) plus the number of “buckets” in the backing `HashMap` instance (the capacity).
import java.util.HashSet;
import java.util.Set;
public class HashSetExample {
public static void main(String[] args) {
Set<String> hashSet = new HashSet<>();
hashSet.add("Apple");
hashSet.add("Banana");
hashSet.add("Orange");
hashSet.add("Apple"); // Duplicate element, will be ignored
System.out.println("HashSet: " + hashSet); // Output: HashSet: [Orange, Banana, Apple] (Order not guaranteed)
boolean containsBanana = hashSet.contains("Banana");
System.out.println("Contains Banana: " + containsBanana); // Output: Contains Banana: true
hashSet.remove("Banana");
System.out.println("HashSet after removing Banana: " + hashSet); // Output: HashSet after removing Banana: [Orange, Apple]
}
}
TreeSet: Sorted Set
`TreeSet` is an implementation of the `Set` interface that uses a `TreeMap` for storage. The elements are stored in a sorted order, either according to their natural ordering (if the elements implement the `Comparable` interface) or according to a `Comparator` provided at the time of `TreeSet` creation. This makes `TreeSet` suitable for applications where elements need to be accessed in a sorted order. The `add`, `remove`, and `contains` operations offer `O(log n)` time complexity.
import java.util.TreeSet;
import java.util.Set;
public class TreeSetExample {
public static void main(String[] args) {
Set<String> treeSet = new TreeSet<>();
treeSet.add("Apple");
treeSet.add("Banana");
treeSet.add("Orange");
treeSet.add("Apple"); // Duplicate element, will be ignored
System.out.println("TreeSet: " + treeSet); // Output: TreeSet: [Apple, Banana, Orange] (Elements are sorted)
boolean containsBanana = treeSet.contains("Banana");
System.out.println("Contains Banana: " + containsBanana); // Output: Contains Banana: true
treeSet.remove("Banana");
System.out.println("TreeSet after removing Banana: " + treeSet); // Output: TreeSet after removing Banana: [Apple, Orange]
}
}
LinkedHashSet: Insertion Order
`LinkedHashSet` is a `HashSet` that also maintains a doubly-linked list running through all of its entries. This linked list defines the insertion order, which is the order in which elements were inserted into the set. The `LinkedHashSet` implementation differs from `HashSet` in that it maintains a predictable iteration order, which is the order in which elements were inserted into the set. This is particularly useful when you want to iterate through the elements in the order they were added. The `add`, `remove`, and `contains` methods have constant-time performance on average. Iterating over a `LinkedHashSet` requires time proportional to the size of the set, regardless of the capacity of the underlying hash table.
import java.util.LinkedHashSet;
import java.util.Set;
public class LinkedHashSetExample {
public static void main(String[] args) {
Set<String> linkedHashSet = new LinkedHashSet<>();
linkedHashSet.add("Apple");
linkedHashSet.add("Banana");
linkedHashSet.add("Orange");
linkedHashSet.add("Apple"); // Duplicate element, will be ignored
System.out.println("LinkedHashSet: " + linkedHashSet); // Output: LinkedHashSet: [Apple, Banana, Orange] (Insertion order maintained)
boolean containsBanana = linkedHashSet.contains("Banana");
System.out.println("Contains Banana: " + containsBanana); // Output: Contains Banana: true
linkedHashSet.remove("Banana");
System.out.println("LinkedHashSet after removing Banana: " + linkedHashSet); // Output: LinkedHashSet after removing Banana: [Apple, Orange]
}
}
Concepts Behind the Snippets
All three implementations ensure uniqueness of elements as enforced by the `Set` interface. `HashSet` uses hashing for fast lookups, but provides no ordering guarantee. `TreeSet` uses a tree structure for sorted storage. `LinkedHashSet` uses a hash table with a linked list to maintain insertion order. The choice depends on the specific needs of the application, particularly the importance of order and speed of operations.
Real-Life Use Case Section
Consider an e-commerce application. `HashSet` could be used to store unique product IDs in a user's shopping cart. `TreeSet` could be used to display a list of products sorted alphabetically by name. `LinkedHashSet` could be used to maintain the order in which products were added to a wishlist, useful for displaying 'recently added' items.
Best Practices
Interview Tip
Be prepared to discuss the differences between `HashSet`, `TreeSet`, and `LinkedHashSet` in terms of their ordering, performance, and underlying data structures. Understand the trade-offs involved in choosing one implementation over another. Explain how these choices impact overall application performance. Be ready to provide use case examples for each.
When to use them
Memory footprint
Alternatives
If you need a sorted set and the performance of `TreeSet` is not sufficient, consider using a sorted list (e.g., `ArrayList`) and maintaining its sorted order manually, although this requires more manual coding. For very large datasets, consider using external libraries that provide specialized sorted set implementations with better performance.
Pros and Cons - HashSet
Pros:
Cons:
Pros and Cons - TreeSet
Pros:
Cons:
Pros and Cons - LinkedHashSet
Pros:
Cons:
FAQ
-
What happens if I add a duplicate element to a Set?
The `Set` interface does not allow duplicate elements. If you attempt to add a duplicate element, the `add()` method will return `false`, and the set will remain unchanged. The duplicate element will not be added. -
Which Set implementation should I use if I need to iterate through the elements in a specific order?
If you need to iterate through the elements in the order they were inserted, use `LinkedHashSet`. If you need to iterate through the elements in sorted order, use `TreeSet`. `HashSet` does not guarantee any specific iteration order. -
Are the Set implementations thread-safe?
No, none of the standard `Set` implementations (`HashSet`, `TreeSet`, `LinkedHashSet`) are inherently thread-safe. If you need to use a set in a multithreaded environment, you should synchronize access to the set using external synchronization mechanisms (e.g., `Collections.synchronizedSet()`) or use a thread-safe concurrent set implementation (e.g., `ConcurrentSkipListSet` from the `java.util.concurrent` package).