What are common set operations?

Sets in Python are unordered collections of unique elements. They support a variety of mathematical set operations, making them powerful tools for tasks involving membership testing, removing duplicates, and performing calculations based on set theory. This tutorial will explore common set operations in Python with code examples and explanations.

Basic Set Operations

Before performing set operations, let's define two sets, set1 and set2, which are used in the following examples. These sets contain integers, but a set can mix elements of different data types as long as every element is hashable. In practice this means immutable built-in types such as numbers, strings, and tuples whose items are themselves hashable.

set1 = {1, 2, 3, 4, 5}
set2 = {4, 5, 6, 7, 8}
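As a quick sketch of the hashability rule, the set below mixes an int, a string, a tuple, and a frozenset, all of which are hashable. A list is mutable and unhashable, so using one as a set element raises a TypeError. The variable names here are illustrative only, and the exact error message can vary between Python versions.

```python
# Elements of different types are fine, as long as each one is hashable
mixed = {1, "two", (3, 4), frozenset({5, 6})}
print(len(mixed))  # 4

# A list is unhashable, so it cannot be stored in a set
try:
    bad = {[1, 2]}
    raised = False
except TypeError as exc:
    raised = True
    print(f"TypeError: {exc}")
```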

Union

The union of two sets combines all elements from both sets into a new set, eliminating duplicates. The union can be achieved using the pipe operator (|) or the .union() method. The output will be Union: {1, 2, 3, 4, 5, 6, 7, 8}, showcasing all unique elements present in either set1 or set2.

union_set = set1 | set2  # Using the | operator
# or
union_set = set1.union(set2)
print(f'Union: {union_set}')
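Union is not limited to two sets. The sketch below, which adds a hypothetical third set named extra, shows that the | operator can be chained and that .union() accepts several arguments in one call.

```python
set1 = {1, 2, 3, 4, 5}
set2 = {4, 5, 6, 7, 8}
extra = {8, 9, 10}  # hypothetical third set for illustration

# The | operator can be chained; it is evaluated left to right
all_items = set1 | set2 | extra

# .union() accepts any number of arguments in a single call
all_items_method = set1.union(set2, extra)

print(all_items == all_items_method)  # True
print(sorted(all_items))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```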

Intersection

The intersection of two sets returns a new set containing only the elements that are common to both sets. This is achieved via the ampersand operator (&) or the .intersection() method. The output will be Intersection: {4, 5}, representing the elements found in both set1 and set2.

intersection_set = set1 & set2  # Using the & operator
# or
intersection_set = set1.intersection(set2)
print(f'Intersection: {intersection_set}')
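Conceptually, intersection keeps each element of one set that is also a member of the other. The set comprehension below is an illustrative equivalent that produces the same result; it is not how Python implements the operation internally, and the built-in & operator is the faster choice in real code.

```python
set1 = {1, 2, 3, 4, 5}
set2 = {4, 5, 6, 7, 8}

# Keep only the elements of set1 that are also members of set2
manual = {x for x in set1 if x in set2}

print(manual == set1 & set2)  # True
print(sorted(manual))  # [4, 5]
```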

Difference

The difference between two sets (A - B) returns a new set containing the elements that are present in set A but not in set B. This is implemented using the minus operator (-) or the .difference() method. The output will be Difference: {1, 2, 3}, indicating elements present in set1 but not in set2.

difference_set = set1 - set2  # Using the - operator
# or
difference_set = set1.difference(set2)
print(f'Difference: {difference_set}')
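Unlike union and intersection, difference depends on operand order. This short example swaps the operands to show that set1 - set2 and set2 - set1 give different results.

```python
set1 = {1, 2, 3, 4, 5}
set2 = {4, 5, 6, 7, 8}

print(sorted(set1 - set2))  # [1, 2, 3]
print(sorted(set2 - set1))  # [6, 7, 8]

# Difference is not commutative: swapping the operands changes the result
print(set1 - set2 == set2 - set1)  # False
```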

Symmetric Difference

The symmetric difference between two sets returns a new set containing elements that are in either set A or set B, but not in both. This is achieved using the caret operator (^) or the .symmetric_difference() method. The output will be Symmetric Difference: {1, 2, 3, 6, 7, 8}, showcasing elements unique to each set.

symmetric_difference_set = set1 ^ set2  # Using the ^ operator
# or
symmetric_difference_set = set1.symmetric_difference(set2)
print(f'Symmetric Difference: {symmetric_difference_set}')
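The symmetric difference can also be expressed with the operations covered earlier. The sketch below checks two equivalent formulations against the ^ operator: the union of the two one-sided differences, and the union minus the intersection.

```python
set1 = {1, 2, 3, 4, 5}
set2 = {4, 5, 6, 7, 8}

sym = set1 ^ set2

# Elements only in set1, combined with elements only in set2
via_differences = (set1 - set2) | (set2 - set1)

# Everything in either set, minus what the two sets share
via_union_minus_intersection = (set1 | set2) - (set1 & set2)

print(sym == via_differences == via_union_minus_intersection)  # True
print(sorted(sym))  # [1, 2, 3, 6, 7, 8]
```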

Subset and Superset

issubset() checks if all elements of one set are present in another (i.e., if one set is a subset of another). issuperset() verifies if a set contains all elements of another (i.e., if one set is a superset of another). The output will be: Is set_a a subset of set_b? True and Is set_b a superset of set_a? True.

set_a = {1, 2, 3}
set_b = {1, 2, 3, 4, 5}

print(f'Is set_a a subset of set_b? {set_a.issubset(set_b)}')
print(f'Is set_b a superset of set_a? {set_b.issuperset(set_a)}')
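Python also provides comparison operators for these checks: <= and >= match issubset() and issuperset(), while < and > test for a proper subset or superset (the sets must also differ). A short illustration using the same sets:

```python
set_a = {1, 2, 3}
set_b = {1, 2, 3, 4, 5}

print(set_a <= set_b)  # True: same as set_a.issubset(set_b)
print(set_b >= set_a)  # True: same as set_b.issuperset(set_a)
print(set_a < set_b)   # True: proper subset (subset and not equal)
print(set_a <= set_a)  # True: every set is a subset of itself
print(set_a < set_a)   # False: but not a proper subset of itself
```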

Disjoint Sets

The isdisjoint() method checks if two sets have no elements in common. If they have no common elements, they are considered disjoint. The output will be: Are set_c and set_d disjoint? True and Are set_c and set_e disjoint? False.

set_c = {1, 2, 3}
set_d = {4, 5, 6}
set_e = {3, 7, 8}

print(f'Are set_c and set_d disjoint? {set_c.isdisjoint(set_d)}')
print(f'Are set_c and set_e disjoint? {set_c.isdisjoint(set_e)}')
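Two sets are disjoint exactly when their intersection is empty, so isdisjoint() gives the same answer as checking not (a & b) without building the intersection set. The sketch below confirms this for the example sets and also shows that isdisjoint() accepts an ordinary iterable such as a list.

```python
set_c = {1, 2, 3}
set_d = {4, 5, 6}
set_e = {3, 7, 8}

# Disjoint means the intersection is empty
print(set_c.isdisjoint(set_d) == (not (set_c & set_d)))  # True
print(set_c.isdisjoint(set_e) == (not (set_c & set_e)))  # True

# isdisjoint() also accepts any iterable, not only sets
print(set_c.isdisjoint([9, 10]))  # True
```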

Real-Life Use Case: Recommendation Systems

Set operations are valuable in building recommendation systems. Imagine recommending products to users based on their past purchases. If you have two sets representing the products bought by two different users, the intersection tells you which products both of them bought, and a large overlap suggests similar buying habits. The difference then gives the products one user bought that the other has not, which are natural candidates to recommend. This helps in creating personalized recommendations.
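Here is a minimal sketch of that idea, assuming two hypothetical users with made-up purchase histories. To decide whether the users are similar enough, it uses Jaccard similarity (shared items divided by all items). That measure is one common choice, not something the approach requires, and the 0.3 threshold is an arbitrary assumption.

```python
# Hypothetical purchase histories for two users
alice = {"laptop", "mouse", "keyboard", "monitor"}
bob = {"laptop", "mouse", "headphones", "webcam"}

# Products both users bought: a rough signal of similar taste
shared = alice & bob

# Jaccard similarity: size of the intersection over size of the union
similarity = len(shared) / len(alice | bob)
MIN_SIMILARITY = 0.3  # hypothetical threshold

# Recommend what the other user bought, but only if the users look similar
recommend_to_alice = bob - alice if similarity >= MIN_SIMILARITY else set()
recommend_to_bob = alice - bob if similarity >= MIN_SIMILARITY else set()

print(f"Jaccard similarity: {similarity:.2f}")  # Jaccard similarity: 0.33
print(sorted(recommend_to_alice))  # ['headphones', 'webcam']
print(sorted(recommend_to_bob))    # ['keyboard', 'monitor']
```

In a real system the sets would come from purchase data and the comparison would run across many users, but the core operations stay the same: intersection to measure overlap and difference to find candidates.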

Best Practices

When performing set operations, prioritize code readability. While operator symbols (|, &, -, ^) are concise, using methods like .union() or .intersection() can enhance understanding, especially in complex code. Choose the style that best suits your team's conventions and maintain consistency.
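One practical difference between the two styles is worth considering when you choose: the named methods accept any iterable as an argument, while the operators require both operands to be sets (or frozensets). The sketch below shows both behaviors.

```python
set1 = {1, 2, 3, 4, 5}

# Methods accept any iterable, such as a list or a range
print(sorted(set1.union([6, 7])))           # [1, 2, 3, 4, 5, 6, 7]
print(sorted(set1.intersection(range(4))))  # [1, 2, 3]

# Operators require both operands to be sets
try:
    _ = set1 | [6, 7]
    operator_failed = False
except TypeError:
    operator_failed = True
    print("set1 | [6, 7] raises TypeError")
```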

When to use them

Use set operations when you need to work with unique items, perform mathematical set operations, remove duplicates, or test membership efficiently. Sets are fast with the in operator because lookups go through a hash table, giving O(1) average-case complexity, compared with O(n) for a list, which has to be scanned element by element.
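A small sketch of the two most common everyday uses, removing duplicates and testing membership, using a hypothetical list of page visits:

```python
# Hypothetical list of page visits containing repeats
visits = ["home", "about", "home", "blog", "about", "home"]

# Removing duplicates: build a set from the list
unique_pages = set(visits)
print(sorted(unique_pages))  # ['about', 'blog', 'home']

# Membership testing: a hash lookup instead of a linear scan
print("blog" in unique_pages)     # True
print("contact" in unique_pages)  # False
```

Converting to a set discards the original order of the list. If that order matters, list(dict.fromkeys(visits)) is a common alternative that keeps the first occurrence of each item.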

Memory footprint

Sets usually have a larger memory footprint than lists or tuples holding the same elements, because their hash table stores each element's hash alongside a reference to it and keeps a share of its slots empty so that lookups stay fast. However, the speed benefits of set operations often outweigh the memory cost, especially with larger datasets. If memory usage is a critical constraint, consider alternatives such as generators (which avoid materializing a collection at all) or custom data structures with limited functionality. Always profile your code to assess memory usage in realistic scenarios.
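As a rough starting point for such profiling, sys.getsizeof() reports the size of a container object itself. The sketch below compares a list and a set holding the same 1,000 integers; the exact byte counts depend on the Python version and platform, and getsizeof() does not include the integer objects the containers refer to.

```python
import sys

items = list(range(1000))
as_list = items
as_set = set(items)

# getsizeof measures the container only, not the elements it references
print(f"list: {sys.getsizeof(as_list)} bytes")
print(f"set:  {sys.getsizeof(as_set)} bytes")
```

For whole-program measurements, the standard library's tracemalloc module is better suited than summing getsizeof() results.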

FAQ

  • Can sets contain duplicate elements?

    No, sets in Python only store unique elements. If you try to add a duplicate element, it will simply be ignored.
  • Are set operations in-place?

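    No, the operators and methods for union, intersection, difference, and symmetric difference return new sets, and the original sets are not modified. However, each one has an in-place counterpart that modifies the left-hand set directly: set1.update(set2) (equivalent to set1 |= set2), set1.intersection_update(set2) (set1 &= set2), set1.difference_update(set2) (set1 -= set2), and set1.symmetric_difference_update(set2) (set1 ^= set2).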
  • What happens if I try to add an unhashable object to a set?

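    Python sets require their elements to be hashable. Attempting to add an unhashable object, such as a list, dict, or set, raises a TypeError. Numbers, strings, frozensets, and tuples are hashable and can be added to sets, with one caveat: a tuple is hashable only if all of its items are, so (1, [2]) cannot be added.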
  • What is the time complexity of set operations?

    Average time complexities differ by operation. For sets s and t, union (s | t) and symmetric difference (s ^ t) are O(len(s) + len(t)), since elements from both sets are processed; intersection (s & t) is O(min(len(s), len(t))), because it iterates over the smaller set and checks membership in the larger one; and difference (s - t) is O(len(s)). Membership tests (in operator) have an average time complexity of O(1) due to the use of hash tables.
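To illustrate the in-place versions described in the FAQ, the sketch below first shows that the plain | operator leaves set1 unchanged, then applies each augmented assignment to a copy so that the originals stay intact. The comments name the equivalent in-place method for each operator. It ends by confirming that adding a duplicate element has no effect.

```python
set1 = {1, 2, 3, 4, 5}
set2 = {4, 5, 6, 7, 8}

# The operator form returns a new set and leaves set1 unchanged
combined = set1 | set2
print(sorted(set1))  # [1, 2, 3, 4, 5]

# Augmented assignment modifies the left-hand set in place
a = set(set1)  # work on copies so set1 stays intact
a |= set2      # same effect as a.update(set2)
print(sorted(a))  # [1, 2, 3, 4, 5, 6, 7, 8]

b = set(set1)
b &= set2      # same effect as b.intersection_update(set2)
print(sorted(b))  # [4, 5]

c = set(set1)
c -= set2      # same effect as c.difference_update(set2)
print(sorted(c))  # [1, 2, 3]

d = set(set1)
d ^= set2      # same effect as d.symmetric_difference_update(set2)
print(sorted(d))  # [1, 2, 3, 6, 7, 8]

# Adding an element that is already present is silently ignored
d.add(1)
print(len(d))  # 6
```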