Part 1 — Tuples
A tuple is an ordered, immutable sequence. Once you create a tuple, you cannot add, remove, or reassign its elements. This makes tuples ideal for data that should remain constant throughout the life of a program — coordinates, database rows, configuration values, and function return values.
coordinates = (10, 20)
colors = ("red", "green", "blue")
print(type(coordinates)) # <class 'tuple'>
Why Should You Care About Tuples?
Beginners often wonder why tuples exist when lists can do everything tuples do and more. Here are the key reasons:
- Data integrity: immutability guarantees that the tuple's elements will not be accidentally reassigned, added, or removed.
- Performance: tuples typically use less memory than lists of the same length and are often faster to create; element access speed is about the same.
- Hashable: tuples (of hashable elements) can be used as dictionary keys and set members, whereas lists cannot.
- Signal intent: using a tuple tells other developers "this collection is not meant to change."
- Safe defaults: passing a tuple to a function ensures the function cannot add, remove, or reassign the caller's elements.
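The memory and hashability points above can be checked directly. This is a minimal sketch that assumes CPython; exact byte counts differ between versions and platforms, so none are shown, and the value stored in `distances` is purely illustrative.

```python
import sys

point_tuple = (10, 20, 30)
point_list = [10, 20, 30]

# On CPython a tuple carries less per-object overhead than an equal-length list.
print(sys.getsizeof(point_tuple) < sys.getsizeof(point_list))

# A tuple of hashable elements can serve as a dictionary key.
distances = {point_tuple: 37.4}  # 37.4 is an illustrative value
print(distances[(10, 20, 30)])

# A list cannot: hashing it raises TypeError.
try:
    hash(point_list)
except TypeError as exc:
    print(f"TypeError: {exc}")
```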
Creating Tuples
Python offers several ways to create tuples. Understanding all of them helps you recognise tuples when you encounter them in other people's code.
1. Parentheses (Standard Syntax)
fruits = ("apple", "banana", "cherry")
point = (3.5, 7.2)
mixed = (1, "hello", True, 3.14)
2. Without Parentheses (Tuple Packing)
Python actually creates a tuple whenever you write a comma-separated sequence of values — the parentheses are optional in most contexts.
# These two lines create identical tuples
a = 1, 2, 3
b = (1, 2, 3)
print(a == b) # True
print(type(a)) # <class 'tuple'>
This is called tuple packing — Python "packs" the values into a tuple automatically.
3. Single-Element Tuple (The Trailing Comma)
This is one of the most common gotchas. A single value in parentheses is not a tuple — it is just that value. You need a trailing comma.
# NOT a tuple — just an integer in parentheses
not_a_tuple = (42)
print(type(not_a_tuple)) # <class 'int'>
# THIS is a single-element tuple
single = (42,)
print(type(single)) # <class 'tuple'>
print(len(single)) # 1
# Without parentheses — the comma is what matters
also_single = 42,
print(type(also_single)) # <class 'tuple'>
4. The tuple() Constructor
Convert any iterable into a tuple.
# From a list
from_list = tuple([1, 2, 3])
print(from_list) # (1, 2, 3)
# From a string (each character becomes an element)
from_string = tuple("Python")
print(from_string) # ('P', 'y', 't', 'h', 'o', 'n')
# From a range
from_range = tuple(range(5))
print(from_range) # (0, 1, 2, 3, 4)
# From a set (order not guaranteed from the set)
from_set = tuple({3, 1, 2})
print(from_set) # order may vary
# From a generator expression
squares = tuple(x ** 2 for x in range(6))
print(squares) # (0, 1, 4, 9, 16, 25)
5. Empty Tuple
empty_a = ()
empty_b = tuple()
print(empty_a == empty_b) # True
print(len(empty_a)) # 0
6. Repeating Elements
Like lists, you can use the * operator to repeat a tuple.
zeros = (0,) * 5
print(zeros) # (0, 0, 0, 0, 0)
pattern = (True, False) * 3
print(pattern) # (True, False, True, False, True, False)
7. Concatenation
You can combine tuples with +. This creates a new tuple (the originals remain unchanged).
first = (1, 2, 3)
second = (4, 5, 6)
combined = first + second
print(combined) # (1, 2, 3, 4, 5, 6)
print(first) # (1, 2, 3) — unchanged
Accessing Tuple Elements
Positive and Negative Indexing
Tuples are zero-indexed, exactly like lists.
languages = ("Python", "Java", "C++", "JavaScript", "Go")
# Positive indexing (left to right, starting at 0)
print(languages[0]) # Python
print(languages[2]) # C++
print(languages[4]) # Go
# Negative indexing (right to left, starting at -1)
print(languages[-1]) # Go
print(languages[-3]) # C++
print(languages[-5]) # Python
IndexError
Accessing an out-of-range index raises an IndexError, just as with lists.
point = (10, 20, 30)
# print(point[5]) # IndexError: tuple index out of range
Slicing Tuples
Slicing works identically to list slicing. The result is always a tuple, never a list.
nums = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
print(nums[2:5]) # (2, 3, 4)
print(nums[:4]) # (0, 1, 2, 3)
print(nums[6:]) # (6, 7, 8, 9)
print(nums[::2]) # (0, 2, 4, 6, 8) — every second element
print(nums[1::2]) # (1, 3, 5, 7, 9) — odd-indexed elements
print(nums[::-1]) # (9, 8, 7, 6, 5, 4, 3, 2, 1, 0) — reversed
print(nums[7:2:-1]) # (7, 6, 5, 4, 3) — backward from 7 to 3
Key point: Slicing never raises an IndexError. Out-of-range slice boundaries are silently clamped.
short = (1, 2, 3)
print(short[0:100]) # (1, 2, 3) — no error
print(short[50:]) # () — empty tuple, no error
Tuple Unpacking
Tuple unpacking (also called destructuring) assigns each element of a tuple to a separate variable in a single statement. This is one of the most elegant features in Python.
Basic Unpacking
person = ("Priya", 25, "Mumbai")
name, age, city = person
print(name) # Priya
print(age) # 25
print(city) # Mumbai
The number of variables on the left must match the number of elements in the tuple; otherwise Python raises a ValueError.
# ValueError: too many values to unpack (expected 2)
# a, b = (1, 2, 3)
# ValueError: not enough values to unpack (expected 4, got 3)
# a, b, c, d = (1, 2, 3)
Swapping Variables
Tuple unpacking makes variable swapping a one-liner — no temporary variable needed.
a, b = 1, 2
a, b = b, a
print(a, b) # 2 1
# Works with more than two variables
x, y, z = 1, 2, 3
x, y, z = z, x, y
print(x, y, z) # 3 1 2
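Swapping works because Python builds the whole right-hand tuple before assigning anything on the left. The same property is useful for any "update several variables from their old values" step. As a small sketch (the function name `fibonacci` is just an illustration):

```python
def fibonacci(n):
    """Return the first n Fibonacci numbers as a tuple."""
    result = []
    a, b = 0, 1
    for _ in range(n):
        result.append(a)
        # The right-hand side (b, a + b) is evaluated first, then unpacked,
        # so `a + b` still sees the old value of `a`.
        a, b = b, a + b
    return tuple(result)

print(fibonacci(8))  # (0, 1, 1, 2, 3, 5, 8, 13)
```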
Starred Unpacking with *rest
When you don't know the exact length of a tuple, or you only care about certain positions, use the * operator to collect "the rest" into a list.
numbers = (1, 2, 3, 4, 5, 6, 7)
first, *middle, last = numbers
print(first) # 1
print(middle) # [2, 3, 4, 5, 6] — note: this is a LIST
print(last) # 7
# Grab only the first two
first, second, *rest = numbers
print(first) # 1
print(second) # 2
print(rest) # [3, 4, 5, 6, 7]
# Grab only the last two
*rest, second_last, last = numbers
print(rest) # [1, 2, 3, 4, 5]
print(second_last) # 6
print(last) # 7
Important: The starred variable always becomes a list, even if it captures zero or one element.
a, *b, c = (1, 2)
print(a) # 1
print(b) # [] — empty list
print(c) # 2
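Starred unpacking is handy for records whose first field is fixed and whose remaining fields vary in number. The sketch below assumes a hypothetical record layout of `(name, score, score, ...)` and a hypothetical helper called `summarise`:

```python
def summarise(record):
    """Split a (name, score, score, ...) record into name and average score."""
    name, *scores = record   # scores is always a list
    if not scores:           # the star may capture nothing at all
        return name, None
    return name, sum(scores) / len(scores)

print(summarise(("Priya", 80, 90, 100)))  # ('Priya', 90.0)
print(summarise(("Rahul",)))              # ('Rahul', None)
```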
Ignoring Values with Underscore
By convention, _ is used as a "throwaway" variable for values you don't need.
record = ("Priya", 25, "Mumbai", "priya@example.com")
name, _, city, _ = record
print(name) # Priya
print(city) # Mumbai
# Combine with starred unpacking
name, *_ = record
print(name) # Priya — only the name, ignoring everything else
Unpacking in Loops
Tuple unpacking is extremely useful when iterating over sequences of tuples.
students = [
    ("Alice", 88),
    ("Bob", 72),
    ("Charlie", 95),
]
for name, score in students:
    print(f"{name}: {score}")
# With enumerate
for index, (name, score) in enumerate(students, start=1):
    print(f"#{index} {name} scored {score}")
Named Tuples
Regular tuples access elements by numeric index, which hurts readability. Named tuples let you access elements by name while retaining all the benefits of tuples (immutability, hashability, low memory).
Creating Named Tuples with collections.namedtuple
from collections import namedtuple
# Define a named tuple type
Point = namedtuple("Point", ["x", "y"])
# Create instances
p1 = Point(3, 7)
p2 = Point(x=10, y=20)
# Access by name (preferred — more readable)
print(p1.x) # 3
print(p1.y) # 7
# Access by index (still works)
print(p2[0]) # 10
print(p2[1]) # 20
# Unpack just like a regular tuple
x, y = p1
print(x, y) # 3 7
Named Tuples Are Still Immutable
from collections import namedtuple
Color = namedtuple("Color", "red green blue") # string syntax also works
c = Color(255, 128, 0)
# c.red = 200 # AttributeError: can't set attribute
# To "modify", create a new named tuple with _replace()
c2 = c._replace(red=200)
print(c2) # Color(red=200, green=128, blue=0)
print(c) # Color(red=255, green=128, blue=0) — original unchanged
Named Tuples with Default Values
from collections import namedtuple
# Defaults apply to the rightmost fields
Student = namedtuple("Student", ["name", "age", "grade"], defaults=["A"])
s1 = Student("Priya", 22) # grade defaults to "A"
s2 = Student("Rahul", 24, "B+") # explicit grade
print(s1) # Student(name='Priya', age=22, grade='A')
print(s2) # Student(name='Rahul', age=24, grade='B+')
Converting Named Tuples to Dictionaries
from collections import namedtuple
Employee = namedtuple("Employee", ["name", "department", "salary"])
emp = Employee("Alice", "Engineering", 95000)
# Convert to a dictionary (a regular dict since Python 3.8; an OrderedDict before that)
emp_dict = emp._asdict()
print(emp_dict)
# {'name': 'Alice', 'department': 'Engineering', 'salary': 95000}
Modern Alternative: typing.NamedTuple
Python 3.6+ offers a class-based syntax with type hints.
from typing import NamedTuple
class Coordinate(NamedTuple):
    latitude: float
    longitude: float
    label: str = "Unknown"
loc = Coordinate(28.6139, 77.2090, "New Delhi")
print(loc.label) # New Delhi
print(loc.latitude) # 28.6139
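One practical advantage of the class-based syntax is that you can attach methods to the record. The sketch below re-declares the `Coordinate` class so that it runs on its own; the `is_northern` method is a hypothetical addition for illustration and is not part of `typing.NamedTuple`.

```python
from typing import NamedTuple

class Coordinate(NamedTuple):
    latitude: float
    longitude: float
    label: str = "Unknown"

    def is_northern(self) -> bool:
        """True when the point lies north of the equator."""
        return self.latitude > 0

loc = Coordinate(28.6139, 77.2090)
print(loc.label)          # Unknown
print(loc.is_northern())  # True

# _replace and _asdict work exactly as they do for collections.namedtuple
moved = loc._replace(label="New Delhi")
print(moved._asdict())
```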
Tuple Methods
Tuples have only two built-in methods, because they are immutable.
| Method | Description | Returns |
|---|---|---|
| count(x) | Number of times x appears in the tuple | int |
| index(x, start, end) | Index of first occurrence of x | int (raises ValueError if missing) |
data = (10, 20, 30, 20, 40, 20, 50)
# count — how many times does 20 appear?
print(data.count(20)) # 3
print(data.count(99)) # 0
# index — where is the first 20?
print(data.index(20)) # 1
print(data.index(20, 2)) # 3 — search starting from index 2
print(data.index(20, 4)) # 5 — search starting from index 4
# index raises ValueError if not found
# data.index(99) # ValueError: tuple.index(x): x not in tuple
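Because index() raises instead of returning a sentinel, code that is unsure whether a value is present usually wraps the call. A minimal sketch, where `find_index` is a hypothetical helper rather than a built-in:

```python
def find_index(items, value, default=-1):
    """Return the first index of value in items, or default if it is absent."""
    try:
        return items.index(value)
    except ValueError:
        return default

data = (10, 20, 30, 20, 40, 20, 50)
print(find_index(data, 20))  # 1
print(find_index(data, 99))  # -1

# To collect every position of a value, combine enumerate with a generator
positions = tuple(i for i, x in enumerate(data) if x == 20)
print(positions)  # (1, 3, 5)
```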
Built-in Functions That Work with Tuples
Although tuples have only two methods, many built-in functions accept tuples.
| Function | Description | Example |
|---|---|---|
| len(t) | Number of elements | len((1,2,3)) returns 3 |
| min(t) | Smallest element | min((3,1,2)) returns 1 |
| max(t) | Largest element | max((3,1,2)) returns 3 |
| sum(t) | Sum of all elements | sum((1,2,3)) returns 6 |
| sorted(t) | New sorted list from tuple | sorted((3,1,2)) returns [1,2,3] |
| reversed(t) | Reverse iterator | tuple(reversed((1,2,3))) returns (3,2,1) |
| any(t) | True if any element is truthy | any((0, False, 1)) returns True |
| all(t) | True if all elements are truthy | all((1, True, "hi")) returns True |
| enumerate(t) | Iterator of (index, item) pairs | See iteration section |
| zip(a, b) | Pair elements from two tuples | See iteration section |
scores = (78, 92, 85, 63, 97, 88)
print(len(scores)) # 6
print(min(scores)) # 63
print(max(scores)) # 97
print(sum(scores)) # 503
print(sorted(scores)) # [63, 78, 85, 88, 92, 97] — returns a LIST
Immutability — What It Really Means
Tuples are immutable, which means you cannot reassign, add, or remove elements.
t = (1, 2, 3)
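# Each of these raises an error:
# t[0] = 99 # TypeError: item assignment is not supported
# t.append(4) # AttributeError: tuples have no append method
# del t[0] # TypeError: item deletion is not supported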
Immutability Does NOT Mean the Contents Cannot Change
This is a crucial subtlety. If a tuple contains a mutable object (like a list or dictionary), you can modify that object in place. The tuple itself doesn't change — it still holds the same reference — but the object the reference points to changes.
# A tuple containing a list
t = (1, [2, 3], 4)
# You CANNOT replace the list with a different object
# t[1] = [20, 30] # TypeError
# But you CAN modify the list IN PLACE
t[1].append(99)
print(t) # (1, [2, 3, 99], 4)
t[1][0] = 200
print(t) # (1, [200, 3, 99], 4)
This is because immutability applies to the references stored in the tuple, not to the objects those references point to. The tuple still holds the same list object — it's the list's contents that changed.
Best practice: If you want truly frozen data, make sure every element is also immutable (use tuples, strings, numbers, frozensets — not lists or dicts).
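One way to follow this advice for nested data is to convert it recursively before storing it. The helper below, `deep_freeze`, is a hypothetical sketch and not a standard-library function. It assumes the data is built only from lists, tuples, sets, dicts, and already-immutable scalars, and it represents a dict as a frozenset of (key, value) pairs.

```python
def deep_freeze(obj):
    """Recursively convert lists, sets, and dicts to immutable equivalents."""
    if isinstance(obj, (list, tuple)):
        return tuple(deep_freeze(item) for item in obj)
    if isinstance(obj, (set, frozenset)):
        return frozenset(deep_freeze(item) for item in obj)
    if isinstance(obj, dict):
        # A dict becomes an order-independent frozenset of (key, frozen value) pairs
        return frozenset((key, deep_freeze(value)) for key, value in obj.items())
    return obj  # anything else is returned unchanged (assumed immutable)

config = {"name": "demo", "tags": ["a", "b"], "limits": {"max": 10}}
frozen = deep_freeze(config)

print(("tags", ("a", "b")) in frozen)              # True
print(hash(frozen) == hash(deep_freeze(config)))   # True
```

The frozen result is hashable, so it can be used as a dictionary key or set member, which the original nested structure cannot.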
Hashability Depends on Contents
A tuple is hashable only if all its elements are hashable. This matters when you try to use a tuple as a dictionary key or add it to a set.
# Hashable tuple: every element is itself hashable
hashable = (1, "hello", (2, 3))
print(hash(hashable)) # works fine
my_dict = {hashable: "value"} # works as a dict key
# NOT hashable — contains a mutable list
unhashable = (1, [2, 3])
# hash(unhashable) # TypeError: unhashable type: 'list'
# my_dict = {unhashable: "value"} # TypeError
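When the contents of a tuple are not known in advance, the simplest check is to try hashing it. A small sketch, with `is_hashable` as a hypothetical helper name:

```python
def is_hashable(obj):
    """Return True if obj can be hashed, and so used as a dict key or set member."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable((1, "hello", (2, 3))))     # True
print(is_hashable((1, [2, 3])))              # False
print(is_hashable((1, frozenset({2, 3}))))   # True
```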
Tuple Use Cases
1. Dictionary Keys
Lists cannot be dictionary keys because they are mutable and therefore unhashable. Tuples of hashable elements can.
# Using (row, col) tuples as keys for a sparse grid
grid = {}
grid[(0, 0)] = "start"
grid[(2, 5)] = "treasure"
grid[(4, 4)] = "exit"
print(grid[(2, 5)]) # treasure
# Counting occurrences of coordinate pairs
from collections import Counter
clicks = [(100, 200), (150, 300), (100, 200), (100, 200)]
click_counts = Counter(clicks)
print(click_counts)
# Counter({(100, 200): 3, (150, 300): 1})
2. Function Return Values
Functions frequently return multiple values as tuples.
def divide(a, b):
    """Return both the quotient and remainder."""
    quotient = a // b
    remainder = a % b
    return quotient, remainder # returns a tuple
q, r = divide(17, 5)
print(f"17 / 5 = {q} remainder {r}") # 17 / 5 = 3 remainder 2
# You can also capture the result as a single tuple
result = divide(17, 5)
print(result) # (3, 2)
print(result[0]) # 3
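The standard library follows the same pattern: the built-in divmod() returns a (quotient, remainder) tuple. The second function below, `min_max`, is a hypothetical helper that shows the idiom for a different pair of results.

```python
# Built-in equivalent of the divide() example
q, r = divmod(17, 5)
print(q, r)  # 3 2

def min_max(values):
    """Return the smallest and largest value as a (low, high) tuple."""
    values = tuple(values)  # materialise once so any iterable can be scanned twice
    return min(values), max(values)

low, high = min_max([78, 92, 85, 63, 97, 88])
print(low, high)  # 63 97
```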
3. Data Integrity — Read-Only Records
Use tuples when data should not be modified after creation.
# Database-style records
employees = [
    ("E001", "Alice", "Engineering", 95000),
    ("E002", "Bob", "Marketing", 72000),
    ("E003", "Charlie", "Engineering", 88000),
]
# Safe to iterate — nobody can accidentally modify a record
for emp_id, name, dept, salary in employees:
    print(f"{emp_id}: {name} ({dept}) — ${salary:,}")
4. Tuples as Set Elements
Because tuples of hashable elements are themselves hashable, you can put them inside sets.
# Unique coordinate pairs
visited = set()
visited.add((0, 0))
visited.add((1, 2))
visited.add((0, 0)) # duplicate — ignored
print(visited) # {(0, 0), (1, 2)}
# Check if a coordinate has been visited
print((1, 2) in visited) # True
print((3, 4) in visited) # False
5. String Formatting
The % formatting operator expects a tuple on its right-hand side when more than one value is substituted.
name = "Priya"
age = 25
print("Name: %s, Age: %d" % (name, age))
# Name: Priya, Age: 25
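A related gotcha: because % treats a tuple on its right as the full argument list, formatting a tuple as a single value needs a one-element wrapper. A short sketch of the problem and two ways around it:

```python
point = (3, 4)

# Wrap the tuple in a one-element tuple so it is treated as ONE value
print("Point: %s" % (point,))  # Point: (3, 4)

# Without the wrapper, Python sees two arguments for one placeholder
try:
    print("Point: %s" % point)
except TypeError as exc:
    print(f"TypeError: {exc}")

# f-strings avoid the issue entirely
print(f"Point: {point}")  # Point: (3, 4)
```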
When to Use Tuples Over Lists
| Situation | Use a Tuple | Use a List |
|---|---|---|
| Data should not change | Yes | No |
| Need to use as dict key | Yes | No |
| Need to use as set member | Yes | No |
| Returning multiple values from a function | Yes | No |
| Fixed collection of heterogeneous items (like a record) | Yes | No |
| Collection will grow/shrink | No | Yes |
| Need append(), sort(), remove() etc. | No | Yes |
| Element order matters and items may be modified | No | Yes |
Rule of thumb: If the collection represents a fixed record (name, age, score), use a tuple. If the collection represents a dynamic group of similar items (list of names, list of scores), use a list.
Part 2 — Sets
A set is an unordered collection of unique elements. Sets automatically discard duplicates and provide O(1) average-case membership testing. They are Python's implementation of the mathematical set concept.
fruits = {"apple", "banana", "cherry"}
print(type(fruits)) # <class 'set'>
Key Characteristics
- Unordered: elements have no defined position; you cannot index or slice a set.
- Unique: duplicate values are automatically removed.
- Mutable: you can add and remove elements (unlike tuples).
- Elements must be hashable: you can store integers, strings, tuples of hashable elements, and frozensets, but not lists, dicts, or regular sets.
Creating Sets
1. Curly Braces
colors = {"red", "green", "blue"}
numbers = {1, 2, 3, 2, 1} # duplicates removed
print(numbers) # {1, 2, 3}
Important: An empty {} creates a dictionary, not a set. Use set() for an empty set.
empty_dict = {}
empty_set = set()
print(type(empty_dict)) # <class 'dict'>
print(type(empty_set)) # <class 'set'>
2. The set() Constructor
# From a list (removes duplicates)
from_list = set([1, 2, 3, 2, 1])
print(from_list) # {1, 2, 3}
# From a string (each character becomes an element)
from_string = set("mississippi")
print(from_string) # {'m', 'i', 's', 'p'} — unique characters only
# From a tuple
from_tuple = set((10, 20, 30, 20))
print(from_tuple) # {10, 20, 30}
# From a range
from_range = set(range(5))
print(from_range) # {0, 1, 2, 3, 4}
3. Set Comprehensions
Set comprehensions follow the same syntax as list comprehensions but use curly braces.
# Squares of numbers 1 through 10
squares = {x ** 2 for x in range(1, 11)}
print(squares) # {1, 4, 9, 16, 25, 36, 49, 64, 81, 100} (display order may vary)
# Unique word lengths from a sentence
sentence = "the quick brown fox jumps over the lazy dog"
word_lengths = {len(word) for word in sentence.split()}
print(word_lengths) # {3, 4, 5}
# With a condition — only even squares
even_squares = {x ** 2 for x in range(1, 11) if x % 2 == 0}
print(even_squares) # {4, 16, 36, 64, 100} (display order may vary)
# Unique first letters (case-insensitive)
names = ["Alice", "Bob", "anna", "Charlie", "bob"]
initials = {name[0].upper() for name in names}
print(initials) # {'A', 'B', 'C'}
Adding and Removing Elements
add() — Add a Single Element
skills = {"Python", "SQL"}
skills.add("Tableau")
print(skills) # {'Python', 'SQL', 'Tableau'}
# Adding an element that already exists does nothing
skills.add("Python")
print(skills) # {'Python', 'SQL', 'Tableau'} — unchanged
update() — Add Multiple Elements
update() accepts any iterable (list, tuple, set, string, etc.).
skills = {"Python"}
skills.update(["SQL", "Tableau"])
skills.update(("R", "Excel"))
skills.update({"Spark"})
print(skills)
# {'Python', 'SQL', 'Tableau', 'R', 'Excel', 'Spark'}
# Updating with a string adds EACH CHARACTER
letters = {"a", "b"}
letters.update("cd")
print(letters) # {'a', 'b', 'c', 'd'}
remove() — Remove (Raises Error if Missing)
skills = {"Python", "SQL", "Tableau"}
skills.remove("SQL")
print(skills) # {'Python', 'Tableau'}
# Raises KeyError if the element is not found
# skills.remove("Java") # KeyError: 'Java'
discard() — Remove (No Error if Missing)
skills = {"Python", "SQL", "Tableau"}
skills.discard("SQL")
print(skills) # {'Python', 'Tableau'}
# No error if the element doesn't exist
skills.discard("Java") # silently does nothing
print(skills) # {'Python', 'Tableau'}
pop() — Remove and Return an Arbitrary Element
Since sets are unordered, you cannot predict which element will be removed.
numbers = {10, 20, 30}
removed = numbers.pop()
print(f"Removed: {removed}") # could be 10, 20, or 30
print(numbers) # the remaining two elements
# Raises KeyError on empty set
# set().pop() # KeyError: 'pop from an empty set'
clear() — Remove All Elements
items = {1, 2, 3}
items.clear()
print(items) # set()
Set Operations — Mathematical Set Theory
This is where sets truly shine. Python supports all standard set operations, both as operators and methods. The method syntax is more flexible because it accepts any iterable as an argument, while the operator syntax requires both operands to be sets.
Union — All Elements from Both Sets
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}
# Operator syntax
print(a | b) # {1, 2, 3, 4, 5, 6}
# Method syntax (accepts any iterable)
print(a.union(b)) # {1, 2, 3, 4, 5, 6}
print(a.union([5, 6, 7])) # {1, 2, 3, 4, 5, 6, 7}
# Multiple sets at once
c = {7, 8}
print(a | b | c) # {1, 2, 3, 4, 5, 6, 7, 8}
print(a.union(b, c)) # {1, 2, 3, 4, 5, 6, 7, 8}
# In-place union (modifies a_copy; a itself is untouched)
a_copy = a.copy()
a_copy |= b
print(a_copy) # {1, 2, 3, 4, 5, 6}
# Or equivalently: a_copy.update(b)
Intersection — Elements Common to Both Sets
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}
# Operator syntax
print(a & b) # {3, 4}
# Method syntax
print(a.intersection(b)) # {3, 4}
# Multiple sets
c = {3, 4, 7, 8}
print(a & b & c) # {3, 4}
print(a.intersection(b, c)) # {3, 4}
# In-place intersection
a_copy = a.copy()
a_copy &= b
print(a_copy) # {3, 4}
# Or equivalently: a_copy.intersection_update(b)
Difference — Elements in First Set but Not in Second
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}
# Operator syntax
print(a - b) # {1, 2} — in a but NOT in b
print(b - a) # {5, 6} — in b but NOT in a
# Method syntax
print(a.difference(b)) # {1, 2}
# Multiple sets — remove elements found in ANY of b, c
c = {2, 7}
print(a - b - c) # {1}
print(a.difference(b, c)) # {1}
# In-place difference
a_copy = a.copy()
a_copy -= b
print(a_copy) # {1, 2}
# Or equivalently: a_copy.difference_update(b)
Symmetric Difference — Elements in Either Set but Not Both
Symmetric difference is the union minus the intersection: it keeps the elements that belong to exactly one of the two sets.
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}
# Operator syntax
print(a ^ b) # {1, 2, 5, 6}
# Method syntax
print(a.symmetric_difference(b)) # {1, 2, 5, 6}
# In-place symmetric difference
a_copy = a.copy()
a_copy ^= b
print(a_copy) # {1, 2, 5, 6}
# Or equivalently: a_copy.symmetric_difference_update(b)
Note: Symmetric difference with multiple sets using ^ is chained pairwise: a ^ b ^ c first computes a ^ b, then XORs the result with c. The result holds the elements that appear in an odd number of the sets, which is not the same as "elements appearing in exactly one of the three sets."
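A worked example makes the difference concrete. The three sets below are illustrative values chosen so that one element (4) belongs to all three; the Counter-based count is one possible way to get "exactly one", not the only one.

```python
from collections import Counter

a = {1, 2, 3, 4}
b = {3, 4, 5, 6}
c = {4, 6, 7}

# Chained XOR keeps elements that occur in an ODD number of sets,
# so 4 (present in all three) survives.
chained = a ^ b ^ c
print(sorted(chained))  # [1, 2, 4, 5, 7]

# Count how many sets each element belongs to, then keep the ones seen once.
membership = Counter(x for s in (a, b, c) for x in s)
exactly_one = {x for x, n in membership.items() if n == 1}
print(sorted(exactly_one))  # [1, 2, 5, 7]
```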
Subset and Superset
a = {1, 2, 3}
b = {1, 2, 3, 4, 5}
# Is a a subset of b? (every element of a is in b)
print(a <= b) # True
print(a.issubset(b)) # True
# Is a a proper subset of b? (subset and not equal)
print(a < b) # True
# Is b a superset of a? (b contains all elements of a)
print(b >= a) # True
print(b.issuperset(a)) # True
# Is b a proper superset?
print(b > a) # True
# Equal sets
c = {3, 2, 1}
print(a == c) # True — order doesn't matter
print(a <= c) # True — a set is a subset of itself
print(a < c) # False — not a PROPER subset (they're equal)
Disjoint Sets — No Elements in Common
evens = {2, 4, 6, 8}
odds = {1, 3, 5, 7}
primes = {2, 3, 5, 7}
print(evens.isdisjoint(odds)) # True — no overlap
print(evens.isdisjoint(primes)) # False — they share {2}
print(odds.isdisjoint(primes)) # False — they share {3, 5, 7}
Set Operations Summary Table
| Operation | Operator | Method | In-Place Method | Description |
|---|---|---|---|---|
| Union | a \| b | a.union(b) | a.update(b) or a \|= b | All elements from both |
| Intersection | a & b | a.intersection(b) | a.intersection_update(b) or a &= b | Common elements |
| Difference | a - b | a.difference(b) | a.difference_update(b) or a -= b | In a but not b |
| Symmetric Diff | a ^ b | a.symmetric_difference(b) | a.symmetric_difference_update(b) or a ^= b | In either but not both |
| Subset | a <= b | a.issubset(b) | — | All of a in b? |
| Proper Subset | a < b | — | — | Subset and not equal? |
| Superset | a >= b | a.issuperset(b) | — | All of b in a? |
| Proper Superset | a > b | — | — | Superset and not equal? |
| Disjoint | — | a.isdisjoint(b) | — | No common elements? |
Membership Testing — O(1) Performance
One of the most important practical reasons to use sets is fast membership testing. Checking x in my_set runs in O(1) average time (constant time, regardless of set size), compared to O(n) for lists and tuples.
import time
# Create a large list and set with the same data
data = list(range(10_000_000)) # 10 million integers
data_set = set(data)
target = 9_999_999 # worst case for list (last element)
# List lookup — O(n)
start = time.perf_counter() # perf_counter has finer resolution than time.time
found_list = target in data
list_time = time.perf_counter() - start
# Set lookup — O(1)
start = time.perf_counter()
found_set = target in data_set
set_time = time.perf_counter() - start
print(f"List: {list_time:.6f}s") # much slower
print(f"Set: {set_time:.6f}s") # nearly instant
Practical rule: If you need to check membership repeatedly (especially in a loop), convert your data to a set first. The one-time cost of building the set is quickly recovered.
# BAD — O(n) per lookup, O(n * m) total
allowed_list = ["admin", "editor", "viewer", "moderator", "analyst"]
users = [("Alice", "admin"), ("Bob", "hacker"), ("Charlie", "viewer")]
for name, role in users:
    if role in allowed_list: # O(n) each time
        print(f"{name}: allowed")
# GOOD — O(1) per lookup
allowed_set = set(allowed_list)
for name, role in users:
    if role in allowed_set: # O(1) each time
        print(f"{name}: allowed")
Frozen Sets
A frozenset is an immutable version of a set. It supports all set operations (union, intersection, etc.) but cannot be modified after creation — no add(), remove(), discard(), pop(), or clear().
# Create a frozenset
fs = frozenset([1, 2, 3, 4, 5])
print(fs) # frozenset({1, 2, 3, 4, 5})
print(type(fs)) # <class 'frozenset'>
# All read-only operations work
print(3 in fs) # True
print(fs | {6, 7}) # frozenset({1, 2, 3, 4, 5, 6, 7})
print(fs & {3, 4, 5, 6}) # frozenset({3, 4, 5})
# Modification operations raise AttributeError
# fs.add(6) # AttributeError: 'frozenset' object has no attribute 'add'
# fs.remove(1) # AttributeError
Why Use Frozensets?
Since frozensets are hashable, they can be used where regular sets cannot:
# 1. As dictionary keys
permissions = {
    frozenset({"read", "write"}): "Editor",
    frozenset({"read"}): "Viewer",
    frozenset({"read", "write", "admin"}): "Admin",
}
user_perms = frozenset({"read", "write"})
print(permissions[user_perms]) # Editor
# 2. As elements of another set (set of sets)
groups = set()
groups.add(frozenset({1, 2, 3}))
groups.add(frozenset({4, 5, 6}))
groups.add(frozenset({1, 2, 3})) # duplicate — ignored
print(groups) # {frozenset({1, 2, 3}), frozenset({4, 5, 6})}
# 3. As a safe default in function parameters
def process(items=frozenset()):
    """Default is a frozenset — no risk of mutable default argument bug."""
    for item in items:
        print(item)
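To see why the immutable default matters, compare it with a mutable one. Both functions below are hypothetical illustrations: `add_tag_buggy` shows the classic shared-default problem, and `add_tag_safe` shows how a frozenset default sidesteps it.

```python
def add_tag_buggy(tag, tags=[]):
    """The default list is created once and shared across calls."""
    tags.append(tag)
    return tags

print(add_tag_buggy("a"))  # ['a']
print(add_tag_buggy("b"))  # ['a', 'b'] (the shared default remembered 'a')

def add_tag_safe(tag, tags=frozenset()):
    """An immutable default cannot be changed, so every call starts clean."""
    return tags | {tag}  # builds a new frozenset instead of mutating

print(add_tag_safe("a"))  # frozenset({'a'})
print(add_tag_safe("b"))  # frozenset({'b'})
```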
Common Set Patterns
1. Removing Duplicates from a List
# Simple but DOES NOT preserve order
names = ["Priya", "Rahul", "Priya", "Ananya", "Rahul", "Priya"]
unique = list(set(names))
print(unique) # order may vary
2. Removing Duplicates While Preserving Order
# Method 1: dict.fromkeys() — Python 3.7+
names = ["Priya", "Rahul", "Priya", "Ananya", "Rahul", "Priya"]
unique_ordered = list(dict.fromkeys(names))
print(unique_ordered) # ['Priya', 'Rahul', 'Ananya']
# Method 2: Manual loop with a "seen" set
def deduplicate(items):
    """Remove duplicates while preserving insertion order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result
print(deduplicate(names)) # ['Priya', 'Rahul', 'Ananya']
# Method 3: Using a generator (memory-efficient for large data)
def unique_gen(items):
    seen = set()
    for item in items:
        if item not in seen:
            seen.add(item)
            yield item
print(list(unique_gen(names))) # ['Priya', 'Rahul', 'Ananya']
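The "seen set" approach extends naturally to cases where equality should be judged on a derived key, for example case-insensitive names, or items that are not hashable themselves. The generator below, `unique_by`, is a hypothetical variation on the functions above.

```python
def unique_by(items, key=lambda item: item):
    """Yield items whose key has not been seen before, preserving order."""
    seen = set()
    for item in items:
        marker = key(item)  # the marker must be hashable; the item need not be
        if marker not in seen:
            seen.add(marker)
            yield item

names = ["Priya", "rahul", "PRIYA", "Ananya", "Rahul"]
print(list(unique_by(names, key=str.lower)))  # ['Priya', 'rahul', 'Ananya']

# Lists are unhashable, so use a tuple of each row as the marker
rows = [[1, 2], [3, 4], [1, 2]]
print(list(unique_by(rows, key=tuple)))  # [[1, 2], [3, 4]]
```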
3. Finding Common Elements Between Collections
# Students enrolled in different courses
python_students = {"Alice", "Bob", "Charlie", "Diana", "Eve"}
sql_students = {"Bob", "Charlie", "Frank", "Grace"}
tableau_students = {"Charlie", "Diana", "Grace", "Helen"}
# Students in ALL three courses
all_three = python_students & sql_students & tableau_students
print(f"All three courses: {all_three}") # {'Charlie'}
# Students in Python OR SQL (at least one)
python_or_sql = python_students | sql_students
print(f"Python or SQL: {python_or_sql}")
# Students in Python but NOT SQL
python_only = python_students - sql_students
print(f"Python only: {python_only}") # {'Alice', 'Diana', 'Eve'}
# Students in exactly one of the three courses
# (for each course, remove anyone who also takes either of the other two)
in_exactly_one = (
    (python_students - sql_students - tableau_students) |
    (sql_students - python_students - tableau_students) |
    (tableau_students - python_students - sql_students)
)
print(f"Exactly one course: {in_exactly_one}")
4. Data Validation with Sets
# Validate that user input contains only allowed characters
allowed_chars = set("abcdefghijklmnopqrstuvwxyz0123456789_")
def is_valid_username(username):
    """Check if username contains only allowed characters."""
    return set(username.lower()).issubset(allowed_chars)
print(is_valid_username("priya_123")) # True
print(is_valid_username("bob@email")) # False — @ not allowed
print(is_valid_username("hello world")) # False — space not allowed
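A yes/no answer is often not enough for user feedback. Set difference gives the offending characters directly. In this sketch `invalid_characters` is a hypothetical companion to the validator above, and the allowed set is repeated so the example runs on its own.

```python
ALLOWED_CHARS = set("abcdefghijklmnopqrstuvwxyz0123456789_")

def invalid_characters(username):
    """Return a sorted list of the characters that are not allowed."""
    return sorted(set(username.lower()) - ALLOWED_CHARS)

print(invalid_characters("priya_123"))     # []
print(invalid_characters("bob@email"))     # ['@']
print(invalid_characters("hello world!"))  # [' ', '!']
```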
5. Set Algebra for Data Analysis
# Comparing two time periods
jan_customers = {"Alice", "Bob", "Charlie", "Diana", "Eve"}
feb_customers = {"Bob", "Diana", "Frank", "Grace", "Helen"}
# New customers in February (not seen in January)
new_customers = feb_customers - jan_customers
print(f"New in Feb: {new_customers}") # {'Frank', 'Grace', 'Helen'}
# Churned customers (were in Jan, gone in Feb)
churned = jan_customers - feb_customers
print(f"Churned: {churned}") # {'Alice', 'Charlie', 'Eve'}
# Retained customers (in both months)
retained = jan_customers & feb_customers
print(f"Retained: {retained}") # {'Bob', 'Diana'}
# Retention rate
retention_rate = len(retained) / len(jan_customers) * 100
print(f"Retention rate: {retention_rate:.1f}%") # 40.0%
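The same analysis can be packaged as a reusable function that returns its results as a named tuple. Both `PeriodComparison` and `compare_periods` are hypothetical names introduced here for illustration; the guard for an empty previous period is an added assumption.

```python
from typing import NamedTuple

class PeriodComparison(NamedTuple):
    new: frozenset
    churned: frozenset
    retained: frozenset
    retention_rate: float

def compare_periods(previous, current):
    """Compare two collections of customer names using set algebra."""
    previous, current = frozenset(previous), frozenset(current)
    retained = previous & current
    # Avoid dividing by zero when the previous period had no customers
    rate = len(retained) / len(previous) * 100 if previous else 0.0
    return PeriodComparison(current - previous, previous - current, retained, rate)

report = compare_periods(
    {"Alice", "Bob", "Charlie", "Diana", "Eve"},
    {"Bob", "Diana", "Frank", "Grace", "Helen"},
)
print(sorted(report.new))               # ['Frank', 'Grace', 'Helen']
print(sorted(report.retained))          # ['Bob', 'Diana']
print(f"{report.retention_rate:.1f}%")  # 40.0%
```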
Comparison Table — List vs Tuple vs Set vs Frozenset
| Feature | list | tuple | set | frozenset |
|---|---|---|---|---|
| Syntax | [1, 2, 3] | (1, 2, 3) | {1, 2, 3} | frozenset({1, 2, 3}) |
| Ordered | Yes | Yes | No | No |
| Mutable | Yes | No | Yes | No |
| Allows duplicates | Yes | Yes | No | No |
| Indexing / slicing | Yes | Yes | No | No |
| Hashable | No | Yes* | No | Yes |
| Can be dict key | No | Yes* | No | Yes |
| Can be set element | No | Yes* | No | Yes |
| in operator speed | O(n) | O(n) | O(1) average | O(1) average |
| Memory usage | Higher | Lower | Higher | Higher |
| Best use case | Dynamic collections | Fixed records, dict keys | Unique items, fast lookup | Immutable unique groups |
*Tuples are hashable only if all their elements are also hashable.
Quick Decision Guide
- Need to modify the collection? Use list or set.
- Need order? Use list or tuple.
- Need uniqueness? Use set or frozenset.
- Need fast membership testing? Use set or frozenset.
- Need to use as a dict key? Use tuple or frozenset.
- Data should never change? Use tuple or frozenset.
Practical Examples
Example 1: Frequency Analysis with Counter
from collections import Counter
# Analyse the frequency of words in a text
text = """
Python is a great programming language. Python is used for data science.
Data science is one of the most popular fields. Python and data science
are growing together. Python makes data analysis easy and fun.
"""
# Normalise and tokenise
words = text.lower().split()
# Remove punctuation from each word
words = [word.strip(".,!?;:") for word in words]
# Count frequencies
word_counts = Counter(words)
# Most common words
print("Top 5 most frequent words:")
for word, count in word_counts.most_common(5):
    print(f" '{word}' — {count} times")
# Unique words (using a set)
unique_words = set(words)
print(f"\nTotal words: {len(words)}")
print(f"Unique words: {len(unique_words)}")
print(f"Vocabulary richness: {len(unique_words)/len(words):.2%}")
# Words that appear exactly once (hapax legomena)
hapax = {word for word, count in word_counts.items() if count == 1}
print(f"Words appearing only once: {hapax}")
Example 2: De-Duplication Pipeline
def clean_email_list(raw_emails):
    """
    Clean a list of email addresses:
    1. Strip whitespace
    2. Convert to lowercase
    3. Remove duplicates while preserving order
    4. Validate format (basic check)
    """
    seen = set()
    cleaned = []
    for email in raw_emails:
        # Step 1 & 2: normalise
        email = email.strip().lower()
        # Step 3: skip if already seen
        if email in seen:
            continue
        # Step 4: basic validation
        if "@" not in email or "." not in email.split("@")[-1]:
            print(f" Skipped invalid: {email}")
            continue
        seen.add(email)
        cleaned.append(email)
    return cleaned
# Test data
raw = [
    " Alice@Example.com ",
    "bob@test.com",
    "alice@example.com", # duplicate (case-insensitive)
    "CHARLIE@test.com",
    "bob@test.com", # exact duplicate
    "invalid-email", # no @
    "diana@company.org",
    " BOB@TEST.COM ", # duplicate with whitespace
]
result = clean_email_list(raw)
print(f"\nCleaned list ({len(result)} emails):")
for email in result:
    print(f" {email}")
# Output:
# Skipped invalid: invalid-email
#
# Cleaned list (4 emails):
# alice@example.com
# bob@test.com
# charlie@test.com
# diana@company.org
Example 3: Finding Unique Visitors Across Days
# Simulated web analytics — visitor IDs for each day of the week
monday = {"user_001", "user_002", "user_003", "user_004"}
tuesday = {"user_002", "user_003", "user_005", "user_006"}
wednesday = {"user_001", "user_003", "user_006", "user_007"}
thursday = {"user_003", "user_004", "user_008"}
friday = {"user_001", "user_002", "user_005", "user_009"}
all_days = [monday, tuesday, wednesday, thursday, friday]
day_names = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
# Total unique visitors across the week
all_visitors = set()
for day in all_days:
    all_visitors |= day # union
print(f"Total unique visitors: {len(all_visitors)}")
# {user_001 through user_009} = 9
# Visitors who came EVERY day
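daily_visitors = monday.copy() # copy first: &= works in place and would otherwise shrink monday itself
for day in all_days[1:]:
    daily_visitors &= day # intersection
print(f"Visited every day: {daily_visitors}")
# set(): nobody came all five days (user_003 came Mon-Thu but missed Friday)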
# First-time visitors each day
seen = set()
for name, day in zip(day_names, all_days):
    new_visitors = day - seen
    print(f" {name}: {len(new_visitors)} new visitors — {new_visitors}")
    seen |= day
# Daily unique visitor counts
print("\nDaily visitor counts:")
for name, day in zip(day_names, all_days):
    print(f" {name}: {len(day)} visitors")
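The two accumulation loops in this example can also be sketched with set.union() and set.intersection(), which accept any number of sets and return a new set without modifying their arguments. The snippet repeats the visitor data from this example so that it runs on its own.

```python
monday = {"user_001", "user_002", "user_003", "user_004"}
tuesday = {"user_002", "user_003", "user_005", "user_006"}
wednesday = {"user_001", "user_003", "user_006", "user_007"}
thursday = {"user_003", "user_004", "user_008"}
friday = {"user_001", "user_002", "user_005", "user_009"}
all_days = [monday, tuesday, wednesday, thursday, friday]

# Union of all five sets in one call (a new set; the inputs are untouched)
all_visitors = set().union(*all_days)
print(len(all_visitors))  # 9

# Intersection of all five sets in one call
every_day = set.intersection(*all_days)
print(every_day)  # set(), because no visitor ID appears in all five sets

# The original sets are unchanged
print(len(monday))  # 4
```

Because both calls build new sets, the per-day data stays intact for any later analysis.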
Example 4: Using Tuples and Sets Together
from collections import namedtuple
# Tracking student course registrations
Registration = namedtuple("Registration", ["student", "course", "semester"])
registrations = [
    Registration("Alice", "Python", "Fall 2025"),
    Registration("Bob", "Python", "Fall 2025"),
    Registration("Alice", "SQL", "Fall 2025"),
    Registration("Charlie", "Python", "Fall 2025"),
    Registration("Bob", "SQL", "Fall 2025"),
    Registration("Alice", "Python", "Spring 2026"), # re-registration
    Registration("Diana", "Tableau", "Spring 2026"),
]
# Unique student-course pairs (ignoring semester)
unique_pairs = {(r.student, r.course) for r in registrations}
print(f"Unique student-course pairs: {len(unique_pairs)}")
for student, course in sorted(unique_pairs):
    print(f" {student} -> {course}")
# All unique students
students = {r.student for r in registrations}
print(f"\nUnique students: {students}")
# All unique courses
courses = {r.course for r in registrations}
print(f"Unique courses: {courses}")
# Students per course
for course in sorted(courses):
    enrolled = {r.student for r in registrations if r.course == course}
    print(f" {course}: {enrolled}")
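The students-per-course loop rescans the full registration list once per course. As a sketch of an alternative, a collections.defaultdict of sets can group students by course in a single pass. The variable name by_course is an assumption made for this illustration, and the data is a shortened copy of the list in this example.

```python
from collections import defaultdict, namedtuple

Registration = namedtuple("Registration", ["student", "course", "semester"])
registrations = [
    Registration("Alice", "Python", "Fall 2025"),
    Registration("Bob", "Python", "Fall 2025"),
    Registration("Alice", "SQL", "Fall 2025"),
    Registration("Alice", "Python", "Spring 2026"),  # re-registration
]

# Missing keys start as an empty set, so add() works the first time a course appears
by_course = defaultdict(set)
for r in registrations:
    by_course[r.course].add(r.student)  # the set absorbs the repeated Alice/Python pair

for course in sorted(by_course):
    print(f"{course}: {sorted(by_course[course])}")
# Python: ['Alice', 'Bob']
# SQL: ['Alice']
```

For a handful of courses either approach is fine; the single pass matters more as the number of courses grows.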
Practice Exercises
Test your understanding of tuples and sets. Try each exercise before reading the hint.
Exercise 1: Tuple Statistics
Write a function tuple_stats(t) that takes a tuple of numbers and returns a named tuple with fields minimum, maximum, total, average, and count. Do not use the statistics module.
# Example:
# stats = tuple_stats((10, 20, 30, 40, 50))
# stats.minimum -> 10
# stats.maximum -> 50
# stats.total -> 150
# stats.average -> 30.0
# stats.count -> 5
Hint: Use collections.namedtuple to define a Stats type, then compute each value with built-in functions.
Exercise 2: Symmetric Difference Without the Operator
Write a function symmetric_diff(a, b) that returns the symmetric difference of two sets without using ^, symmetric_difference(), or symmetric_difference_update(). Use only union, intersection, and difference.
# Example:
# symmetric_diff({1, 2, 3}, {2, 3, 4}) -> {1, 4}
Hint: The symmetric difference is (A - B) | (B - A), or equivalently (A | B) - (A & B).
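As a quick sanity check rather than a full solution, the two formulas in the hint can be compared against the built-in ^ operator on a few sample sets. The sample values are arbitrary, and ^ appears here only to verify the result, not as part of the function the exercise asks for.

```python
samples = [
    ({1, 2, 3}, {2, 3, 4}),
    (set(), {1}),
    ({1, 2}, {1, 2}),
]

for a, b in samples:
    by_differences = (a - b) | (b - a)   # elements in exactly one of the sets
    by_union = (a | b) - (a & b)         # everything, minus the shared elements
    # Both formulas should agree with each other and with the built-in operator
    assert by_differences == by_union == a ^ b
    print(a, b, "->", by_differences)
```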
Exercise 3: Remove Duplicates Preserving Order (Case-Insensitive)
Write a function unique_words(text) that takes a string, splits it into words, and returns a list of unique words preserving their first occurrence order. Treat "Python" and "python" as the same word, but keep the casing of the first occurrence.
# Example:
# unique_words("Python is great and python is fun and PYTHON rocks")
# -> ['Python', 'is', 'great', 'and', 'fun', 'rocks']
Hint: Use a set to track lowercase versions of words you've already seen.
Exercise 4: Common Friends
Given a dictionary mapping person names to sets of friends, write a function common_friends(network, person_a, person_b) that returns the set of mutual friends of two people (excluding the two people themselves).
# Example:
# network = {
# "Alice": {"Bob", "Charlie", "Diana"},
# "Bob": {"Alice", "Charlie", "Eve"},
# "Charlie": {"Alice", "Bob", "Diana", "Eve"},
# }
# common_friends(network, "Alice", "Bob") -> {"Charlie"}
Hint: Use set intersection and then discard the two people from the result.
Exercise 5: Tuple-Based Sparse Matrix
Write a class SparseMatrix that stores only non-zero values using a dictionary with (row, col) tuple keys. Implement set(row, col, value), get(row, col), and non_zero_count() methods.
# Example:
# m = SparseMatrix()
# m.set(0, 0, 5)
# m.set(2, 3, 8)
# m.get(0, 0) -> 5
# m.get(1, 1) -> 0 (default for missing entries)
# m.non_zero_count() -> 2
Hint: Use a dictionary with (row, col) tuples as keys. The get method should return 0 if the key is not present.
Exercise 6: Set-Based Venn Diagram Analysis
Write a function venn_analysis(set_a, set_b, label_a="A", label_b="B") that prints a complete Venn diagram analysis: elements only in A, elements only in B, elements in both, total unique elements, and whether one is a subset of the other.
# Example:
# venn_analysis({1, 2, 3, 4, 5}, {3, 4, 5, 6, 7}, "Odds", "Evens")
# Only in Odds: {1, 2}
# Only in Evens: {6, 7}
# In both: {3, 4, 5}
# Total unique: 7
# Odds subset of Evens? No
# Evens subset of Odds? No
Hint: Use difference for "only in A", difference for "only in B", intersection for "in both", and union for "total unique."
Summary
In this chapter, you learned:
Tuples:
- What tuples are — ordered, immutable sequences
- Creating tuples — parentheses, packing, single-element with trailing comma, tuple() constructor, repetition, concatenation
- Accessing elements — positive indexing, negative indexing, slicing with start:stop:step
- Tuple unpacking — basic destructuring, swapping variables, starred *rest unpacking, ignoring values with _
- Named tuples — collections.namedtuple, typing.NamedTuple, _replace(), _asdict(), defaults
- Tuple methods — count() and index(), plus built-in functions (len, min, max, sum, sorted)
- Immutability nuances — references vs objects, mutable elements inside tuples, hashability rules
- Use cases — dictionary keys, function return values, data integrity, set elements, string formatting
Sets:
- What sets are — unordered collections of unique, hashable elements
- Creating sets — curly braces, set() constructor, set comprehensions
- Modifying sets — add(), update(), remove(), discard(), pop(), clear()
- Mathematical operations — union (|), intersection (&), difference (-), symmetric difference (^) — both operator and method syntax
- Subset and superset testing — <=, <, >=, >, issubset(), issuperset(), isdisjoint()
- Frozen sets — immutable sets that can be dict keys and set elements
- O(1) membership testing — why in is dramatically faster with sets than lists
- Common patterns — deduplication (with and without order), finding common elements, data validation, customer analysis
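To make the membership-testing point concrete, here is a rough timing sketch using timeit from the standard library. The collection size and repetition count are arbitrary choices, and absolute timings vary by machine, so no specific numbers are claimed.

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)
missing = -1  # not present, so the list must be scanned to the end

# Time 100 lookups against each container
list_time = timeit.timeit(lambda: missing in as_list, number=100)
set_time = timeit.timeit(lambda: missing in as_set, number=100)

print(f"list: {list_time:.5f} s")
print(f"set:  {set_time:.5f} s")
```

The list lookup compares the value against every element in turn (O(n)), while the set lookup hashes the value and, on average, inspects a single slot (O(1)).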
Choosing the right data structure:
- List — ordered, mutable, allows duplicates (dynamic collections)
- Tuple — ordered, immutable, allows duplicates (fixed records, dict keys)
- Set — unordered, mutable, unique elements (fast lookup, deduplication)
- Frozenset — unordered, immutable, unique elements (hashable set)
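The four structures can be placed side by side in a short sketch. The variable names and sample values are made up for illustration; the point is which types are accepted as dictionary keys and which are rejected.

```python
point = (3, 4)                        # tuple: fixed record, hashable
tags = ["sql", "python", "sql"]       # list: ordered, keeps duplicates
unique_tags = set(tags)               # set: duplicates removed
frozen_tags = frozenset(unique_tags)  # frozenset: immutable, hashable

# Tuples and frozensets work as dictionary keys
lookup = {point: "a coordinate", frozen_tags: "a tag group"}
print(lookup[(3, 4)])  # a coordinate

# Lists and sets are mutable, so they are unhashable and rejected as keys
for candidate in (tags, unique_tags):
    try:
        lookup[candidate] = "fails"
    except TypeError as exc:
        print(f"{type(candidate).__name__}: {exc}")
```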
Next up: Dictionaries — key-value pair mappings for structured data storage and retrieval.