What is a Dictionary?
A dictionary is Python's built-in mapping type. It stores data as key-value pairs, letting you look up a value instantly when you know its key — much like how a real-world dictionary lets you look up a definition when you know the word.
Key characteristics of a Python dictionary:
- Key-value pairs — every entry consists of a unique key mapped to a value.
- Fast lookup — retrieving a value by key is O(1) on average.
- Mutable — you can add, change, and remove entries after creation.
- Insertion-ordered — since Python 3.7, dictionaries preserve the order in which keys were inserted (this was an implementation detail in 3.6 and became a language guarantee in 3.7).
- Keys must be hashable — strings, numbers, tuples of immutables, and frozensets can be keys. Lists, sets, and other dicts cannot.
- Values can be anything — integers, strings, lists, other dicts, functions, objects — no restrictions.
student = {
"name": "Priya",
"age": 22,
"course": "Data Science",
"active": True
}
print(type(student)) # <class 'dict'>
print(len(student)) # 4
Why Keys Must Be Hashable
Python dictionaries are implemented as hash tables. When you store a key-value pair, Python computes a hash of the key to decide where to store the pair internally. This is what makes lookups so fast — O(1) on average.
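For hashing to work correctly, a key must be hashable: its hash value must never change during its lifetime, and objects that compare equal must hash equally. Python's built-in mutable containers (lists, sets, and dictionaries) are unhashable and cannot be used as keys, while strings, numbers, booleans, frozensets, and tuples (as long as every element is itself hashable) can.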
# Valid keys
valid = {
"name": "Priya", # string
42: "the answer", # integer
3.14: "pi", # float
True: "boolean", # bool (note: True == 1, so it would collide with an integer key 1)
(1, 2): "tuple key", # tuple of immutables
}
# Invalid keys — will raise TypeError
# bad = {[1, 2]: "list key"} # TypeError: unhashable type: 'list'
# bad = {{1, 2}: "set key"} # TypeError: unhashable type: 'set'
# bad = {{"a": 1}: "dict key"} # TypeError: unhashable type: 'dict'
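The hashability rule can be checked directly with the built-in hash() function. The following is a minimal sketch; the team_scores dictionary and its contents are made up for illustration.

```python
# A tuple is hashable only if everything inside it is hashable.
print(hash((1, 2)) == hash((1, 2)))   # True: equal tuples hash equally

try:
    hash((1, [2, 3]))                 # the inner list is unhashable
except TypeError as exc:
    print(exc)                        # unhashable type: 'list'

# frozenset is the hashable counterpart of set, so it can serve as a key.
team_scores = {frozenset({"priya", "rahul"}): 187}
print(team_scores[frozenset({"rahul", "priya"})])   # 187 (element order is irrelevant)
```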
Note on booleans:
True is equal to 1 and False is equal to 0 in Python. If you use both True and 1 as keys, they are treated as the same key: the key object inserted first is kept, and the later assignment overwrites its value.
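A short sketch of that collision, using a throwaway flags dictionary chosen only for illustration:

```python
# True and 1 hash the same and compare equal, so they address one entry.
flags = {1: "integer one"}
flags[True] = "boolean true"   # updates the existing entry; no new key is added

print(flags)        # {1: 'boolean true'}
print(len(flags))   # 1
print(flags[1.0])   # boolean true (1.0 == 1 as well)
```

Note that the original key object (the integer 1) stays in place; only the value changes.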
Insertion Order Guarantee (Python 3.7+)
Before Python 3.7, dictionaries had no guaranteed iteration order. If you needed ordered keys, you had to use collections.OrderedDict. Since Python 3.7, the built-in dict officially preserves insertion order:
timeline = {}
timeline["2020"] = "Started learning Python"
timeline["2021"] = "Built first project"
timeline["2022"] = "Got first job"
timeline["2023"] = "Led a team"
for year, event in timeline.items():
    print(f"{year}: {event}")
# Output is always in insertion order:
# 2020: Started learning Python
# 2021: Built first project
# 2022: Got first job
# 2023: Led a team
Creating Dictionaries
Python offers several ways to create dictionaries, each suited to different situations.
1. Literal Syntax with Curly Braces
The most common and readable way:
# String keys (most common)
person = {"name": "Priya", "age": 22, "city": "Mumbai"}
# String keys with mixed value types
config = {
"debug": True,
"port": 8080,
"max_retries": 3,
}
# Empty dictionary
empty = {}
print(type(empty)) # <class 'dict'>
Trailing commas are allowed and encouraged in multi-line dicts. They make version control diffs cleaner when adding new keys.
2. The dict() Constructor
The dict() constructor accepts keyword arguments, another dict, or an iterable of key-value pairs:
# Using keyword arguments (keys must be valid Python identifiers)
person = dict(name="Priya", age=22, city="Mumbai")
print(person) # {'name': 'Priya', 'age': 22, 'city': 'Mumbai'}
# From another dictionary (creates a shallow copy)
original = {"a": 1, "b": 2}
copy = dict(original)
print(copy) # {'a': 1, 'b': 2}
# From a list of (key, value) tuples
pairs = [("name", "Priya"), ("age", 22), ("city", "Mumbai")]
person = dict(pairs)
print(person) # {'name': 'Priya', 'age': 22, 'city': 'Mumbai'}
3. Using dict.fromkeys()
dict.fromkeys() creates a dictionary from a sequence of keys, all with the same default value:
# All keys get the same default value
subjects = dict.fromkeys(["Python", "SQL", "Tableau"], 0)
print(subjects) # {'Python': 0, 'SQL': 0, 'Tableau': 0}
# Default is None if no value is provided
fields = dict.fromkeys(["name", "age", "email"])
print(fields) # {'name': None, 'age': None, 'email': None}
Caution: If the default value is mutable (like a list), all keys share the same object. Use a dict comprehension instead if you need independent mutable defaults.
# WRONG — all keys share the same list
bad = dict.fromkeys(["a", "b", "c"], [])
bad["a"].append(1)
print(bad) # {'a': [1], 'b': [1], 'c': [1]} — all changed!
# CORRECT — each key gets its own list
good = {key: [] for key in ["a", "b", "c"]}
good["a"].append(1)
print(good) # {'a': [1], 'b': [], 'c': []} — only 'a' changed
4. Zipping Two Lists Together
A very common pattern when you have parallel lists of keys and values:
keys = ["name", "age", "city"]
values = ["Priya", 22, "Mumbai"]
person = dict(zip(keys, values))
print(person) # {'name': 'Priya', 'age': 22, 'city': 'Mumbai'}
# This is equivalent to the dict comprehension:
person = {k: v for k, v in zip(keys, values)}
Note: If the two lists have different lengths, zip() stops at the shorter one. Use itertools.zip_longest() if you need to handle unequal lengths.
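A small sketch of the difference, assuming a values list that is one item short (the variable names truncated and padded are illustrative):

```python
from itertools import zip_longest

keys = ["name", "age", "city"]
values = ["Priya", 22]            # one value short

truncated = dict(zip(keys, values))
print(truncated)
# {'name': 'Priya', 'age': 22}   ('city' is silently dropped)

padded = dict(zip_longest(keys, values, fillvalue=None))
print(padded)
# {'name': 'Priya', 'age': 22, 'city': None}
```

If the keys list is the shorter one, zip_longest() pads the keys instead, which produces a None key, so check which side can be short. On Python 3.10+, zip(keys, values, strict=True) raises ValueError instead of truncating.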
5. From enumerate() — Index-Based Keys
fruits = ["apple", "banana", "cherry"]
indexed = dict(enumerate(fruits))
print(indexed) # {0: 'apple', 1: 'banana', 2: 'cherry'}
# Start from 1
indexed = dict(enumerate(fruits, start=1))
print(indexed) # {1: 'apple', 2: 'banana', 3: 'cherry'}
Accessing Values
Bracket Notation — d[key]
The most direct way. Raises a KeyError if the key does not exist.
student = {"name": "Priya", "age": 22, "city": "Mumbai"}
print(student["name"]) # Priya
print(student["age"]) # 22
# KeyError if the key is missing
# print(student["phone"]) # KeyError: 'phone'
When to use: When you are certain the key exists and a missing key should be treated as a bug (crash early, fail loud).
The .get() Method — Safe Access
Returns a default value (default: None) if the key is missing. Never raises a KeyError.
student = {"name": "Priya", "age": 22, "city": "Mumbai"}
# Returns None if key is missing
print(student.get("phone")) # None
# Returns a custom default if key is missing
print(student.get("phone", "N/A")) # N/A
# Returns the value if the key exists (ignores the default)
print(student.get("name", "Unknown")) # Priya
When to use: When a missing key is a normal, expected scenario — not an error. This is the most commonly recommended access method for defensive programming.
The .setdefault() Method — Get or Initialize
Returns the value if the key exists. If the key is missing, inserts it with the given default value and returns that default.
student = {"name": "Priya", "age": 22}
# Key exists — returns existing value, dict unchanged
name = student.setdefault("name", "Unknown")
print(name) # Priya
print(student) # {'name': 'Priya', 'age': 22}
# Key missing — inserts it with the default, returns the default
city = student.setdefault("city", "Mumbai")
print(city) # Mumbai
print(student) # {'name': 'Priya', 'age': 22, 'city': 'Mumbai'}
When to use: When you want to read a key and simultaneously ensure it exists going forward. Extremely useful for building up collections:
# Group words by their first letter
words = ["apple", "avocado", "banana", "blueberry", "cherry"]
groups = {}
for word in words:
    groups.setdefault(word[0], []).append(word)
print(groups)
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
Checking if a Key Exists
Use the in operator to test for membership. This is O(1) on average for dictionaries.
config = {"debug": True, "port": 8080, "host": "localhost"}
print("debug" in config) # True
print("timeout" in config) # False
print("debug" not in config) # False
# Check before accessing
if "timeout" in config:
    timeout = config["timeout"]
else:
    timeout = 30 # default
# Or simply use .get()
timeout = config.get("timeout", 30)
Adding and Updating Values
Bracket Assignment
Assign a value to a key. If the key exists, it is overwritten. If it doesn't, a new entry is created.
student = {"name": "Priya", "age": 22}
# Add a new key
student["city"] = "Mumbai"
# Update an existing key
student["age"] = 23
print(student)
# {'name': 'Priya', 'age': 23, 'city': 'Mumbai'}
The .update() Method
Merges another dict (or an iterable of key-value pairs, or keyword arguments) into the current dict. Existing keys are overwritten.
student = {"name": "Priya", "age": 22}
# Update with another dict
student.update({"age": 23, "course": "Data Science"})
print(student)
# {'name': 'Priya', 'age': 23, 'course': 'Data Science'}
# Update with keyword arguments
student.update(city="Mumbai", active=True)
print(student)
# {'name': 'Priya', 'age': 23, 'course': 'Data Science',
# 'city': 'Mumbai', 'active': True}
# Update with a list of tuples
student.update([("gpa", 3.8), ("year", 4)])
print(student["gpa"]) # 3.8
Merge Operator | (Python 3.9+)
Creates a new dictionary by merging two dicts. Keys from the right-hand side overwrite keys from the left.
defaults = {"color": "blue", "size": "medium", "font": "Arial"}
custom = {"size": "large", "theme": "dark"}
merged = defaults | custom
print(merged)
# {'color': 'blue', 'size': 'large', 'font': 'Arial', 'theme': 'dark'}
# Original dicts are NOT modified
print(defaults) # {'color': 'blue', 'size': 'medium', 'font': 'Arial'}
Augmented Merge Operator |= (Python 3.9+)
Updates the left-hand dict in place. This is equivalent to .update() but uses operator syntax.
settings = {"color": "blue", "size": "medium"}
overrides = {"size": "large", "theme": "dark"}
settings |= overrides
print(settings)
# {'color': 'blue', 'size': 'large', 'theme': 'dark'}
# Also works with any iterable of key-value pairs
settings |= [("font", "Helvetica"), ("lang", "en")]
print(settings["font"]) # Helvetica
Merging Dicts in Older Python Versions (3.5-3.8)
If you cannot use the | operator, there are other ways to merge dictionaries:
a = {"x": 1, "y": 2}
b = {"y": 3, "z": 4}
# Method 1: Unpack both dicts into a new one (Python 3.5+)
merged = {**a, **b}
print(merged) # {'x': 1, 'y': 3, 'z': 4}
# Method 2: Copy and update
merged = a.copy()
merged.update(b)
print(merged) # {'x': 1, 'y': 3, 'z': 4}
Removing Items
del Statement
Removes a key-value pair by key. Raises KeyError if the key doesn't exist.
student = {"name": "Priya", "age": 22, "city": "Mumbai", "phone": "555-1234"}
del student["phone"]
print(student) # {'name': 'Priya', 'age': 22, 'city': 'Mumbai'}
# del student["email"] # KeyError: 'email'
The .pop() Method
Removes a key and returns its value. Accepts an optional default to avoid KeyError.
student = {"name": "Priya", "age": 22, "city": "Mumbai"}
# Remove and get the value
age = student.pop("age")
print(age) # 22
print(student) # {'name': 'Priya', 'city': 'Mumbai'}
# With a default — no error if key is missing
phone = student.pop("phone", "not found")
print(phone) # not found
# Without a default — raises KeyError if missing
# student.pop("phone") # KeyError: 'phone'
The .popitem() Method
Removes and returns the last inserted key-value pair as a tuple. Useful as a stack-like operation. Raises KeyError on an empty dict.
tasks = {"task1": "Design", "task2": "Develop", "task3": "Test"}
last = tasks.popitem()
print(last) # ('task3', 'Test')
print(tasks) # {'task1': 'Design', 'task2': 'Develop'}
second = tasks.popitem()
print(second) # ('task2', 'Develop')
print(tasks) # {'task1': 'Design'}
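The stack-like behaviour and the empty-dict error can be sketched as follows; the pending and processed names are illustrative only.

```python
pending = {"task1": "Design", "task2": "Develop", "task3": "Test"}

# Drain the dict newest-first; the loop ends once it is empty.
processed = []
while pending:
    key, value = pending.popitem()
    processed.append(key)

print(processed)   # ['task3', 'task2', 'task1']

# Calling popitem() on the now-empty dict raises KeyError.
try:
    pending.popitem()
except KeyError:
    print("nothing left to pop")
```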
The .clear() Method
Removes all key-value pairs, leaving an empty dictionary.
cache = {"page1": "<html>...", "page2": "<html>...", "page3": "<html>..."}
cache.clear()
print(cache) # {}
print(type(cache)) # <class 'dict'> — still a dict, just empty
Dictionary Methods Reference Table
| Method | Description | Returns |
|---|---|---|
| d[key] | Get value by key | Value (raises KeyError if missing) |
| d[key] = val | Set or update a key | None |
| d.get(key, default) | Get value, return default if missing | Value or default (None by default) |
| d.setdefault(key, default) | Get value; if missing, insert default and return it | Value |
| d.keys() | View of all keys | dict_keys |
| d.values() | View of all values | dict_values |
| d.items() | View of all (key, value) pairs | dict_items |
| d.update(other) | Merge other into d; existing keys overwritten | None |
| d.pop(key, default) | Remove key, return value (or default) | Value (raises KeyError if no default and key missing) |
| d.popitem() | Remove and return last inserted (key, value) pair | tuple (raises KeyError if empty) |
| d.clear() | Remove all items | None |
| d.copy() | Shallow copy of the dictionary | dict |
| d.fromkeys(seq, val) | New dict with keys from seq, all set to val | dict |
Built-in functions that work with dictionaries:
| Function | Description | Example |
|---|---|---|
| len(d) | Number of key-value pairs | len({"a": 1, "b": 2}) returns 2 |
| key in d | Membership test (O(1) on average) | "a" in {"a": 1} returns True |
| sorted(d) | Sorted list of keys | sorted({"b": 2, "a": 1}) returns ["a", "b"] |
| min(d) / max(d) | Smallest / largest key | max({"a": 1, "c": 3, "b": 2}) returns "c" |
| dict() | Create a new dictionary | dict(a=1, b=2) returns {"a": 1, "b": 2} |
| any(d) / all(d) | Test truthiness of keys | any({"": 1, 0: 2, "hi": 3}) returns True |
Note on views: The objects returned by .keys(), .values(), and .items() are views, not lists. They reflect changes to the underlying dict in real time. If you need a static snapshot, wrap them in list().
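A brief sketch of the difference between a live view and a snapshot, using a made-up inventory dictionary:

```python
inventory = {"apples": 3, "pears": 5}

live_keys = inventory.keys()       # a view tied to the dict
snapshot = list(inventory.keys())  # an independent list

inventory["plums"] = 7

print(list(live_keys))  # ['apples', 'pears', 'plums']
print(snapshot)         # ['apples', 'pears']

# Key views also support set operations.
common = live_keys & {"pears", "kiwis"}
print(common)           # {'pears'}
```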
Iterating Over Dictionaries
Iterating Over Keys
By default, iterating over a dict yields its keys:
scores = {"Python": 95, "SQL": 88, "Tableau": 92}
# These two are equivalent
for subject in scores:
    print(subject)
for subject in scores.keys():
    print(subject)
# Output:
# Python
# SQL
# Tableau
Iterating Over Values
scores = {"Python": 95, "SQL": 88, "Tableau": 92}
for score in scores.values():
    print(score)
# Output:
# 95
# 88
# 92
# Total and average
total = sum(scores.values())
average = total / len(scores)
print(f"Total: {total}, Average: {average:.1f}")
# Total: 275, Average: 91.7
Iterating Over Key-Value Pairs (with Unpacking)
The .items() method returns (key, value) tuples, which you can unpack directly in the for loop:
scores = {"Python": 95, "SQL": 88, "Tableau": 92}
for subject, score in scores.items():
    print(f"{subject}: {score}")
# Output:
# Python: 95
# SQL: 88
# Tableau: 92
Sorted Iteration
Dictionaries maintain insertion order, but you can iterate in any sorted order:
scores = {"Python": 95, "SQL": 88, "Tableau": 92, "Excel": 78}
# Sort by key (alphabetical)
print("--- By Subject (A-Z) ---")
for subject in sorted(scores):
    print(f"{subject}: {scores[subject]}")
# Sort by value (ascending)
print("\n--- By Score (low to high) ---")
for subject, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{subject}: {score}")
# Sort by value (descending)
print("\n--- By Score (high to low) ---")
for subject, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{subject}: {score}")
Filtering During Iteration
scores = {"Python": 95, "SQL": 88, "Tableau": 92, "Excel": 78, "R": 65}
# Print only subjects with score >= 90
for subject, score in scores.items():
    if score >= 90:
        print(f"{subject}: {score}")
# Output:
# Python: 95
# Tableau: 92
Iterating with enumerate()
When you need a counter alongside key-value pairs:
scores = {"Python": 95, "SQL": 88, "Tableau": 92}
for rank, (subject, score) in enumerate(scores.items(), start=1):
    print(f"#{rank} {subject}: {score}")
# Output:
# #1 Python: 95
# #2 SQL: 88
# #3 Tableau: 92
Important: Never modify a dict's size (add or delete keys) while iterating over it. This raises a RuntimeError. If you need to remove items during iteration, iterate over a copy of the keys: for key in list(d.keys()):
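A minimal sketch of both the failure and the safe pattern, using a hypothetical stock dictionary:

```python
stock = {"apples": 0, "pears": 5, "plums": 0}

# Unsafe: deleting while iterating over the dict itself.
raised = False
try:
    for item in stock:
        if stock[item] == 0:
            del stock[item]
except RuntimeError as exc:
    raised = True
    print(exc)   # dictionary changed size during iteration

# Safe: iterate over a snapshot of the keys and mutate the real dict.
stock = {"apples": 0, "pears": 5, "plums": 0}
for item in list(stock):
    if stock[item] == 0:
        del stock[item]
print(stock)     # {'pears': 5}
```

Building a new dict with a filtered comprehension is often a cleaner alternative to deleting in place.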
Dictionary Comprehensions
Dict comprehensions follow the same pattern as list comprehensions but produce dictionaries.
Basic Comprehension
# Syntax: {key_expr: value_expr for item in iterable}
# Squares
squares = {x: x ** 2 for x in range(1, 8)}
print(squares) # {1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49}
# Character positions
word = "hello"
positions = {char: idx for idx, char in enumerate(word)}
print(positions) # {'h': 0, 'e': 1, 'l': 3, 'o': 4}
# Note: 'l' appears twice — the second occurrence (index 3) overwrites the first (index 2)
Filtered Comprehension
scores = {"Python": 95, "SQL": 88, "Tableau": 92, "Excel": 78, "R": 65}
# Only subjects with score >= 90
high_scores = {k: v for k, v in scores.items() if v >= 90}
print(high_scores) # {'Python': 95, 'Tableau': 92}
# Only keys that start with a vowel
vowel_keys = {k: v for k, v in scores.items() if k[0].lower() in "aeiou"}
print(vowel_keys) # {'Excel': 78}
Transforming Keys or Values
prices = {"apple": 1.20, "banana": 0.50, "cherry": 2.00}
# Apply 10% discount to all prices
discounted = {fruit: round(price * 0.9, 2) for fruit, price in prices.items()}
print(discounted) # {'apple': 1.08, 'banana': 0.45, 'cherry': 1.8}
# Uppercase all keys
upper_prices = {fruit.upper(): price for fruit, price in prices.items()}
print(upper_prices) # {'APPLE': 1.2, 'BANANA': 0.5, 'CHERRY': 2.0}
# Convert string values to integers
raw = {"port": "8080", "timeout": "30", "retries": "3"}
parsed = {k: int(v) for k, v in raw.items()}
print(parsed) # {'port': 8080, 'timeout': 30, 'retries': 3}
Inverting (Swapping Keys and Values)
original = {"a": 1, "b": 2, "c": 3}
inverted = {v: k for k, v in original.items()}
print(inverted) # {1: 'a', 2: 'b', 3: 'c'}
# Be careful: if values are not unique, later entries overwrite earlier ones
grades = {"Alice": "A", "Bob": "B", "Charlie": "A"}
inverted = {v: k for k, v in grades.items()}
print(inverted) # {'A': 'Charlie', 'B': 'Bob'} — Alice was overwritten!
# To preserve all keys, invert into lists
from collections import defaultdict
inverted_full = defaultdict(list)
for name, grade in grades.items():
    inverted_full[grade].append(name)
print(dict(inverted_full)) # {'A': ['Alice', 'Charlie'], 'B': ['Bob']}
Nested Comprehension
# Create a multiplication table as a nested dict
table = {
i: {j: i * j for j in range(1, 6)}
for i in range(1, 6)
}
print(table[3][4]) # 12 (3 x 4)
print(table[5][5]) # 25 (5 x 5)
Nested Dictionaries
Dictionaries frequently contain other dictionaries, especially when representing structured data such as database records, API responses, or configuration files.
Creating Nested Dicts
students = {
"priya": {
"age": 22,
"city": "Mumbai",
"scores": {"Python": 95, "SQL": 88, "Tableau": 92}
},
"rahul": {
"age": 24,
"city": "Delhi",
"scores": {"Python": 82, "SQL": 90, "Tableau": 78}
},
"ananya": {
"age": 21,
"city": "Bangalore",
"scores": {"Python": 91, "SQL": 85, "Tableau": 95}
}
}
Accessing Deeply Nested Values
Chain bracket notation to drill into nested levels:
# Get Priya's Python score
print(students["priya"]["scores"]["Python"]) # 95
# Get Rahul's city
print(students["rahul"]["city"]) # Delhi
Safe Access for Nested Dicts
Chained bracket access is dangerous when any key in the chain might be missing. Here are several safe access strategies:
# Strategy 1: Chained .get() calls
score = students.get("priya", {}).get("scores", {}).get("Python", 0)
print(score) # 95
# For a missing student
score = students.get("unknown", {}).get("scores", {}).get("Python", 0)
print(score) # 0 (no KeyError)
# Strategy 2: try/except
try:
    score = students["unknown"]["scores"]["Python"]
except KeyError:
    score = 0
print(score) # 0
# Strategy 3: Write a helper function
def deep_get(d, *keys, default=None):
    """Safely navigate nested dicts."""
    for key in keys:
        if isinstance(d, dict):
            d = d.get(key, default)
        else:
            return default
    return d
print(deep_get(students, "priya", "scores", "Python")) # 95
print(deep_get(students, "unknown", "scores", "Python")) # None
print(deep_get(students, "priya", "address", "zip", default=0)) # 0
Modifying Nested Dicts
# Update a nested value
students["priya"]["scores"]["Python"] = 98
# Add a new nested key
students["priya"]["email"] = "priya@example.com"
# Add a new student
students["vikram"] = {
"age": 23,
"city": "Chennai",
"scores": {"Python": 76, "SQL": 80, "Tableau": 88}
}
Iterating Over Nested Dicts
# Print each student's average score
for name, info in students.items():
    scores = info.get("scores", {})
    if scores:
        avg = sum(scores.values()) / len(scores)
        print(f"{name.capitalize()}: {avg:.1f}")
# Flatten into a list of records
records = []
for name, info in students.items():
    for subject, score in info.get("scores", {}).items():
        records.append({"student": name, "subject": subject, "score": score})
for record in records[:3]:
    print(record)
# {'student': 'priya', 'subject': 'Python', 'score': 98}
# {'student': 'priya', 'subject': 'SQL', 'score': 88}
# {'student': 'priya', 'subject': 'Tableau', 'score': 92}
Common Patterns
Counting / Frequency
Building a frequency map is one of the most common uses of dictionaries:
text = "the quick brown fox jumps over the lazy dog the fox"
words = text.split()
# Method 1: Manual counting
freq = {}
for word in words:
    freq[word] = freq.get(word, 0) + 1
print(freq)
# {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'jumps': 1,
# 'over': 1, 'lazy': 1, 'dog': 1}
# Method 2: Using setdefault
freq2 = {}
for word in words:
    freq2.setdefault(word, 0)
    freq2[word] += 1
# Method 3: Using collections.Counter (recommended for production)
from collections import Counter
freq3 = Counter(words)
print(freq3.most_common(3)) # [('the', 3), ('fox', 2), ('quick', 1)]
Grouping Items
Organizing items into categories:
students = [
("Alice", "A"),
("Bob", "B"),
("Charlie", "A"),
("Diana", "C"),
("Eve", "B"),
("Frank", "A"),
]
# Method 1: Using setdefault
groups = {}
for name, grade in students:
    groups.setdefault(grade, []).append(name)
print(groups)
# {'A': ['Alice', 'Charlie', 'Frank'], 'B': ['Bob', 'Eve'], 'C': ['Diana']}
# Method 2: Using defaultdict (cleaner)
from collections import defaultdict
groups = defaultdict(list)
for name, grade in students:
    groups[grade].append(name)
print(dict(groups))
# {'A': ['Alice', 'Charlie', 'Frank'], 'B': ['Bob', 'Eve'], 'C': ['Diana']}
Inverting a Dictionary
Swapping keys and values (handling non-unique values safely):
# Simple inversion (only works when values are unique)
country_codes = {"IN": "India", "US": "United States", "UK": "United Kingdom"}
code_lookup = {name: code for code, name in country_codes.items()}
print(code_lookup["India"]) # IN
# Safe inversion with grouping (when values may repeat)
color_map = {"red": "warm", "orange": "warm", "blue": "cool", "green": "cool"}
inverted = defaultdict(list)
for color, category in color_map.items():
    inverted[category].append(color)
print(dict(inverted)) # {'warm': ['red', 'orange'], 'cool': ['blue', 'green']}
Merging Multiple Dictionaries
# Merge an arbitrary number of dicts (last one wins for overlapping keys)
def merge_dicts(*dicts):
    result = {}
    for d in dicts:
        result.update(d)
    return result
config = merge_dicts(
{"debug": False, "port": 80, "host": "0.0.0.0"}, # base defaults
{"port": 8080, "log_level": "INFO"}, # environment config
{"debug": True}, # runtime overrides
)
print(config)
# {'debug': True, 'port': 8080, 'host': '0.0.0.0', 'log_level': 'INFO'}
# Python 3.9+ — use functools.reduce with the | operator
from functools import reduce
configs = [
{"debug": False, "port": 80},
{"port": 8080, "log_level": "INFO"},
{"debug": True},
]
merged = reduce(lambda a, b: a | b, configs)
print(merged) # {'debug': True, 'port': 8080, 'log_level': 'INFO'}
Default Values with defaultdict
defaultdict from the collections module automatically initializes missing keys with a factory function:
from collections import defaultdict
# int factory — defaults to 0
word_count = defaultdict(int)
for word in "the cat sat on the mat".split():
    word_count[word] += 1
print(dict(word_count))
# {'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1}
# list factory — defaults to empty list
index = defaultdict(list)
words = ["apple", "avocado", "banana", "blueberry", "cherry", "cranberry"]
for word in words:
    index[word[0]].append(word)
print(dict(index))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'],
# 'c': ['cherry', 'cranberry']}
# set factory — defaults to empty set (useful for unique collections)
tags = defaultdict(set)
data = [("post1", "python"), ("post1", "tutorial"), ("post2", "python"), ("post1", "python")]
for post, tag in data:
    tags[post].add(tag)
print(dict(tags))
# {'post1': {'python', 'tutorial'}, 'post2': {'python'}}
# Custom factory — default value of your choice
prefs = defaultdict(lambda: "not set")
prefs["theme"] = "dark"
print(prefs["theme"]) # dark
print(prefs["language"]) # not set
Ordered Operations with OrderedDict
Since Python 3.7, regular dicts preserve insertion order. OrderedDict from collections is still useful in a few cases:
from collections import OrderedDict
# OrderedDict considers order in equality comparisons
d1 = {"a": 1, "b": 2}
d2 = {"b": 2, "a": 1}
print(d1 == d2) # True — regular dicts ignore order in ==
od1 = OrderedDict(a=1, b=2)
od2 = OrderedDict(b=2, a=1)
print(od1 == od2) # False — OrderedDict considers order in ==
# move_to_end() is unique to OrderedDict
od = OrderedDict(a=1, b=2, c=3)
od.move_to_end("a") # move to the end
print(list(od.keys())) # ['b', 'c', 'a']
od.move_to_end("c", last=False) # move to the beginning
print(list(od.keys())) # ['c', 'b', 'a']
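One place where move_to_end() pays off is a least-recently-used cache. The LRUCache class below is a hypothetical sketch written for this tutorial, not a standard-library type; it relies only on OrderedDict.move_to_end() and OrderedDict.popitem(last=False).

```python
from collections import OrderedDict

class LRUCache:
    """Hypothetical least-recently-used cache built on OrderedDict."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)         # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")            # "a" is now the most recently used
cache.put("c", 3)         # evicts "b"
print(list(cache._data))  # ['a', 'c']
```

For caching function results in real code, functools.lru_cache is usually the simpler choice.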
Dict vs Other Mapping Types
Python's collections module provides several specialized mapping types. Here's when to use each:
| Type | Import | Key Feature | Best For |
|---|---|---|---|
| dict | Built-in | General-purpose key-value store | Almost everything |
| defaultdict | collections | Auto-initializes missing keys with a factory | Counting, grouping, accumulating |
| OrderedDict | collections | Order-sensitive equality, move_to_end() | Order-dependent comparisons |
| Counter | collections | Specialized for counting hashable items | Frequency analysis, top-N |
| ChainMap | collections | Logical merge of multiple dicts (no copy) | Layered configs, scope chains |
Counter in Action
from collections import Counter
# Count characters
c = Counter("mississippi")
print(c) # Counter({'i': 4, 's': 4, 'p': 2, 'm': 1})
print(c.most_common(2)) # [('i', 4), ('s', 4)] (ties keep first-seen order)
# Arithmetic on counters
a = Counter(cats=3, dogs=2)
b = Counter(cats=1, dogs=4, birds=2)
print(a + b) # Counter({'dogs': 6, 'cats': 4, 'birds': 2})
print(a - b) # Counter({'cats': 2}) — only positive counts kept
print(a & b) # Counter({'dogs': 2, 'cats': 1}) (minimum of each)
print(a | b) # Counter({'dogs': 4, 'cats': 3, 'birds': 2}) — maximum of each
ChainMap in Action
ChainMap groups multiple dicts together without physically merging them. Lookups search through the chain in order.
from collections import ChainMap
defaults = {"color": "blue", "size": "medium", "debug": False}
env_config = {"size": "large", "log_level": "INFO"}
cli_args = {"debug": True}
# CLI args > env config > defaults
config = ChainMap(cli_args, env_config, defaults)
print(config["debug"]) # True (from cli_args)
print(config["size"]) # large (from env_config)
print(config["color"]) # blue (from defaults)
print(config["log_level"]) # INFO (from env_config)
# The underlying dicts are still independent
print(list(config.maps))
# [{'debug': True}, {'size': 'large', 'log_level': 'INFO'},
# {'color': 'blue', 'size': 'medium', 'debug': False}]
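Because a ChainMap only references its underlying dicts, writes and later changes behave differently from a merged copy. A minimal sketch, reusing smaller versions of the dictionaries above (the flat variable is illustrative):

```python
from collections import ChainMap

defaults = {"color": "blue", "debug": False}
cli_args = {"debug": True}
config = ChainMap(cli_args, defaults)

config["color"] = "red"        # writes always land in the first mapping
print(cli_args)                # {'debug': True, 'color': 'red'}
print(defaults["color"])       # blue (unchanged)

defaults["size"] = "medium"    # changes to an underlying dict show through
print(config["size"])          # medium

flat = dict(config)            # materialize a plain merged dict when needed
print(flat == {"debug": True, "color": "red", "size": "medium"})  # True
```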
Performance
Time Complexity
Dictionaries are implemented as hash tables, making most operations extremely fast:
| Operation | Average Case | Worst Case | Notes |
|---|---|---|---|
| d[key] (get) | O(1) | O(n) | Worst case with extreme hash collisions |
| d[key] = val (set) | O(1) | O(n) | Amortized; table resizes occasionally |
| del d[key] | O(1) | O(n) | Same hash table lookup |
| key in d | O(1) | O(n) | Membership test |
| len(d) | O(1) | O(1) | Length is stored internally |
| d.keys() / d.values() / d.items() | O(1) | O(1) | Returns a view (iteration is O(n)) |
| d.copy() | O(n) | O(n) | Must copy all entries |
| d.update(other) | O(len(other)) | O(len(other) * n) | Inserts each key from other |
| Iteration | O(n) | O(n) | Must visit every entry |
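The worst-case column can be provoked deliberately. The sketch below uses a hypothetical CollidingKey class whose instances all return the same hash, which forces the lookup to fall back on equality comparisons; the exact number of comparisons is an implementation detail, so only the trend matters.

```python
class CollidingKey:
    """Hypothetical key type whose instances all share one hash value."""
    comparisons = 0

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 1  # every instance collides with every other

    def __eq__(self, other):
        CollidingKey.comparisons += 1
        return isinstance(other, CollidingKey) and self.name == other.name

table = {CollidingKey(i): i for i in range(200)}

CollidingKey.comparisons = 0
value = table[CollidingKey(199)]   # must be told apart from colliding keys
print(value)                       # 199
print(CollidingKey.comparisons)    # far more than one comparison
```

With well-distributed hashes, such as those of built-in strings and integers, a lookup typically needs only a single equality check.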
Memory Overhead
Dictionaries use more memory than lists or tuples because of the hash table infrastructure. Each entry requires storage for the hash, the key, and the value. Since Python 3.6, the implementation uses a compact dict layout that reduces memory usage by roughly 20-25% compared to the Python 3.5 implementation.
import sys
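# Compare container sizes (sys.getsizeof is shallow: it excludes the stored objects;
# the figures below are typical for 64-bit CPython and vary by version)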
lst = list(range(1000))
dct = {i: i for i in range(1000)}
print(f"List: {sys.getsizeof(lst):,} bytes") # ~8,056 bytes
print(f"Dict: {sys.getsizeof(dct):,} bytes") # ~36,960 bytes
When to Use a Dict vs Other Structures
- Use a dict when you need fast lookup by key, flexible value types, or a natural key-value relationship.
- Use a list when your data is sequential and accessed by position.
- Use a set when you only need to track unique keys (no values) and want O(1) membership testing.
- Use a tuple when you need an immutable, lightweight collection (e.g., as a dict key).
- Use a named tuple or dataclass when your "dict" always has the same fields — they offer attribute access and type safety.
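As a sketch of the last point, a fixed-field record can be expressed as a dataclass. The Student class and its fields are illustrative and mirror the student dictionary used earlier.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Student:
    name: str
    age: int
    course: str

priya = Student(name="Priya", age=22, course="Data Science")
print(priya.name)   # Priya
print(priya)        # Student(name='Priya', age=22, course='Data Science')

# frozen=True also makes instances hashable, so they can act as dict keys.
grades = {priya: "A"}
print(grades[Student("Priya", 22, "Data Science")])   # A
```

A misspelled field such as priya.nmae fails immediately with AttributeError, whereas a misspelled dict key only surfaces when that line runs with a KeyError or, with .get(), not at all.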
Practical Examples
Example 1: Student Database
# A simple student database using dictionaries
def create_database():
    """Initialize an empty student database."""
    return {}
def add_student(db, student_id, name, age, courses=None):
    """Add a student to the database."""
    if student_id in db:
        print(f"Student {student_id} already exists. Use update instead.")
        return
    db[student_id] = {
        "name": name,
        "age": age,
        "courses": courses or {},
        "gpa": 0.0,
    }
    print(f"Added student: {name} (ID: {student_id})")
def add_course(db, student_id, course, grade):
    """Add or update a course grade for a student."""
    if student_id not in db:
        print(f"Student {student_id} not found.")
        return
    db[student_id]["courses"][course] = grade
    # Recalculate GPA
    courses = db[student_id]["courses"]
    grade_points = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}
    total = sum(grade_points.get(g, 0) for g in courses.values())
    db[student_id]["gpa"] = round(total / len(courses), 2)
def get_report(db, student_id):
    """Print a student report."""
    student = db.get(student_id)
    if not student:
        print(f"Student {student_id} not found.")
        return
    print("\n--- Student Report ---")
    print(f"ID: {student_id}")
    print(f"Name: {student['name']}")
    print(f"Age: {student['age']}")
    print(f"GPA: {student['gpa']}")
    print("Courses:")
    for course, grade in student["courses"].items():
        print(f" {course}: {grade}")
# Usage
db = create_database()
add_student(db, "S001", "Priya", 22)
add_student(db, "S002", "Rahul", 24)
add_course(db, "S001", "Python", "A")
add_course(db, "S001", "SQL", "B")
add_course(db, "S001", "Statistics", "A")
add_course(db, "S002", "Python", "B")
add_course(db, "S002", "SQL", "A")
get_report(db, "S001")
# --- Student Report ---
# ID: S001
# Name: Priya
# Age: 22
# GPA: 3.67
# Courses:
# Python: A
# SQL: B
# Statistics: A
Example 2: Word Frequency Counter
def word_frequency(text, top_n=10, ignore_case=True, min_length=1):
    """
    Analyse word frequency in a text.
    Args:
        text: Input string
        top_n: Number of top words to return
        ignore_case: Treat 'The' and 'the' as the same word
        min_length: Ignore words shorter than this
    Returns:
        List of (word, count) tuples sorted by frequency
    """
    if ignore_case:
        text = text.lower()
    # Remove punctuation
    import string
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Split into words and filter by length
    words = [w for w in text.split() if len(w) >= min_length]
    # Count frequencies
    freq = {}
    for word in words:
        freq[word] = freq.get(word, 0) + 1
    # Sort by frequency (descending), then alphabetically
    ranked = sorted(freq.items(), key=lambda x: (-x[1], x[0]))
    return ranked[:top_n]
# Usage
sample_text = """
Python is a versatile programming language. Python is used for web development,
data science, machine learning, and automation. Python is known for its clean
syntax and large community. Data science with Python has become the standard
for modern analytics. Machine learning frameworks in Python include TensorFlow
and PyTorch.
"""
results = word_frequency(sample_text, top_n=8, min_length=3)
print(f"{'Word':<15} {'Count':>5}")
print("-" * 22)
for word, count in results:
    print(f"{word:<15} {count:>5}")
# Word            Count
# ----------------------
# python              5
# and                 3
# for                 3
# data                2
# learning            2
# machine             2
# science             2
# analytics           1
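The manual counting loop above can also be written with collections.Counter. The following is a minimal sketch, not a replacement for the function above: the name word_frequency_counter is hypothetical, and it always lowercases the text for brevity. Counter.most_common() orders ties by first appearance, so sorted() is still used here to keep the alphabetical tie-break.

```python
from collections import Counter
import string


def word_frequency_counter(text, top_n=10, min_length=1):
    """Hypothetical Counter-based variant of word_frequency (always case-insensitive)."""
    # Lowercase and strip punctuation, as in the original function
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    # Counter consumes any iterable of hashable items and tallies them
    counts = Counter(w for w in cleaned.split() if len(w) >= min_length)
    # Count descending, then word ascending, so ties are broken alphabetically
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:top_n]


print(word_frequency_counter("The cat and the hat. The end!", top_n=3, min_length=3))
# [('the', 3), ('and', 1), ('cat', 1)]
```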
Example 3: Configuration Parser
def parse_config(text):
    """
    Parse a simple INI-style configuration into a nested dict.
    Supports:
    - [section] headers
    - key = value pairs
    - # comments
    - Blank lines (ignored)
    """
    config = {}
    current_section = "DEFAULT"
    config[current_section] = {}
    for line in text.strip().splitlines():
        line = line.strip()
        # Skip empty lines and comments
        if not line or line.startswith("#"):
            continue
        # Section header
        if line.startswith("[") and line.endswith("]"):
            current_section = line[1:-1].strip()
            config.setdefault(current_section, {})
            continue
        # Key-value pair
        if "=" in line:
            key, _, value = line.partition("=")
            key = key.strip()
            value = value.strip()
            # Auto-convert types: booleans first, then int, then float
            if value.lower() in ("true", "yes"):
                value = True
            elif value.lower() in ("false", "no"):
                value = False
            else:
                try:
                    value = int(value)  # also handles signed values such as -5
                except ValueError:
                    try:
                        value = float(value)
                    except ValueError:
                        pass  # keep as string
            config[current_section][key] = value
    return config
# Usage
config_text = """
# Application Configuration
[server]
host = 0.0.0.0
port = 8080
debug = true
[database]
engine = postgresql
host = localhost
port = 5432
name = myapp_db
[logging]
level = INFO
file = /var/log/app.log
rotate = yes
max_size = 10485760
"""
config = parse_config(config_text)
print(f"Server port: {config['server']['port']}") # 8080
print(f"DB engine: {config['database']['engine']}") # postgresql
print(f"Log rotate: {config['logging']['rotate']}") # True
print(f"Debug mode: {config['server']['debug']}") # True
# Access with safe defaults
timeout = config.get("server", {}).get("timeout", 30)
print(f"Timeout: {timeout}") # 30 (default)
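Chained .get() calls become hard to read once the nesting goes beyond two levels. One way to tidy this up is a small helper that walks a dotted path. This is a sketch under assumptions: get_nested is a hypothetical name, not part of the parser above, and it assumes that keys never contain the separator character.

```python
def get_nested(mapping, path, default=None, sep="."):
    """Hypothetical helper: follow a dotted path through nested dicts."""
    current = mapping
    for key in path.split(sep):
        # Stop as soon as we hit a non-dict or a missing key
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# A small stand-in for the kind of dict parse_config returns
settings = {"server": {"host": "0.0.0.0", "port": 8080}}

print(get_nested(settings, "server.port"))         # 8080
print(get_nested(settings, "server.timeout", 30))  # 30
print(get_nested(settings, "cache.ttl", 60))       # 60
```

The check uses `key not in current` rather than comparing the result to None, so a stored value of None or False is still returned as-is.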
Example 4: JSON-Like Data Processing
# Processing API-style JSON data with dictionaries
api_response = {
    "status": "success",
    "data": {
        "users": [
            {"id": 1, "name": "Priya", "role": "admin", "active": True,
             "skills": ["Python", "SQL", "AWS"]},
            {"id": 2, "name": "Rahul", "role": "developer", "active": True,
             "skills": ["Python", "JavaScript", "Docker"]},
            {"id": 3, "name": "Ananya", "role": "analyst", "active": False,
             "skills": ["SQL", "Tableau", "Excel"]},
            {"id": 4, "name": "Vikram", "role": "developer", "active": True,
             "skills": ["Java", "Python", "Kubernetes"]},
            {"id": 5, "name": "Meera", "role": "analyst", "active": True,
             "skills": ["Python", "R", "SQL"]},
        ],
        "total": 5,
        "page": 1,
    }
}
# Extract active users
users = api_response["data"]["users"]
active_users = [u for u in users if u["active"]]
print(f"Active users: {len(active_users)}/{len(users)}")
# Group users by role
from collections import defaultdict
by_role = defaultdict(list)
for user in users:
    by_role[user["role"]].append(user["name"])
print("\nUsers by role:")
for role, names in sorted(by_role.items()):
    print(f" {role}: {', '.join(names)}")
# Find all unique skills across active users
all_skills = set()
for user in active_users:
    all_skills.update(user["skills"])
print(f"\nUnique skills (active users): {sorted(all_skills)}")
# Skill frequency analysis
from collections import Counter
skill_counts = Counter(
    skill for user in active_users for skill in user["skills"]
)
print("\nMost sought skills:")
for skill, count in skill_counts.most_common():
    bar = "#" * count
    print(f" {skill:<15} {bar} ({count})")
# Transform into a lookup dict by ID
user_lookup = {u["id"]: u for u in users}
print(f"\nUser #3: {user_lookup[3]['name']} ({user_lookup[3]['role']})")
Practice Exercises
Test your understanding of dictionaries with these exercises. Try to solve each one before looking at the hints.
Exercise 1: Two Sum
Write a function that takes a list of numbers and a target sum. Return the indices of the two numbers that add up to the target. Use a dictionary for O(n) performance.
# Example:
# two_sum([2, 7, 11, 15], target=9) should return (0, 1)
# because nums[0] + nums[1] = 2 + 7 = 9
# Hint: For each number, check if (target - number) is already in your dict.
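One possible solution to Exercise 1, offered as a sketch to compare against after you have tried it yourself. It assumes that returning None is acceptable when no pair exists, which the exercise does not specify.

```python
def two_sum(nums, target):
    """Return indices (i, j) with nums[i] + nums[j] == target, or None."""
    seen = {}  # maps each value seen so far to its index
    for i, num in enumerate(nums):
        complement = target - num
        # A dict membership test is O(1) on average, so the whole loop is O(n)
        if complement in seen:
            return (seen[complement], i)
        seen[num] = i
    return None


print(two_sum([2, 7, 11, 15], target=9))  # (0, 1)
```

Storing the value only after the lookup means a number is never paired with itself, while duplicates such as [3, 3] with target 6 still work.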
Exercise 2: Character Frequency
Write a function that takes a string and returns a dictionary mapping each character to the number of times it appears, sorted by frequency (highest first).
# Example:
# char_freq("banana") should return {'a': 3, 'n': 2, 'b': 1}
# Hint: Build the frequency dict, then use sorted() with a key function.
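One possible solution to Exercise 2, as a sketch. It relies on dictionaries preserving insertion order, so building a dict from sorted pairs yields a dict that iterates in that order. How ties should be ordered is not stated in the exercise; this version keeps tied characters in order of first appearance.

```python
def char_freq(text):
    """Map each character to its count, ordered by count (highest first)."""
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    # sorted() is stable, so tied characters keep their first-seen order
    ordered = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
    return dict(ordered)


print(char_freq("banana"))  # {'a': 3, 'n': 2, 'b': 1}
```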
Exercise 3: Merge with Custom Logic
Write a function that merges two dictionaries. When both dicts have the same key, instead of overwriting, apply a custom merge function (e.g., sum the values, keep the max, or concatenate lists).
# Example:
# smart_merge({"a": 1, "b": 2}, {"b": 3, "c": 4}, strategy=lambda x, y: x + y)
# should return {"a": 1, "b": 5, "c": 4}
# Hint: Iterate through all keys from both dicts and handle conflicts.
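One possible solution to Exercise 3, as a sketch. It assumes that strategy is a callable taking the two conflicting values as separate arguments. Under that assumption, operator.add or a lambda is the way to sum values, because the built-in sum expects a single iterable rather than two numbers.

```python
import operator


def smart_merge(first, second, strategy):
    """Merge two dicts, resolving key conflicts with strategy(old, new)."""
    merged = dict(first)  # shallow copy, so neither input is modified
    for key, value in second.items():
        if key in merged:
            merged[key] = strategy(merged[key], value)
        else:
            merged[key] = value
    return merged


a = {"a": 1, "b": 2}
b = {"b": 3, "c": 4}
print(smart_merge(a, b, strategy=operator.add))  # {'a': 1, 'b': 5, 'c': 4}
print(smart_merge(a, b, strategy=max))           # {'a': 1, 'b': 3, 'c': 4}
```

operator.add also concatenates lists, so the same call covers the list-merging case.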
Exercise 4: Nested Dict Flattener
Write a function that takes a nested dictionary of any depth and flattens it into a single-level dictionary, using dots as key separators.
# Example:
# flatten({"a": 1, "b": {"c": 2, "d": {"e": 3}}})
# should return {"a": 1, "b.c": 2, "b.d.e": 3}
# Hint: Use recursion. If a value is a dict, recurse with a prefix.
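One possible solution to Exercise 4, as a sketch. The prefix and sep parameters are illustrative additions, and the handling of empty nested dicts (kept as values here) is an assumption, since the exercise does not cover that case.

```python
def flatten(nested, prefix="", sep="."):
    """Flatten a nested dict into one level, joining keys with sep."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict) and value:
            # Recurse, carrying the path built so far as the prefix
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


print(flatten({"a": 1, "b": {"c": 2, "d": {"e": 3}}}))
# {'a': 1, 'b.c': 2, 'b.d.e': 3}
```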
Exercise 5: Implement a Simple Cache (Memoization)
Write a decorator function called memoize that caches the return value of a function based on its arguments. Use a dictionary to store the cache.
# Example:
# @memoize
# def fibonacci(n):
#     if n < 2:
#         return n
#     return fibonacci(n - 1) + fibonacci(n - 2)
#
# print(fibonacci(50)) # Should be fast, not exponentially slow
# Hint: Use a dict with the function arguments as keys.
# The functools module provides @lru_cache for production use.
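One possible solution to Exercise 5, as a sketch. It makes two simplifying assumptions: the decorated function takes only positional arguments, and those arguments are hashable, since the argument tuple is used as the dictionary key.

```python
import functools


def memoize(func):
    """Cache results of func, keyed by its positional arguments."""
    cache = {}

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    return wrapper


@memoize
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)


print(fibonacci(50))  # 12586269025
```

Because the recursive calls go through the decorated name, each value of n is computed only once.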
Exercise 6: Build an Inverted Index
Write a function that takes a list of documents (strings) and builds an inverted index — a dictionary mapping each word to the set of document indices where it appears.
# Example:
# docs = [
# "the cat sat on the mat",
# "the dog sat on the log",
# "the cat and the dog played"
# ]
# inverted_index(docs)
# should return something like:
# {"the": {0, 1, 2}, "cat": {0, 2}, "sat": {0, 1}, "on": {0, 1},
# "mat": {0}, "dog": {1, 2}, "log": {1}, "and": {2}, "played": {2}}
# Hint: Use a defaultdict(set) and enumerate over the documents.
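One possible solution to Exercise 6, as a sketch. It splits on whitespace only and does no lowercasing or punctuation handling, which is enough for the example documents but is an assumption about the expected input.

```python
from collections import defaultdict


def inverted_index(docs):
    """Map each word to the set of document indices that contain it."""
    index = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for word in doc.split():
            index[word].add(doc_id)  # a set ignores repeated words in one doc
    return dict(index)  # plain dict, so missing words raise KeyError


docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat and the dog played",
]
index = inverted_index(docs)
print(sorted(index["cat"]))  # [0, 2]
print(sorted(index["the"]))  # [0, 1, 2]
```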
Summary
In this chapter, you learned:
- What dictionaries are: mutable, insertion-ordered (guaranteed since Python 3.7) key-value mappings with O(1) average lookup
- Key constraints: keys must be hashable, which for built-in types means immutable: strings, numbers, tuples of immutables
- Creating dicts: literal syntax {}, the dict() constructor, dict.fromkeys(), zip() over two lists, enumerate()
- Accessing values: bracket d[key] for strict access, .get() for safe access, .setdefault() for get-or-initialize
- Adding and updating: bracket assignment, .update(), merge | and |= (Python 3.9+), {**a, **b} unpacking
- Removing items: del d[key], .pop() with optional default, .popitem() for LIFO removal, .clear()
- All major methods with a reference table
- Iteration patterns: over keys, values, items, sorted iteration, filtered iteration, with enumerate
- Dict comprehensions: basic, filtered, transforming, inverting, nested
- Nested dicts: accessing deeply, safe access with chained .get() and helper functions, modifying and iterating
- Common patterns: counting/frequency, grouping, inverting, merging multiple dicts, default values
- Specialized mappings: defaultdict, OrderedDict, Counter, ChainMap and when to use each
- Performance: O(1) average for get/set/delete/membership, memory overhead compared to lists
- Practical examples: student database, word frequency counter, config parser, JSON-like data processing
Next up: Exception Handling — learn how to handle errors gracefully with try/except, custom exceptions, and context managers.