Chapter 11 of 14

File Handling

Read, write, and manage files in Python — text files, CSV, JSON, and best practices for file operations.

Meritshot · 43 min read

Why File Handling Matters

Almost every real-world program needs to work with files at some point. Without file handling, data lives only in memory and disappears the moment your program ends. File handling lets your programs:

  • Persist data — Save user information, application state, or computation results so they survive after the program exits.
  • Read configurations — Load settings from config files (.ini, .json, .yaml) instead of hardcoding values.
  • Process logs — Analyze server logs, error reports, or audit trails stored in text files.
  • Exchange data — Import and export data in standard formats like CSV and JSON to communicate with other programs, databases, or APIs.
  • Generate reports — Write output files such as summaries, invoices, or analytics dashboards.
  • Automate workflows — Batch-process thousands of files, rename them, convert formats, or extract information.

Python makes file handling straightforward with built-in functions and modules. By the end of this chapter, you will be comfortable reading, writing, and managing files of all kinds.


Opening Files

The built-in open() function is the gateway to all file operations in Python. It returns a file object that you use to read from or write to the file.

Basic Syntax

file_object = open(filename, mode, encoding)
  • filename — The path to the file (string or Path object).
  • mode — How you want to open the file (read, write, append, etc.). Defaults to "r".
  • encoding — The character encoding to use. Defaults to the platform default (usually UTF-8 on modern systems, but not guaranteed).

A Simple Example

# Open a file for reading
file = open("data.txt", "r")
content = file.read()
print(content)
file.close()  # Must be closed manually when opened this way

File Modes Quick Reference

Mode    Description
"r"     Read (default). File must exist.
"w"     Write. Creates file or overwrites existing.
"a"     Append. Creates file or adds to end.
"x"     Exclusive create. Error if file already exists.
"r+"    Read and write. File must exist.
"w+"    Write and read. Creates or overwrites.
"a+"    Append and read. Creates or adds to end.
"rb"    Read in binary mode.
"wb"    Write in binary mode.

We will explore each mode in detail later in this chapter.

The encoding Parameter

Always specify encoding explicitly when working with text files. This avoids surprises across different operating systems:

# Recommended: specify encoding
file = open("data.txt", "r", encoding="utf-8")
content = file.read()
file.close()
# Without encoding, Python uses the platform default
# On Windows this might be 'cp1252', on Linux/Mac 'utf-8'
file = open("data.txt", "r")  # encoding depends on OS

Common encodings you may encounter:

Encoding     Description
"utf-8"      Universal standard. Handles all languages.
"ascii"      English-only, 128 characters.
"latin-1"    Western European languages. Never raises decode errors.
"utf-16"     Used by some Windows applications.
"cp1252"     Windows default for Western European locales.

Tip: When in doubt, use encoding="utf-8". It covers the vast majority of use cases.
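When the encoding really is unknown, open() also accepts an errors parameter that controls what happens to bytes that cannot be decoded. A small self-contained sketch (mystery.txt is an illustrative filename):

```python
# Write a byte that is valid latin-1 but NOT valid UTF-8
with open("mystery.txt", "wb") as f:
    f.write("café".encode("latin-1"))   # é becomes the single byte 0xE9

# Reading it back as UTF-8 would raise UnicodeDecodeError...
# ...unless we ask Python to substitute the replacement character instead:
with open("mystery.txt", "r", encoding="utf-8", errors="replace") as f:
    print(f.read())   # caf� — the data survives, the bad byte is flagged
```

Other values include errors="ignore" (silently drop bad bytes) and the default errors="strict" (raise an exception).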


The with Statement (Context Manager)

The with statement is the recommended way to work with files in Python. It guarantees the file will be properly closed, even if an error occurs inside the block.

Why with Is Preferred

Without with, you must remember to close the file manually. If an exception occurs before file.close(), the file stays open, which can lead to data loss or resource leaks.

Without with (risky):

file = open("data.txt", "r")
content = file.read()
# If an error occurs here, file.close() never runs!
file.close()

With with (safe):

with open("data.txt", "r", encoding="utf-8") as file:
    content = file.read()
    print(content)
# File is automatically closed here, even if an error occurred

What Happens Without with

Consider this scenario where an error prevents the file from closing:

# DANGEROUS: file may remain open
file = open("data.txt", "r")
content = file.read()
result = int(content)  # ValueError if content isn't a number!
file.close()           # This line never executes if the error above fires

To handle this properly without with, you would need a try/finally block:

# Safe but verbose
file = open("data.txt", "r")
try:
    content = file.read()
    result = int(content)
finally:
    file.close()  # Runs no matter what

The with statement does exactly the same thing, but more cleanly:

# Clean and safe
with open("data.txt", "r") as file:
    content = file.read()
    result = int(content)
# file.close() is called automatically

Opening Multiple Files

You can open multiple files in a single with statement:

with open("input.txt", "r", encoding="utf-8") as infile, \
     open("output.txt", "w", encoding="utf-8") as outfile:
    for line in infile:
        outfile.write(line.upper())

Starting with Python 3.10, you can use parentheses for a cleaner look:

with (
    open("input.txt", "r", encoding="utf-8") as infile,
    open("output.txt", "w", encoding="utf-8") as outfile,
):
    for line in infile:
        outfile.write(line.upper())

Checking If a File Is Closed

with open("data.txt", "r") as f:
    print(f.closed)  # False — file is open inside the block

print(f.closed)      # True — file is closed outside the block

Best Practice: Always use with for file operations. There is almost never a reason to use manual open() / close().


Reading Files

Python offers several ways to read file contents, each suited to different situations.

Assume we have a file called notes.txt with this content:

Hello, World!
Python is great.
File handling is useful.
This is the last line.

read() — Read the Entire File

The read() method returns the complete file content as a single string:

with open("notes.txt", "r", encoding="utf-8") as f:
    content = f.read()

print(content)
# Output:
# Hello, World!
# Python is great.
# File handling is useful.
# This is the last line.

print(type(content))  # <class 'str'>
print(len(content))   # Total number of characters in the file

Warning: read() loads the entire file into memory. For very large files (hundreds of MB or more), this can crash your program. Use line-by-line iteration for large files.

read(n) — Read N Characters

You can pass an integer to read() to read only that many characters:

with open("notes.txt", "r", encoding="utf-8") as f:
    first_five = f.read(5)
    print(first_five)  # Hello

    next_eight = f.read(8)
    print(next_eight)  # , World!

Notice that the second read(8) picks up where the first one left off. The file has an internal cursor (more on this later).

readline() — Read One Line

The readline() method reads a single line from the file, including the trailing newline character (\n):

with open("notes.txt", "r", encoding="utf-8") as f:
    line1 = f.readline()
    print(line1)           # Hello, World! (plus a blank line: the \n, then print's own newline)
    print(repr(line1))     # 'Hello, World!\n'

    line2 = f.readline()
    print(line2)           # Python is great. (plus a blank line)

You can use readline() in a loop:

with open("notes.txt", "r", encoding="utf-8") as f:
    while True:
        line = f.readline()
        if not line:       # Empty string means end of file
            break
        print(line.strip())

# Output:
# Hello, World!
# Python is great.
# File handling is useful.
# This is the last line.

readlines() — Read All Lines into a List

The readlines() method returns a list where each element is a line from the file:

with open("notes.txt", "r", encoding="utf-8") as f:
    lines = f.readlines()

print(lines)
# ['Hello, World!\n', 'Python is great.\n', 'File handling is useful.\n', 'This is the last line.\n']

print(len(lines))  # 4

Iterating Line by Line (Most Memory-Efficient)

The best way to process a file line by line is to iterate directly over the file object. Python reads one line at a time, keeping memory usage low even for enormous files:

with open("notes.txt", "r", encoding="utf-8") as f:
    for line in f:
        print(line.strip())

# Output:
# Hello, World!
# Python is great.
# File handling is useful.
# This is the last line.

This approach is strongly recommended for large files because it never loads the entire file into memory.

Stripping Newlines

Every method that reads lines includes the trailing \n. Use strip() to remove it:

with open("notes.txt", "r", encoding="utf-8") as f:
    for line in f:
        clean_line = line.strip()   # Removes leading/trailing whitespace including \n
        print(clean_line)

You can also use rstrip("\n") if you want to remove only the trailing newline but keep other whitespace:

with open("notes.txt", "r", encoding="utf-8") as f:
    for line in f:
        clean_line = line.rstrip("\n")
        print(clean_line)

Reading into a List Without Newlines

A common pattern to get a clean list of lines:

with open("notes.txt", "r", encoding="utf-8") as f:
    lines = [line.strip() for line in f]

print(lines)
# ['Hello, World!', 'Python is great.', 'File handling is useful.', 'This is the last line.']

Reading in Chunks

For processing very large files efficiently, you can read fixed-size chunks:

def read_in_chunks(filepath, chunk_size=1024):
    """Read a file in fixed-size chunks."""
    with open(filepath, "r", encoding="utf-8") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Process each chunk
            print(f"Read {len(chunk)} characters")

read_in_chunks("large_file.txt")

Comparison of Reading Methods

Method          Returns            Memory Usage          Best For
read()          Entire string      High (whole file)     Small files
read(n)         N characters       Low                   Chunk processing
readline()      One line string    Low                   Reading one line at a time
readlines()     List of strings    High (whole file)     When you need all lines in a list
for line in f   One line per loop  Low                   Large files (recommended)

Writing Files

write() — Write a String

The write() method writes a string to the file. It does not add a newline automatically — you must include \n yourself:

with open("output.txt", "w", encoding="utf-8") as f:
    f.write("Hello, World!\n")
    f.write("This is line 2.\n")
    f.write("This is line 3.\n")

The resulting output.txt:

Hello, World!
This is line 2.
This is line 3.

write() returns the number of characters written:

with open("output.txt", "w", encoding="utf-8") as f:
    chars_written = f.write("Hello!\n")
    print(chars_written)  # 7 (6 characters + 1 newline)

writelines() — Write Multiple Strings

The writelines() method takes an iterable of strings and writes them all. Like write(), it does not add newlines between items:

lines = ["Line 1\n", "Line 2\n", "Line 3\n"]

with open("output.txt", "w", encoding="utf-8") as f:
    f.writelines(lines)

If your list does not already contain newlines, add them:

items = ["apple", "banana", "cherry"]

with open("fruits.txt", "w", encoding="utf-8") as f:
    f.writelines(item + "\n" for item in items)

Overwriting vs. Appending

Overwriting ("w" mode) — Erases all existing content and starts fresh:

# First write — creates the file
with open("log.txt", "w", encoding="utf-8") as f:
    f.write("Log started.\n")

# Second write — ERASES everything and writes new content
with open("log.txt", "w", encoding="utf-8") as f:
    f.write("Fresh start.\n")

# log.txt now contains ONLY: "Fresh start.\n"

Appending ("a" mode) — Adds to the end of the file without erasing:

# First write — creates the file
with open("log.txt", "w", encoding="utf-8") as f:
    f.write("Log started.\n")

# Append — adds to the end
with open("log.txt", "a", encoding="utf-8") as f:
    f.write("New entry added.\n")

# log.txt now contains:
# Log started.
# New entry added.

Creating New Files Safely

Use "x" mode to create a file only if it does not already exist. This prevents accidentally overwriting important data:

try:
    with open("important_data.txt", "x", encoding="utf-8") as f:
        f.write("This file was just created.\n")
    print("File created successfully.")
except FileExistsError:
    print("File already exists! Not overwriting.")

Writing with print()

You can redirect print() output to a file using the file parameter:

with open("output.txt", "w", encoding="utf-8") as f:
    print("Hello, World!", file=f)
    print("This is line 2.", file=f)
    print("Value:", 42, file=f)

This is convenient because print() automatically adds newlines and can handle multiple arguments with spaces.


File Modes Deep Dive

Understanding file modes is essential. Here is a comprehensive reference:

Text Modes

Mode   Read  Write  Creates File  Truncates (Erases)  Cursor Position  File Must Exist
"r"    Yes   No     No            No                  Beginning        Yes
"w"    No    Yes    Yes           Yes                 Beginning        No
"a"    No    Yes    Yes           No                  End              No
"x"    No    Yes    Yes           N/A (new file)      Beginning        Must NOT exist
"r+"   Yes   Yes    No            No                  Beginning        Yes
"w+"   Yes   Yes    Yes           Yes                 Beginning        No
"a+"   Yes   Yes    Yes           No                  End              No

Binary Modes

Add "b" to any mode for binary file operations (images, PDFs, executables, etc.):

Mode     Description
"rb"     Read binary
"wb"     Write binary (creates or truncates)
"ab"     Append binary
"xb"     Exclusive create binary
"rb+"    Read and write binary
"wb+"    Write and read binary (truncates)
"ab+"    Append and read binary

Text vs. Binary Mode

Aspect            Text Mode ("r", "w")          Binary Mode ("rb", "wb")
Data type         str                           bytes
Newline handling  Converts \r\n to \n on read   No conversion
Encoding          Uses specified encoding       No encoding
Use for           .txt, .csv, .json, .html      Images, audio, video, .pdf

The + Modes Explained

The + sign adds the complementary capability to a mode:

# r+ : Read AND write. File must exist. Cursor starts at beginning.
with open("data.txt", "r+", encoding="utf-8") as f:
    content = f.read()    # Read existing content
    f.write("Appended!")  # Write at current cursor position (end after read)

# w+ : Write AND read. Creates or truncates file.
with open("data.txt", "w+", encoding="utf-8") as f:
    f.write("New content")
    f.seek(0)              # Move cursor back to beginning
    content = f.read()     # Now we can read what we wrote
    print(content)         # New content

# a+ : Append AND read. Creates file if needed. Write cursor always at end.
with open("data.txt", "a+", encoding="utf-8") as f:
    f.write("More data\n")
    f.seek(0)              # Move cursor to beginning for reading
    content = f.read()     # Read entire file
    print(content)

File Pointer / Cursor

Every open file has an internal cursor (also called the file pointer) that tracks your current position in the file. Reading or writing advances the cursor forward.

tell() — Get Current Position

The tell() method returns the current position of the cursor, measured in bytes from the start of the file (for plain ASCII text this matches the character count):

with open("notes.txt", "r", encoding="utf-8") as f:
    print(f.tell())       # 0 — cursor at the very start

    f.read(5)
    print(f.tell())       # 5 — moved forward 5 bytes

    f.readline()
    print(f.tell())       # Position after the first line ends

seek() — Move the Cursor

The seek(offset, whence) method moves the cursor:

  • offset — Number of bytes to move.
  • whence — Reference point (optional):
    • 0 — From the beginning (default)
    • 1 — From the current position (binary mode only, except for offset 0)
    • 2 — From the end (binary mode only, except for offset 0)

with open("notes.txt", "r", encoding="utf-8") as f:
    content = f.read()
    print(f.tell())     # Cursor is at the end

    f.seek(0)           # Move back to the beginning
    print(f.tell())     # 0

    first_line = f.readline()
    print(first_line.strip())  # Hello, World!

Practical Example: Re-reading a File

with open("data.txt", "r", encoding="utf-8") as f:
    # First pass: count lines
    line_count = sum(1 for _ in f)
    print(f"Total lines: {line_count}")

    # Reset cursor to re-read
    f.seek(0)

    # Second pass: process content
    for line in f:
        print(line.strip().upper())

Practical Example: Overwriting Part of a File

# Create a sample file
with open("record.txt", "w", encoding="utf-8") as f:
    f.write("Name: Alice\n")
    f.write("Score: 085\n")

# Overwrite part of it using seek
with open("record.txt", "r+", encoding="utf-8") as f:
    content = f.read()
    print("Before:", repr(content))

    f.seek(0)           # Go back to start
    # Overwrite with same-length content
    f.write("Name: Bobby\n")
    f.write("Score: 099\n")

    f.seek(0)
    print("After:", f.read())

# Output:
# Before: 'Name: Alice\nScore: 085\n'
# After: Name: Bobby
#        Score: 099

Using seek() in Binary Mode

In binary mode, you can seek relative to the current position or the end of the file:

with open("data.bin", "rb") as f:
    f.seek(0, 2)       # Move to the end of the file
    file_size = f.tell()
    print(f"File size: {file_size} bytes")

    f.seek(-10, 2)     # Move to 10 bytes before the end
    last_10 = f.read()
    print(f"Last 10 bytes: {last_10}")

Working with CSV Files

CSV (Comma-Separated Values) is one of the most common data exchange formats. Python's csv module handles the tricky parts like quoting, escaping, and different delimiters.

Why Use the csv Module?

You might think you can just split lines by commas:

# Naive approach — BREAKS with commas inside quoted fields!
line = 'John,"New York, NY",30'
fields = line.split(",")
print(fields)  # ['John', '"New York', ' NY"', '30']  — WRONG!

The csv module handles this correctly:

import csv
import io

line = 'John,"New York, NY",30'
reader = csv.reader(io.StringIO(line))
for row in reader:
    print(row)  # ['John', 'New York, NY', '30']  — CORRECT!

csv.reader — Reading CSV Files

Assume students.csv contains:

Name,Age,Score
Priya,22,95
Rahul,24,82
Ananya,23,90

import csv

with open("students.csv", "r", encoding="utf-8") as f:
    reader = csv.reader(f)
    header = next(reader)  # Read the header row
    print("Columns:", header)

    for row in reader:
        name, age, score = row
        print(f"{name} is {age} years old and scored {score}")

# Output:
# Columns: ['Name', 'Age', 'Score']
# Priya is 22 years old and scored 95
# Rahul is 24 years old and scored 82
# Ananya is 23 years old and scored 90

csv.DictReader — Reading CSV as Dictionaries

DictReader uses the first row as keys, giving you a dictionary for each row:

import csv

with open("students.csv", "r", encoding="utf-8") as f:
    reader = csv.DictReader(f)

    for row in reader:
        print(f"{row['Name']}: {row['Score']}")
        # Each row is a plain dict: {'Name': 'Priya', 'Age': '22', 'Score': '95'}
        # Note that every value is a string

# Output:
# Priya: 95
# Rahul: 82
# Ananya: 90
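One gotcha worth stressing: DictReader (like csv.reader) yields every field as a string, so convert numeric columns yourself before doing arithmetic. A self-contained sketch using io.StringIO in place of a real file:

```python
import csv
import io

# CSV data as a string; io.StringIO makes it behave like an open file
csv_text = "Name,Age,Score\nPriya,22,95\nRahul,24,82\nAnanya,23,90\n"

total = 0
for row in csv.DictReader(io.StringIO(csv_text)):
    total += int(row["Score"])        # "95" -> 95, etc.

print(f"Average score: {total / 3}")  # Average score: 89.0
```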

csv.writer — Writing CSV Files

import csv

data = [
    ["Name", "Age", "Score"],
    ["Priya", 22, 95],
    ["Rahul", 24, 82],
    ["Ananya", 23, 90],
]

with open("students.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerows(data)  # Write all rows at once

You can also write one row at a time:

import csv

with open("students.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Name", "Age", "Score"])  # Header
    writer.writerow(["Priya", 22, 95])
    writer.writerow(["Rahul", 24, 82])

Important: Always pass newline="" when opening CSV files. This prevents blank lines between rows on Windows.
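The same newline="" rule applies when adding rows to an existing file in append mode. A quick sketch (scores.csv is an illustrative filename):

```python
import csv

# Start a file with just a header row
with open("scores.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerow(["Name", "Score"])

# Later: append rows without touching the existing contents
with open("scores.csv", "a", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Priya", 95])
    writer.writerow(["Rahul", 82])

with open("scores.csv", "r", encoding="utf-8") as f:
    print(f.read())
# Name,Score
# Priya,95
# Rahul,82
```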

csv.DictWriter — Writing CSV from Dictionaries

import csv

students = [
    {"Name": "Priya", "Age": 22, "Score": 95},
    {"Name": "Rahul", "Age": 24, "Score": 82},
    {"Name": "Ananya", "Age": 23, "Score": 90},
]

with open("students.csv", "w", newline="", encoding="utf-8") as f:
    fieldnames = ["Name", "Age", "Score"]
    writer = csv.DictWriter(f, fieldnames=fieldnames)

    writer.writeheader()       # Writes the header row
    writer.writerows(students) # Writes all data rows

Custom Delimiters

Not all "CSV" files use commas. Some use tabs, semicolons, or pipes:

import csv

# Reading a tab-separated file
with open("data.tsv", "r", encoding="utf-8") as f:
    reader = csv.reader(f, delimiter="\t")
    for row in reader:
        print(row)

# Writing a semicolon-separated file
with open("data_semicolon.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter=";")
    writer.writerow(["Name", "City", "Score"])
    writer.writerow(["Priya", "Mumbai", 95])

Quoting Options

The csv module provides different quoting strategies:

import csv

data = [["Name", "Comment"], ["Alice", 'She said "hello"'], ["Bob", "No comment"]]

# QUOTE_MINIMAL (default) — only quote fields that need it
with open("out.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, quoting=csv.QUOTE_MINIMAL)
    writer.writerows(data)

# QUOTE_ALL — quote every field
with open("out.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, quoting=csv.QUOTE_ALL)
    writer.writerows(data)

# QUOTE_NONNUMERIC — quote all non-numeric fields
with open("out.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, quoting=csv.QUOTE_NONNUMERIC)
    writer.writerows(data)

Quoting Constant        Behavior
csv.QUOTE_MINIMAL       Quote only fields that contain special chars
csv.QUOTE_ALL           Quote every field
csv.QUOTE_NONNUMERIC    Quote all non-numeric fields
csv.QUOTE_NONE          Never quote (use escape character instead)

Working with JSON Files

JSON (JavaScript Object Notation) is the dominant data format for web APIs, configuration files, and data exchange. Python's json module provides seamless conversion between Python objects and JSON.

Python to JSON Type Mapping

Python         JSON
dict           object {}
list, tuple    array []
str            string ""
int, float     number
True / False   true / false
None           null
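Two consequences of this mapping are easy to miss when data makes a round trip: tuples come back as lists, and dictionary keys are always converted to strings. A quick demonstration:

```python
import json

data = {"point": (3, 4), 1: "one", "flag": None}

# Serialize to JSON text, then parse it back
restored = json.loads(json.dumps(data))
print(restored)  # {'point': [3, 4], '1': 'one', 'flag': None}
# The tuple became a list, and the integer key 1 became the string "1"
```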

json.load() — Read JSON from a File

Assume config.json contains:

{
  "database": {
    "host": "localhost",
    "port": 5432,
    "name": "myapp"
  },
  "debug": true,
  "allowed_hosts": ["localhost", "127.0.0.1"]
}

import json

with open("config.json", "r", encoding="utf-8") as f:
    config = json.load(f)

print(config["database"]["host"])   # localhost
print(config["database"]["port"])   # 5432
print(config["debug"])              # True
print(config["allowed_hosts"])      # ['localhost', '127.0.0.1']
print(type(config))                 # <class 'dict'>

json.dump() — Write JSON to a File

import json

data = {
    "name": "Meritshot",
    "courses": ["Python", "SQL", "Power BI"],
    "students": 5000,
    "active": True
}

with open("data.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2)

The resulting data.json:

{
  "name": "Meritshot",
  "courses": [
    "Python",
    "SQL",
    "Power BI"
  ],
  "students": 5000,
  "active": true
}

The indent Parameter

The indent parameter controls pretty-printing:

import json

data = {"name": "Alice", "scores": [90, 85, 92]}

# No indent — compact, single line (good for saving space)
print(json.dumps(data))
# {"name": "Alice", "scores": [90, 85, 92]}

# indent=2 — readable, indented by 2 spaces
print(json.dumps(data, indent=2))
# {
#   "name": "Alice",
#   "scores": [
#     90,
#     85,
#     92
#   ]
# }

# indent=4 — more indentation
print(json.dumps(data, indent=4))

json.loads() and json.dumps() — String Conversion

These work with strings instead of files:

import json

# Python dict to JSON string
python_dict = {"name": "Priya", "age": 22, "enrolled": True}
json_string = json.dumps(python_dict)
print(json_string)        # '{"name": "Priya", "age": 22, "enrolled": true}'
print(type(json_string))  # <class 'str'>

# JSON string to Python dict
json_text = '{"name": "Rahul", "age": 24, "enrolled": false}'
python_obj = json.loads(json_text)
print(python_obj)          # {'name': 'Rahul', 'age': 24, 'enrolled': False}
print(type(python_obj))    # <class 'dict'>
print(python_obj["name"])  # Rahul

Handling Nested Data

JSON often contains deeply nested structures:

import json

api_response = {
    "status": "success",
    "data": {
        "users": [
            {
                "id": 1,
                "name": "Alice",
                "address": {
                    "city": "Mumbai",
                    "pincode": "400001"
                }
            },
            {
                "id": 2,
                "name": "Bob",
                "address": {
                    "city": "Delhi",
                    "pincode": "110001"
                }
            }
        ],
        "total": 2
    }
}

# Write nested JSON
with open("api_data.json", "w", encoding="utf-8") as f:
    json.dump(api_response, f, indent=2)

# Read and navigate nested JSON
with open("api_data.json", "r", encoding="utf-8") as f:
    data = json.load(f)

for user in data["data"]["users"]:
    print(f"{user['name']} lives in {user['address']['city']}")

# Output:
# Alice lives in Mumbai
# Bob lives in Delhi

Additional json.dump() / json.dumps() Parameters

import json

data = {"name": "Priya", "city": "Mumbai", "age": 22}

# sort_keys — Sort dictionary keys alphabetically
print(json.dumps(data, sort_keys=True, indent=2))
# {
#   "age": 22,
#   "city": "Mumbai",
#   "name": "Priya"
# }

# ensure_ascii=False — Preserve non-ASCII characters (Hindi, Chinese, etc.)
data_hindi = {"name": "प्रिया", "city": "मुंबई"}
print(json.dumps(data_hindi, ensure_ascii=False, indent=2))
# {
#   "name": "प्रिया",
#   "city": "मुंबई"
# }

# separators — Customize separators for compact output
print(json.dumps(data, separators=(",", ":")))
# {"name":"Priya","city":"Mumbai","age":22}

Custom Serialization with default

Some Python objects (like datetime) are not JSON-serializable by default:

import json
from datetime import datetime, date

data = {
    "event": "Enrollment",
    "date": datetime(2026, 3, 15, 10, 30),
    "today": date(2026, 3, 15)
}

# This will raise TypeError:
# json.dumps(data)  # TypeError: Object of type datetime is not JSON serializable

# Solution: provide a custom serializer
def custom_serializer(obj):
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    raise TypeError(f"Type {type(obj)} not serializable")

json_string = json.dumps(data, default=custom_serializer, indent=2)
print(json_string)
# {
#   "event": "Enrollment",
#   "date": "2026-03-15T10:30:00",
#   "today": "2026-03-15"
# }

Working with Binary Files

Binary files store data as raw bytes rather than text. This includes images, audio, video, PDFs, executables, and compressed archives. Use "b" modes ("rb", "wb", "ab") for these files.

Reading a Binary File

with open("photo.jpg", "rb") as f:
    data = f.read()

print(type(data))       # <class 'bytes'>
print(len(data))        # File size in bytes
print(data[:10])        # First 10 bytes (e.g., b'\xff\xd8\xff\xe0...')

Writing a Binary File

# Write raw bytes
with open("output.bin", "wb") as f:
    f.write(b"\x00\x01\x02\x03\x04")
    f.write(bytes([72, 101, 108, 108, 111]))  # "Hello" in ASCII bytes

Copying a File Byte by Byte

A practical example of copying any type of file:

def copy_file(source, destination, chunk_size=4096):
    """Copy a file in chunks (works for any file type)."""
    bytes_copied = 0
    with open(source, "rb") as src, open(destination, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            bytes_copied += len(chunk)

    print(f"Copied {bytes_copied} bytes from {source} to {destination}")

# Usage
copy_file("photo.jpg", "photo_backup.jpg")

Checking File Type by Magic Bytes

Many file types start with specific "magic bytes" that identify them:

def identify_file_type(filepath):
    """Identify file type by reading its first few bytes."""
    signatures = {
        b"\xff\xd8\xff": "JPEG image",
        b"\x89PNG": "PNG image",
        b"GIF87a": "GIF image",
        b"GIF89a": "GIF image",
        b"%PDF": "PDF document",
        b"PK": "ZIP archive (or .docx, .xlsx)",
    }

    with open(filepath, "rb") as f:
        header = f.read(8)

    for magic, filetype in signatures.items():
        if header.startswith(magic):
            return filetype

    return "Unknown"

# Usage
# print(identify_file_type("photo.jpg"))    # JPEG image
# print(identify_file_type("report.pdf"))   # PDF document

File and Directory Operations with os and pathlib

Python provides two main approaches for working with the file system: the older os / os.path module and the modern pathlib module (introduced in Python 3.4).

Checking File Existence

import os

# Using os.path
if os.path.exists("data.txt"):
    print("File exists")
else:
    print("File not found")

# Check specifically for file vs directory
print(os.path.isfile("data.txt"))      # True if it is a file
print(os.path.isdir("data"))           # True if it is a directory

Getting File Information

import os

# File size in bytes
size = os.path.getsize("data.txt")
print(f"Size: {size} bytes")

# Absolute path
abs_path = os.path.abspath("data.txt")
print(f"Absolute path: {abs_path}")

# File name and directory
print(os.path.basename("/home/user/data.txt"))  # data.txt
print(os.path.dirname("/home/user/data.txt"))   # /home/user

# Split path and extension
name, ext = os.path.splitext("report.csv")
print(name, ext)  # report .csv

Listing Directory Contents

import os

# List all items in a directory
items = os.listdir(".")
print(items)  # ['file1.txt', 'folder1', 'script.py', ...]

# List only files
files = [f for f in os.listdir(".") if os.path.isfile(f)]
print(files)

# List only directories
dirs = [d for d in os.listdir(".") if os.path.isdir(d)]
print(dirs)

# List files with a specific extension
csv_files = [f for f in os.listdir(".") if f.endswith(".csv")]
print(csv_files)

Creating and Removing Directories

import os

# Create a single directory
os.mkdir("new_folder")

# Create nested directories (like mkdir -p)
os.makedirs("path/to/nested/folder", exist_ok=True)
# exist_ok=True prevents error if directory already exists

# Remove an empty directory
os.rmdir("new_folder")

# Remove nested empty directories
os.removedirs("path/to/nested/folder")

Renaming and Moving Files

import os

# Rename a file
os.rename("old_name.txt", "new_name.txt")

# Move a file to a different directory
os.rename("file.txt", "archive/file.txt")

# For more robust moving, use shutil
import shutil

shutil.move("source.txt", "destination/source.txt")

# Copy a file
shutil.copy2("original.txt", "backup.txt")      # Preserves metadata
shutil.copytree("source_dir", "backup_dir")      # Copy entire directory

# Remove a non-empty directory
shutil.rmtree("old_directory")

Joining Paths Safely

Never concatenate paths with string concatenation. Use os.path.join():

import os

# Correct — works on any OS
path = os.path.join("data", "output", "report.csv")
print(path)  # data/output/report.csv (or data\output\report.csv on Windows)

# Wrong — breaks on Windows
path = "data" + "/" + "output" + "/" + "report.csv"

Walking Directory Trees

os.walk() recursively traverses an entire directory tree:

import os

for dirpath, dirnames, filenames in os.walk("project"):
    # dirpath  — current directory path
    # dirnames — list of subdirectories in dirpath
    # filenames — list of files in dirpath

    print(f"\nDirectory: {dirpath}")
    print(f"  Subdirectories: {dirnames}")
    print(f"  Files: {filenames}")

# Example output:
# Directory: project
#   Subdirectories: ['src', 'data']
#   Files: ['README.md', 'setup.py']
#
# Directory: project/src
#   Subdirectories: []
#   Files: ['main.py', 'utils.py']
#
# Directory: project/data
#   Subdirectories: []
#   Files: ['input.csv']

A practical example: find all Python files in a project:

import os

python_files = []
for dirpath, dirnames, filenames in os.walk("project"):
    for filename in filenames:
        if filename.endswith(".py"):
            full_path = os.path.join(dirpath, filename)
            python_files.append(full_path)

print(f"Found {len(python_files)} Python files:")
for f in python_files:
    print(f"  {f}")

os.path vs pathlib.Path Comparison

Task              os.path                  pathlib.Path
Join paths        os.path.join("a", "b")   Path("a") / "b"
Get file name     os.path.basename(p)      Path(p).name
Get directory     os.path.dirname(p)       Path(p).parent
Get extension     os.path.splitext(p)[1]   Path(p).suffix
Check existence   os.path.exists(p)        Path(p).exists()
Is file?          os.path.isfile(p)        Path(p).is_file()
Is directory?     os.path.isdir(p)         Path(p).is_dir()
Absolute path     os.path.abspath(p)       Path(p).resolve()
List directory    os.listdir(p)            Path(p).iterdir()
Glob files        glob.glob("*.py")        Path(".").glob("*.py")
Read file         open(p).read()           Path(p).read_text()
Write file        open(p, "w").write(s)    Path(p).write_text(s)
Create directory  os.makedirs(p)           Path(p).mkdir(parents=True)
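The two APIs map one-to-one for everyday tasks. A small sketch building and inspecting the same path both ways (illustrative path names only):

```python
import os
from pathlib import Path

# Build the same path with both APIs
p_os = os.path.join("data", "output", "report.csv")
p_lib = Path("data") / "output" / "report.csv"

print(p_os)                 # data/output/report.csv (separator varies by OS)
same = p_os == str(p_lib)   # both use the platform separator
print(same)                 # True

# Inspect parts with both APIs
print(os.path.basename(p_os), p_lib.name)       # report.csv report.csv
print(os.path.splitext(p_os)[1], p_lib.suffix)  # .csv .csv
```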

The pathlib Module (Modern Approach)

pathlib was introduced in Python 3.4 and provides an object-oriented interface for filesystem paths. It is now the recommended way to handle paths in new Python code.

Creating Path Objects

from pathlib import Path

# From a string
p = Path("data/output/report.csv")

# Current directory
cwd = Path.cwd()
print(cwd)  # /home/user/project

# Home directory
home = Path.home()
print(home)  # /home/user

# Joining paths with the / operator
data_dir = Path("data")
output_file = data_dir / "output" / "report.csv"
print(output_file)  # data/output/report.csv

Path Properties

from pathlib import Path

p = Path("data/output/report.csv")

print(p.name)       # report.csv     — file name with extension
print(p.stem)       # report         — file name without extension
print(p.suffix)     # .csv           — file extension
print(p.suffixes)   # ['.csv']       — all extensions (e.g., ['.tar', '.gz'])
print(p.parent)     # data/output    — parent directory
print(p.parents[0]) # data/output    — immediate parent
print(p.parents[1]) # data           — grandparent
print(p.parts)      # ('data', 'output', 'report.csv')

Changing Parts of a Path

from pathlib import Path

p = Path("data/report.csv")

# Change extension
new_p = p.with_suffix(".json")
print(new_p)  # data/report.json

# Change file name
new_p = p.with_name("summary.csv")
print(new_p)  # data/summary.csv

# Change stem (keep extension)
new_p = p.with_stem("final_report")
print(new_p)  # data/final_report.csv

Checking Properties

from pathlib import Path

p = Path("data.txt")

print(p.exists())       # True if path exists
print(p.is_file())      # True if it is a regular file
print(p.is_dir())       # True if it is a directory
print(p.is_absolute())  # True if path is absolute (/home/user/...)

Reading and Writing with Path

pathlib provides convenience methods for quick file I/O:

from pathlib import Path

# Write text to a file (creates or overwrites)
Path("greeting.txt").write_text("Hello, World!\n", encoding="utf-8")

# Read text from a file
content = Path("greeting.txt").read_text(encoding="utf-8")
print(content)  # Hello, World!

# Write bytes
Path("data.bin").write_bytes(b"\x00\x01\x02\x03")

# Read bytes
data = Path("data.bin").read_bytes()
print(data)  # b'\x00\x01\x02\x03'

Note: read_text() and write_text() handle opening and closing the file for you, but they read/write the entire content at once. For line-by-line processing, use open() with with.
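Path objects also provide an .open() method that behaves like the built-in open(), so line-by-line processing works naturally with pathlib too. A small sketch (lines.txt is a throwaway example file):

```python
from pathlib import Path

p = Path("lines.txt")
p.write_text("alpha\nbeta\ngamma\n", encoding="utf-8")

# Path.open() returns the same kind of file object as the built-in open()
stripped = []
with p.open("r", encoding="utf-8") as f:
    for line in f:                    # one line at a time, memory-friendly
        stripped.append(line.strip())

print(stripped)  # ['alpha', 'beta', 'gamma']
p.unlink()       # remove the example file
```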

Glob Patterns

glob() finds files matching a pattern. rglob() searches recursively:

from pathlib import Path

project = Path("project")

# All .py files in the directory (not subdirectories)
for py_file in project.glob("*.py"):
    print(py_file)

# All .py files recursively (including subdirectories)
for py_file in project.rglob("*.py"):
    print(py_file)

# All CSV files in any 'data' subdirectory
for csv_file in project.rglob("data/*.csv"):
    print(csv_file)

# All image files
for img in project.rglob("*.jpg"):
    print(img)
for img in project.rglob("*.png"):
    print(img)

Creating Directories

from pathlib import Path

# Create a single directory
Path("new_folder").mkdir(exist_ok=True)

# Create nested directories
Path("path/to/nested/folder").mkdir(parents=True, exist_ok=True)

Iterating Over Directory Contents

from pathlib import Path

data_dir = Path("data")

# List all items
for item in data_dir.iterdir():
    if item.is_file():
        print(f"File: {item.name} ({item.stat().st_size} bytes)")
    elif item.is_dir():
        print(f"Dir:  {item.name}")

Getting File Metadata

from pathlib import Path
from datetime import datetime

p = Path("data.txt")
stat = p.stat()

print(f"Size: {stat.st_size} bytes")
# Note: st_ctime is creation time on Windows, but metadata-change time on Unix
print(f"Created/Changed: {datetime.fromtimestamp(stat.st_ctime)}")
print(f"Modified: {datetime.fromtimestamp(stat.st_mtime)}")
print(f"Accessed: {datetime.fromtimestamp(stat.st_atime)}")

Deleting Files

from pathlib import Path

# Delete a file
Path("temp.txt").unlink(missing_ok=True)  # missing_ok prevents error if absent

# Delete an empty directory
Path("empty_folder").rmdir()

Temporary Files

The tempfile module creates temporary files and directories that are automatically cleaned up when no longer needed. This is useful for storing intermediate data during processing.

NamedTemporaryFile

Creates a temporary file with a name you can reference:

import tempfile

# Create a temporary file
with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False,
                                  encoding="utf-8") as tmp:
    tmp.write("Temporary data\n")
    tmp.write("More temporary data\n")
    print(f"Temp file: {tmp.name}")  # e.g., /tmp/tmpxyz123.txt

# The file still exists because delete=False
# Read it back
with open(tmp.name, "r", encoding="utf-8") as f:
    print(f.read())

# Clean up manually
import os
os.unlink(tmp.name)

With delete=True (the default), the file is removed as soon as it is closed, which here means when the with block ends. (On Windows, a file opened this way cannot be opened a second time by name while it is still open; use delete=False if another function needs to read it.)

import tempfile

with tempfile.NamedTemporaryFile(mode="w", suffix=".csv",
                                  encoding="utf-8") as tmp:
    tmp.write("Name,Score\n")
    tmp.write("Alice,95\n")
    tmp_path = tmp.name  # Save the path for later reference
    print(f"Temp file: {tmp_path}")
    # File exists here

# File is automatically deleted here

TemporaryDirectory

Creates a temporary directory that is automatically removed when the with block ends:

import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    print(f"Temp dir: {tmpdir}")  # e.g., /tmp/tmpxyz456

    # Create files inside
    data_file = Path(tmpdir) / "data.txt"
    data_file.write_text("Hello from temp dir!", encoding="utf-8")

    results_file = Path(tmpdir) / "results.csv"
    results_file.write_text("Name,Score\nAlice,95\n", encoding="utf-8")

    # Use the files
    print(data_file.read_text(encoding="utf-8"))

# Entire directory and all contents are automatically deleted here

mkstemp and mkdtemp (Low-Level)

For more control, you can use the lower-level functions:

import tempfile
import os

# Create a temporary file (returns file descriptor and path)
fd, path = tempfile.mkstemp(suffix=".txt")
try:
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write("Low-level temp file\n")
    # Read it back
    with open(path, "r", encoding="utf-8") as f:
        print(f.read())
finally:
    os.unlink(path)  # Manual cleanup required

Practical Examples

Example 1: Log File Analyzer

A program that reads a server log file, counts error types, and writes a summary report:

from collections import Counter
from datetime import datetime

def analyze_log(input_path, output_path):
    """Analyze a log file and generate a summary report."""
    error_counts = Counter()
    warning_counts = Counter()
    total_lines = 0
    error_lines = 0
    warning_lines = 0

    with open(input_path, "r", encoding="utf-8") as f:
        for line in f:
            total_lines += 1
            line = line.strip()

            if "ERROR" in line:
                error_lines += 1
                # Extract error type: "2026-03-15 10:30:00 ERROR TimeoutError: ..."
                parts = line.split("ERROR")
                if len(parts) > 1:
                    error_type = parts[1].strip().split(":")[0].strip()
                    error_counts[error_type] += 1

            elif "WARNING" in line:
                warning_lines += 1
                parts = line.split("WARNING")
                if len(parts) > 1:
                    warning_type = parts[1].strip().split(":")[0].strip()
                    warning_counts[warning_type] += 1

    # Write the report
    with open(output_path, "w", encoding="utf-8") as f:
        f.write(f"Log Analysis Report\n")
        f.write(f"Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
        f.write(f"{'=' * 50}\n\n")

        f.write(f"Total lines processed: {total_lines}\n")
        f.write(f"Error lines: {error_lines}\n")
        f.write(f"Warning lines: {warning_lines}\n\n")

        f.write("Top Errors:\n")
        for error_type, count in error_counts.most_common(10):
            f.write(f"  {error_type}: {count}\n")

        f.write("\nTop Warnings:\n")
        for warning_type, count in warning_counts.most_common(10):
            f.write(f"  {warning_type}: {count}\n")

    print(f"Report written to {output_path}")

# Usage:
# analyze_log("server.log", "log_report.txt")

Example 2: Configuration File Reader

A utility that reads a simple key=value config file, supporting comments and sections:

def read_config(filepath):
    """
    Read a configuration file with the format:
      # comment
      [section]
      key = value
    """
    config = {}
    current_section = "default"

    with open(filepath, "r", encoding="utf-8") as f:
        for line_num, line in enumerate(f, start=1):
            line = line.strip()

            # Skip empty lines and comments
            if not line or line.startswith("#"):
                continue

            # Section header: [section_name]
            if line.startswith("[") and line.endswith("]"):
                current_section = line[1:-1].strip()
                if current_section not in config:
                    config[current_section] = {}
                continue

            # Key = Value pair
            if "=" in line:
                key, value = line.split("=", 1)  # Split on first = only
                key = key.strip()
                value = value.strip()

                # Type conversion
                if value.lower() in ("true", "yes"):
                    value = True
                elif value.lower() in ("false", "no"):
                    value = False
                elif value.isdigit():
                    value = int(value)
                else:
                    try:
                        value = float(value)
                    except ValueError:
                        pass  # Keep as string

                if current_section not in config:
                    config[current_section] = {}
                config[current_section][key] = value
            else:
                print(f"Warning: could not parse line {line_num}: {line}")

    return config


def write_config(filepath, config):
    """Write a configuration dictionary to a file."""
    from datetime import datetime

    with open(filepath, "w", encoding="utf-8") as f:
        f.write("# Configuration file\n")
        f.write(f"# Generated on {datetime.now()}\n\n")

        for section, values in config.items():
            f.write(f"[{section}]\n")
            for key, value in values.items():
                f.write(f"{key} = {value}\n")
            f.write("\n")


# Example config file (app.conf):
# ---------------------------------
# # Application Configuration
#
# [server]
# host = localhost
# port = 8080
# debug = true
#
# [database]
# host = localhost
# port = 5432
# name = myapp
# ---------------------------------

# Usage:
# config = read_config("app.conf")
# print(config["server"]["host"])   # localhost
# print(config["server"]["port"])   # 8080 (as int)
# print(config["server"]["debug"])  # True (as bool)

Example 3: CSV Report Generator

A program that reads raw data from a CSV, processes it, and generates a summary report:

import csv

def generate_sales_report(input_csv, output_csv):
    """
    Read a sales CSV and generate a summary by region.

    Input CSV format: Date,Region,Product,Quantity,Price
    Output CSV format: Region,TotalSales,AverageOrderValue,OrderCount
    """
    region_data = {}

    # Read input data
    with open(input_csv, "r", encoding="utf-8") as f:
        reader = csv.DictReader(f)

        for row in reader:
            region = row["Region"]
            quantity = int(row["Quantity"])
            price = float(row["Price"])
            sale_amount = quantity * price

            if region not in region_data:
                region_data[region] = {"total_sales": 0, "order_count": 0}

            region_data[region]["total_sales"] += sale_amount
            region_data[region]["order_count"] += 1

    # Calculate averages and write output
    with open(output_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=[
            "Region", "TotalSales", "AverageOrderValue", "OrderCount"
        ])
        writer.writeheader()

        for region, data in sorted(region_data.items()):
            avg_value = data["total_sales"] / data["order_count"]
            writer.writerow({
                "Region": region,
                "TotalSales": round(data["total_sales"], 2),
                "AverageOrderValue": round(avg_value, 2),
                "OrderCount": data["order_count"],
            })

    print(f"Report saved to {output_csv}")
    print(f"Processed {sum(d['order_count'] for d in region_data.values())} orders "
          f"across {len(region_data)} regions")

# Usage:
# generate_sales_report("sales_data.csv", "sales_summary.csv")

Example 4: Student Records System (JSON-Based CRUD)

A complete mini-application that creates, reads, updates, and deletes student records stored in a JSON file:

import json
from pathlib import Path

RECORDS_FILE = "students.json"


def load_students():
    """Load student records from the JSON file."""
    path = Path(RECORDS_FILE)
    if not path.exists():
        return []

    with open(RECORDS_FILE, "r", encoding="utf-8") as f:
        return json.load(f)


def save_students(students):
    """Save student records to the JSON file."""
    with open(RECORDS_FILE, "w", encoding="utf-8") as f:
        json.dump(students, f, indent=2, ensure_ascii=False)


def add_student(name, age, course, score):
    """Add a new student record."""
    students = load_students()

    # Generate a simple ID
    new_id = max((s["id"] for s in students), default=0) + 1

    student = {
        "id": new_id,
        "name": name,
        "age": age,
        "course": course,
        "score": score
    }

    students.append(student)
    save_students(students)
    print(f"Added student: {name} (ID: {new_id})")
    return new_id


def get_student(student_id):
    """Get a student by their ID."""
    students = load_students()
    for student in students:
        if student["id"] == student_id:
            return student
    return None


def update_student(student_id, **kwargs):
    """Update a student's fields."""
    students = load_students()

    for student in students:
        if student["id"] == student_id:
            for key, value in kwargs.items():
                if key in student and key != "id":
                    student[key] = value
            save_students(students)
            print(f"Updated student ID {student_id}")
            return True

    print(f"Student ID {student_id} not found")
    return False


def delete_student(student_id):
    """Delete a student by their ID."""
    students = load_students()
    original_count = len(students)

    students = [s for s in students if s["id"] != student_id]

    if len(students) < original_count:
        save_students(students)
        print(f"Deleted student ID {student_id}")
        return True

    print(f"Student ID {student_id} not found")
    return False


def list_students():
    """Display all students in a formatted table."""
    students = load_students()

    if not students:
        print("No student records found.")
        return

    print(f"\n{'ID':<5} {'Name':<20} {'Age':<5} {'Course':<15} {'Score':<6}")
    print("-" * 55)

    for s in students:
        print(f"{s['id']:<5} {s['name']:<20} {s['age']:<5} {s['course']:<15} {s['score']:<6}")

    print(f"\nTotal students: {len(students)}")


def search_students(query):
    """Search students by name (case-insensitive)."""
    students = load_students()
    results = [s for s in students if query.lower() in s["name"].lower()]

    if results:
        print(f"Found {len(results)} match(es):")
        for s in results:
            print(f"  ID {s['id']}: {s['name']} - {s['course']} (Score: {s['score']})")
    else:
        print(f"No students found matching '{query}'")

    return results


# Usage example:
# add_student("Priya Sharma", 22, "Python", 95)
# add_student("Rahul Verma", 24, "Data Science", 88)
# add_student("Ananya Patel", 23, "Machine Learning", 92)
#
# list_students()
# Output:
# ID    Name                 Age   Course          Score
# -------------------------------------------------------
# 1     Priya Sharma         22    Python          95
# 2     Rahul Verma          24    Data Science    88
# 3     Ananya Patel         23    Machine Learning 92
#
# Total students: 3
#
# update_student(2, score=91)
# delete_student(3)
# search_students("priya")

Common Mistakes

Mistake 1: Not Closing Files

# BAD — file may remain open if an error occurs
file = open("data.txt", "r")
content = file.read()
process(content)   # If this raises an error, file.close() never runs
file.close()

# GOOD — use 'with' to guarantee closing
with open("data.txt", "r") as file:
    content = file.read()
    process(content)

Mistake 2: Using the Wrong Mode

# BAD — accidentally overwriting data you wanted to add to
with open("log.txt", "w") as f:  # "w" erases everything!
    f.write("New entry\n")

# GOOD — use "a" to append
with open("log.txt", "a") as f:
    f.write("New entry\n")

# BAD — trying to write to a read-only file
with open("data.txt", "r") as f:
    f.write("Hello")  # io.UnsupportedOperation: not writable

# GOOD — open in write or read-write mode
with open("data.txt", "w") as f:
    f.write("Hello")

Mistake 3: Encoding Errors (UnicodeDecodeError)

# BAD — relies on the platform default encoding, which may fail on non-ASCII bytes
with open("data.txt", "r") as f:
    content = f.read()
# e.g. UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0

# GOOD — specify encoding explicitly
with open("data.txt", "r", encoding="utf-8") as f:
    content = f.read()

# Last resort: 'latin-1' maps every byte to a character, so it never raises
# decode errors, but it may silently misread text in other encodings
with open("data.txt", "r", encoding="latin-1") as f:
    content = f.read()

Mistake 4: Reading Entire Large Files into Memory

# BAD — loads a 2GB file entirely into memory
with open("huge_log.txt", "r") as f:
    content = f.read()  # May crash or freeze your computer!

# GOOD — process line by line
with open("huge_log.txt", "r") as f:
    for line in f:  # Reads one line at a time
        if "ERROR" in line:
            print(line.strip())

Mistake 5: Forgetting newline="" with CSV

# BAD — on Windows, this creates blank lines between rows
with open("data.csv", "w") as f:
    writer = csv.writer(f)
    writer.writerows(data)

# GOOD — always use newline="" with csv
with open("data.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(data)

Mistake 6: Not Handling FileNotFoundError

# BAD — crashes if the file does not exist
with open("missing_file.txt", "r") as f:
    content = f.read()
# FileNotFoundError: [Errno 2] No such file or directory: 'missing_file.txt'

# GOOD — handle the error gracefully
try:
    with open("missing_file.txt", "r", encoding="utf-8") as f:
        content = f.read()
except FileNotFoundError:
    print("File not found. Please check the path.")
    content = ""

Mistake 7: Modifying a List While Iterating Over File Lines

# BAD — confusing logic that may produce unexpected results
with open("data.txt", "r") as f:
    lines = f.readlines()
    for i, line in enumerate(lines):
        lines[i] = line.upper()  # Works, but mutating the list you iterate is easy to get wrong

# GOOD — create a new list
with open("data.txt", "r", encoding="utf-8") as f:
    lines = [line.strip().upper() for line in f]

Best Practices

1. Always Use with for File Operations

# This is always the right choice
with open("file.txt", "r", encoding="utf-8") as f:
    content = f.read()

2. Always Specify Encoding Explicitly

# Be explicit — avoid platform-dependent behavior
with open("file.txt", "r", encoding="utf-8") as f:
    content = f.read()

3. Handle Errors Gracefully

from pathlib import Path

def safe_read(filepath):
    """Read a file with comprehensive error handling."""
    try:
        with open(filepath, "r", encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        print(f"Error: '{filepath}' does not exist.")
    except PermissionError:
        print(f"Error: No permission to read '{filepath}'.")
    except UnicodeDecodeError:
        print(f"Error: '{filepath}' contains non-UTF-8 characters.")
    except IsADirectoryError:
        print(f"Error: '{filepath}' is a directory, not a file.")
    return None

4. Use pathlib for File Paths

from pathlib import Path

# Modern, clean, cross-platform
data_dir = Path("data")
output_file = data_dir / "results" / "report.csv"

# Create parent directories if needed
output_file.parent.mkdir(parents=True, exist_ok=True)

# Check before reading
if output_file.exists():
    content = output_file.read_text(encoding="utf-8")

5. Process Large Files Line by Line

# Memory-efficient for any file size
def count_lines(filepath):
    count = 0
    with open(filepath, "r", encoding="utf-8") as f:
        for _ in f:
            count += 1
    return count

6. Use Temporary Files for Intermediate Data

import tempfile
import json

# Process data through a temporary file
with tempfile.NamedTemporaryFile(mode="w", suffix=".json",
                                  delete=True, encoding="utf-8") as tmp:
    json.dump(intermediate_data, tmp)
    tmp.flush()  # Ensure data is written to disk
    # Pass tmp.name to another function that reads it
    process_file(tmp.name)
# Temp file is cleaned up automatically

7. Use Atomic Writes for Critical Data

When writing important files, write to a temporary file first, then rename it. This prevents data corruption if the program crashes mid-write:

import json
import tempfile
import os
from pathlib import Path

def safe_json_write(filepath, data):
    """Write JSON data atomically to prevent corruption."""
    filepath = Path(filepath)

    # Write to a temporary file in the same directory
    fd, tmp_path = tempfile.mkstemp(
        dir=filepath.parent,
        suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2)

        # Atomic rename (on most filesystems)
        os.replace(tmp_path, filepath)
    except Exception:
        os.unlink(tmp_path)  # Clean up temp file on error
        raise

Summary of Best Practices

Practice                              Why
Use with statement                    Guarantees file is closed properly
Specify encoding="utf-8"              Avoids platform-dependent encoding issues
Handle FileNotFoundError              Prevents crashes on missing files
Use pathlib.Path for paths            Cross-platform, readable, object-oriented
Iterate line by line for large files  Keeps memory usage low
Use newline="" with CSV               Prevents blank rows on Windows
Use "x" mode for new files            Prevents accidental overwriting
Write to temp file, then rename       Prevents data corruption on crash

Practice Exercises

Exercise 1: Word Counter

Write a program that reads a text file and prints:

  • Total number of lines
  • Total number of words
  • Total number of characters (excluding newlines)
  • The 5 most common words

# Hint structure:
def word_counter(filepath):
    """Count words, lines, and characters in a text file."""
    # Read the file
    # Count lines, words, characters
    # Use collections.Counter for most common words
    # Print results
    pass

# Expected output:
# Lines: 150
# Words: 1234
# Characters: 6789
# Most common words:
#   the: 45
#   is: 32
#   and: 28
#   to: 25
#   of: 22

Exercise 2: CSV Grade Calculator

Write a program that:

  1. Reads a CSV file with columns: Name, Math, Science, English
  2. Calculates the average score for each student
  3. Assigns a grade (A: 90+, B: 80+, C: 70+, D: 60+, F: below 60)
  4. Writes the results to a new CSV with columns: Name, Average, Grade

# Input CSV (grades.csv):
# Name,Math,Science,English
# Priya,95,88,92
# Rahul,72,68,75
# Ananya,88,91,85

# Output CSV (results.csv):
# Name,Average,Grade
# Priya,91.67,A
# Rahul,71.67,C
# Ananya,88.0,B
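A hint, not a full solution: the grade-assignment step can live in a small helper function using the thresholds listed above:

```python
def letter_grade(average):
    """Map an average score to a letter grade (A: 90+, B: 80+, C: 70+, D: 60+, F: below 60)."""
    if average >= 90:
        return "A"
    elif average >= 80:
        return "B"
    elif average >= 70:
        return "C"
    elif average >= 60:
        return "D"
    return "F"

print(letter_grade(91.67))  # A
print(letter_grade(71.67))  # C
```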

Exercise 3: JSON Phonebook

Build a phonebook application using JSON for storage that supports:

  • Adding a contact (name, phone, email)
  • Searching by name
  • Deleting a contact
  • Listing all contacts
  • Exporting contacts to CSV

# Hint: Use the Student Records System example as a starting point
# Store data in phonebook.json
# Each contact: {"name": "...", "phone": "...", "email": "..."}

Exercise 4: File Organizer

Write a program that organizes files in a directory by moving them into subfolders based on their extension:

# Before:
# downloads/
#   photo1.jpg
#   report.pdf
#   data.csv
#   script.py
#   notes.txt
#   image.png

# After:
# downloads/
#   Images/
#     photo1.jpg
#     image.png
#   Documents/
#     report.pdf
#   Data/
#     data.csv
#   Code/
#     script.py
#   Text/
#     notes.txt

# Hint: Use pathlib for path operations and shutil.move for moving files
# Define a mapping: {".jpg": "Images", ".png": "Images", ".pdf": "Documents", ...}
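A sketch of the mapping step (the folder names and covered extensions are only a suggestion; extend the dictionary as needed):

```python
from pathlib import Path

# Suggested extension-to-folder mapping; extend as needed
FOLDERS = {
    ".jpg": "Images", ".png": "Images",
    ".pdf": "Documents",
    ".csv": "Data",
    ".py": "Code",
    ".txt": "Text",
}

def target_folder(filename):
    """Return the subfolder a file belongs in, or 'Other' for unknown extensions."""
    return FOLDERS.get(Path(filename).suffix.lower(), "Other")

print(target_folder("photo1.JPG"))   # Images
print(target_folder("archive.zip"))  # Other
```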

Exercise 5: Log File Merger

Write a program that:

  1. Reads multiple log files from a directory (all .log files)
  2. Each line has a timestamp format: 2026-03-15 10:30:00 - message
  3. Merges all lines from all files
  4. Sorts them by timestamp
  5. Writes the sorted, merged result to a single output file

# Hint:
# 1. Use pathlib.Path.glob("*.log") to find all log files
# 2. Read all lines from all files into a single list
# 3. Sort by the timestamp portion of each line
# 4. Write the sorted lines to output
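One more hint for the sort step: because the timestamps use a fixed YYYY-MM-DD HH:MM:SS format, lexicographic order of the first 19 characters equals chronological order, so a simple key function suffices (sample lines invented for illustration):

```python
# Sample merged lines in the stated log format
lines = [
    "2026-03-15 10:30:05 - request finished",
    "2026-03-15 10:29:59 - request started",
    "2026-03-14 23:59:59 - shutdown",
]

# The first 19 characters are the timestamp; sorting them as strings
# sorts the lines chronologically
lines.sort(key=lambda line: line[:19])

for line in lines:
    print(line)
```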

Exercise 6: File Backup Tool

Write a backup utility that:

  1. Takes a source directory and a backup directory as arguments
  2. Copies all files from source to backup, preserving directory structure
  3. Only copies files that are newer than the backup copy (or missing from backup)
  4. Generates a backup report listing all copied files and total bytes transferred
# Hint:
# Use pathlib for path operations
# Use os.stat().st_mtime to compare modification times
# Use shutil.copy2 to copy files (preserves metadata)
# Use os.walk or Path.rglob to traverse directories
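A hint for step 3, sketched as a small predicate (needs_backup is a hypothetical helper name):

```python
import os
import tempfile
import time
from pathlib import Path

def needs_backup(src, dst):
    """True if dst is missing or older than src (compares modification times)."""
    if not dst.exists():
        return True
    return src.stat().st_mtime > dst.stat().st_mtime

# Quick demonstration with throwaway files
results = []
with tempfile.TemporaryDirectory() as d:
    src = Path(d) / "source.txt"
    dst = Path(d) / "backup.txt"
    src.write_text("v1", encoding="utf-8")
    results.append(needs_backup(src, dst))   # True: backup missing

    dst.write_text("v1", encoding="utf-8")
    future = time.time() + 10
    os.utime(dst, (future, future))          # make the backup newer
    results.append(needs_backup(src, dst))   # False: backup up to date

print(results)  # [True, False]
```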

Summary

In this chapter, you learned:

  • Why file handling matters — Programs need to persist data, read configurations, process logs, and exchange data with other systems.
  • Opening files — The open() function with various modes (r, w, a, x, r+, w+, a+) and the importance of specifying encoding="utf-8".
  • The with statement — The recommended way to work with files, guaranteeing proper cleanup even when errors occur.
  • Reading filesread(), readline(), readlines(), line-by-line iteration (most memory-efficient), and chunk-based reading.
  • Writing fileswrite(), writelines(), the difference between overwriting (w) and appending (a), and creating files safely with x mode.
  • File modes in depth — Text vs. binary modes, the + read-write modes, and when to use each.
  • File pointer manipulation — Using tell() and seek() to navigate within files.
  • CSV files — Reading and writing with csv.reader, csv.writer, csv.DictReader, and csv.DictWriter, plus custom delimiters and quoting options.
  • JSON filesjson.load(), json.dump(), json.loads(), json.dumps(), pretty-printing with indent, handling nested data, and custom serialization.
  • Binary files — Reading and writing bytes, copying files, and identifying file types by magic bytes.
  • File system operations — Using os and os.path for checking existence, listing directories, creating and removing directories, and walking directory trees.
  • The pathlib module — The modern, object-oriented approach to file paths with Path objects, glob patterns, and convenience methods.
  • Temporary files — Using the tempfile module for intermediate data that cleans up after itself.
  • Best practices — Always use with, specify encoding, handle errors gracefully, use pathlib for paths, and process large files line by line.

File handling is a foundational skill that you will use in almost every Python project. With the techniques covered here, you are well-equipped to read, write, and manage files of all types confidently.