File Line Iterator
This demonstrates creating an iterator to read a file line by line. This approach avoids loading the entire file into memory, making it suitable for large files.
Code Implementation
The FileLineIterator class takes a filename as input. The __init__ method opens the file, and the __iter__ method returns the iterator object itself. The __next__ method reads one line from the file; when the end of the file is reached (readline() returns an empty string), it closes the file and raises StopIteration. The strip() call removes leading and trailing whitespace, including the newline character.
class FileLineIterator:
    def __init__(self, filename):
        self.filename = filename
        self.file = open(self.filename, 'r')

    def __iter__(self):
        return self

    def __next__(self):
        line = self.file.readline()
        if not line:
            self.file.close()
            raise StopIteration
        return line.strip()
# Example Usage (creates a temporary file named 'my_file.txt')
import os

try:
    with open('my_file.txt', 'w') as f:
        f.write('Line 1\n')
        f.write('Line 2\n')
        f.write('Line 3\n')

    file_iter = FileLineIterator('my_file.txt')
    for line in file_iter:
        print(line)
finally:
    os.remove('my_file.txt')
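The protocol can also be exercised by hand: iter() invokes __iter__ and next() invokes __next__. Python's built-in file object follows the same protocol, so the sketch below drives it manually (the file name proto_demo.txt is just for illustration):

```python
import os

# Any iterator obeys the same protocol: iter() calls __iter__ and
# next() calls __next__ until StopIteration signals exhaustion.
with open('proto_demo.txt', 'w') as f:
    f.write('alpha\nbeta\n')

f = open('proto_demo.txt', 'r')
print(iter(f) is f)        # True: a file object returns itself from __iter__
print(next(f).strip())     # alpha
print(next(f).strip())     # beta
try:
    next(f)                # end of file: __next__ raises StopIteration
except StopIteration:
    print('exhausted')
finally:
    f.close()
    os.remove('proto_demo.txt')
```

The for loop does exactly this behind the scenes: it calls iter() once, then next() repeatedly, and stops cleanly on StopIteration.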
Concepts Behind the Snippet
This snippet demonstrates a practical application of the iterator protocol for file processing. By implementing the __iter__ and __next__ methods, we can iterate through the lines of a file efficiently, without loading the entire file into memory.
Real-Life Use Case
This pattern is extremely useful for parsing large log files, processing data files with millions of rows, or any situation where loading the entire file into memory would be impractical. It's a cornerstone of efficient data processing in Python.
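As a sketch of that use case, the hypothetical count_errors helper below streams a small stand-in log file (the name app.log and the log format are invented for illustration) and counts ERROR lines without ever holding the whole file in memory:

```python
import os

# Create a tiny stand-in log file (format is illustrative).
with open('app.log', 'w') as f:
    f.write('INFO start\nERROR disk full\nINFO retry\nERROR timeout\n')

def count_errors(path):
    """Count ERROR lines, streaming one line at a time."""
    errors = 0
    with open(path, 'r') as log:
        for line in log:              # the file object streams lazily
            if line.startswith('ERROR'):
                errors += 1
    return errors

print(count_errors('app.log'))        # 2
os.remove('app.log')
```

The same loop works unchanged whether the log is four lines or forty gigabytes, because only the current line is ever in memory.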
Best Practices
- Always ensure the file is closed, even when iteration ends early or an error occurs, by using a try...finally block or a context manager (with open(...)).
- Handle file-related exceptions explicitly (FileNotFoundError, IOError).
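One way to satisfy both practices is to make the iterator class a context manager as well, so the file is closed deterministically even if the caller abandons iteration early. The sketch below is an assumption-laden variant (SafeFileLineIterator is an illustrative name, not part of the code above):

```python
import os

class SafeFileLineIterator:
    """Iterator over file lines that is also a context manager."""

    def __init__(self, filename):
        self.file = open(filename, 'r')   # may raise FileNotFoundError

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.file.close()                 # always runs, even on error

    def __iter__(self):
        return self

    def __next__(self):
        line = self.file.readline()
        if not line:
            raise StopIteration           # closing is left to __exit__
        return line.strip()

with open('demo.txt', 'w') as f:
    f.write('a\nb\n')

with SafeFileLineIterator('demo.txt') as lines:
    for line in lines:
        print(line)

os.remove('demo.txt')
```

With this design, breaking out of the loop halfway through no longer leaks the file handle: __exit__ closes it regardless.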
Interview Tip
Be prepared to discuss the benefits of using iterators for file processing, especially when dealing with large files. Also, be ready to explain how to handle potential exceptions that might arise during file I/O.
When to Use Them
This pattern is most beneficial when working with files that are too large to fit comfortably into memory. It allows you to process the data in a streaming fashion, only loading one line (or a small chunk) at a time.
Memory Footprint
The memory footprint is significantly reduced compared to loading the entire file into memory. Only a single line (or a small buffer) is stored in memory at any given time.
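A rough way to see the difference is to compare readlines(), which materializes every line in a list, with a streaming pass that touches one line at a time (file name and sizes here are arbitrary):

```python
import os
import sys

# Write 10,000 short lines to a temporary file.
with open('big.txt', 'w') as f:
    for i in range(10_000):
        f.write(f'line {i}\n')

with open('big.txt') as f:
    all_lines = f.readlines()               # whole file in memory at once

with open('big.txt') as f:
    longest = max(len(line) for line in f)  # streaming: one line at a time

print(sys.getsizeof(all_lines))   # the list's pointer array alone is tens of KB
print(longest)
os.remove('big.txt')
```

The streaming version computes the same answer while holding only the current line; the readlines() version additionally pays for the list and every line object at once.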
Alternatives
Python's built-in file object is already an iterator: you can iterate over it directly with a for loop. Generators can also achieve the same result with more concise code (see the example below).
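For example, iterating over the file object directly needs no custom class at all (the file name alt.txt is just for illustration):

```python
import os

with open('alt.txt', 'w') as f:
    f.write('one\ntwo\n')

# The file object implements __iter__ and __next__ itself,
# so the for loop drives the iterator protocol directly.
with open('alt.txt', 'r') as f:
    for line in f:
        print(line.strip())

os.remove('alt.txt')
```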
Generator Alternative
This generator function achieves the same result as the FileLineIterator class with significantly less code. It leverages Python's with statement for automatic file closing and the yield keyword to produce lines on demand.
def file_line_generator(filename):
    with open(filename, 'r') as f:
        for line in f:
            yield line.strip()

# Example usage:
# for line in file_line_generator('my_file.txt'):
#     print(line)
Pros
- Processes files of any size with a constant, small memory footprint.
- Gives explicit control over when the file is opened and how each line is transformed.
Cons
- More verbose than a generator function or direct iteration over the file object.
- The file is only closed when iteration runs to completion; abandoning the loop early leaks the file handle unless you close it manually.
- Single-pass: once exhausted, the iterator cannot be restarted.
FAQ
- Why is it important to close the file in the __next__ method?
  Closing the file releases the file handle back to the operating system. Failing to close files can exhaust available file descriptors (a resource leak), and for files opened for writing it can leave buffered data unwritten.
- Can I use this approach to read binary files?
  Yes, but you'll need to open the file in binary mode ('rb') and handle the binary data appropriately. Instead of readline(), you might use read(size) to read a specific number of bytes.
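A common pattern for chunked binary reading is the two-argument form of iter(), which calls f.read(size) repeatedly until it returns the sentinel b'' at end of file (the file name and chunk size below are arbitrary):

```python
import os
from functools import partial

# Write 10 bytes of sample binary data.
with open('data.bin', 'wb') as f:
    f.write(bytes(range(10)))

chunks = []
with open('data.bin', 'rb') as f:
    # iter(callable, sentinel) keeps calling f.read(4) until it yields b''.
    for chunk in iter(partial(f.read, 4), b''):
        chunks.append(chunk)

print(chunks)   # [b'\x00\x01\x02\x03', b'\x04\x05\x06\x07', b'\x08\t']
os.remove('data.bin')
```

This gives the same streaming behavior as the line iterator, but in fixed-size byte chunks rather than newline-delimited records.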