File Line Iterator

This example builds a custom iterator that reads a file line by line. Because only one line is handed out at a time, the approach avoids loading the entire file into memory, which makes it suitable for large files.

Code Implementation

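The FileLineIterator class takes a filename as input. The __init__ method opens the file. The __iter__ method returns the iterator object itself. The __next__ method reads one line with readline(). When readline() returns an empty string, which happens only at the end of the file (immediately, for an empty file), __next__ closes the file and raises StopIteration. Any later call raises StopIteration again instead of trying to read from the closed file. The strip() call removes all leading and trailing whitespace from each line, including the trailing newline and any indentation; use rstrip('\n') instead if indentation must be preserved.

class FileLineIterator:
    """Iterate over the lines of a text file, one line at a time."""

    def __init__(self, filename):
        self.filename = filename
        # Opened eagerly: open() raises FileNotFoundError if the file is missing.
        self.file = open(self.filename, 'r')

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        # Once exhausted, keep raising StopIteration rather than
        # calling readline() on a closed file (which raises ValueError).
        if self.file.closed:
            raise StopIteration
        line = self.file.readline()
        if not line:  # readline() returns '' only at end of file
            self.file.close()
            raise StopIteration
        return line.strip()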

# Example usage: create a small sample file, iterate over it, then clean up
import os

with open('my_file.txt', 'w') as f:
    f.write('Line 1\n')
    f.write('Line 2\n')
    f.write('Line 3\n')

try:
    file_iter = FileLineIterator('my_file.txt')
    for line in file_iter:
        print(line)  # prints Line 1, Line 2, Line 3 on separate lines
finally:
    # Remove the sample file even if iteration fails
    os.remove('my_file.txt')

Concepts Behind the Snippet

This snippet demonstrates the practical application of the iterator protocol for file processing. By implementing the __iter__ and __next__ methods, we can efficiently iterate through the lines of a file without loading the entire file into memory.
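
To make the protocol concrete, the sketch below spells out roughly what a for loop does with such an iterator: it calls iter() once, then next() repeatedly until StopIteration is raised. The class is a compact restatement of the snippet above so that the example runs on its own, and the temporary file is a stand-in created only for this demonstration.

```python
import os
import tempfile


class FileLineIterator:
    """Compact restatement of the class above, repeated so this sketch is self-contained."""

    def __init__(self, filename):
        self.file = open(filename, 'r')

    def __iter__(self):
        return self

    def __next__(self):
        line = self.file.readline()
        if not line:
            self.file.close()
            raise StopIteration
        return line.strip()


# Hypothetical scratch file, created only for this demonstration.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as f:
    f.write('Line 1\nLine 2\n')

collected = []
try:
    iterator = iter(FileLineIterator(path))  # calls __iter__
    while True:
        try:
            line = next(iterator)            # calls __next__
        except StopIteration:
            break                            # a for loop ends silently at this point
        collected.append(line)
finally:
    os.remove(path)

print(collected)  # ['Line 1', 'Line 2']
```

The for statement hides the try/except around next(), which is why raising StopIteration is the normal way for __next__ to signal that there is no more data.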

Real-Life Use Case

This pattern suits parsing large log files, processing data files with millions of rows, or any situation where loading the entire file into memory would be impractical. Streaming input in this way is a common basis for memory-efficient data processing in Python.

Best Practices

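  • Always ensure that the file is closed, both when iteration completes and when it is abandoned early (for example, by a break or an exception), either by using a try...finally block or a context manager (with open(...)).
  • Handle potential file-related exceptions (e.g., FileNotFoundError, PermissionError, or their base class OSError; IOError is an alias of OSError in Python 3).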
  • Consider adding error handling for malformed lines in the file.
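
The practices above can be sketched together as follows. This is one possible arrangement, not the only one: ManagedFileLineIterator is a hypothetical variant of the class that adds __enter__ and __exit__ so the file is closed even when the loop ends early, and the sample data (integers with one malformed line) is invented for illustration.

```python
import os
import tempfile


class ManagedFileLineIterator:
    """Hypothetical variant of FileLineIterator that is also a context manager."""

    def __init__(self, filename):
        # open() raises FileNotFoundError (a subclass of OSError) if the file is missing.
        self.file = open(filename, 'r')

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()   # runs on normal exit, on break, and on exceptions
        return False   # never suppress an exception raised inside the with block

    def close(self):
        self.file.close()  # closing an already closed file is a no-op

    def __iter__(self):
        return self

    def __next__(self):
        if self.file.closed:
            raise StopIteration
        line = self.file.readline()
        if not line:
            self.close()
            raise StopIteration
        return line.strip()


# Invented sample data: one integer per line, with one malformed line.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as f:
    f.write('10\nnot a number\n30\n')

values, skipped = [], []
try:
    with ManagedFileLineIterator(path) as lines:
        for line in lines:
            try:
                values.append(int(line))
            except ValueError:
                skipped.append(line)  # record the malformed line and continue
finally:
    os.remove(path)

# Opening a missing file fails in __init__, before any iteration starts.
try:
    ManagedFileLineIterator(path)
    missing_reported = False
except OSError:
    missing_reported = True

print(values, skipped, missing_reported)  # [10, 30] ['not a number'] True
```

Whether a malformed line should be skipped, logged, or treated as fatal depends on the application; skipping is used here only to keep the sketch short.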

Interview Tip

Be prepared to discuss the benefits of using iterators for file processing, especially when dealing with large files. Also, be ready to explain how to handle potential exceptions that might arise during file I/O.

When to Use Them

This pattern is most beneficial when working with files that are too large to fit comfortably into memory. It allows you to process the data in a streaming fashion, only loading one line (or a small chunk) at a time.

Memory Footprint

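The memory footprint is significantly reduced compared to loading the entire file into memory. Only the current line and the file object's internal read buffer are held at any given time, so memory use stays roughly constant regardless of file size, provided no single line is itself very large.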
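
One rough way to observe the difference is to compare peak allocations with the standard tracemalloc module. The sketch below is illustrative only: the helper names (peak_bytes, load_all, stream) and the 50,000-line sample file are assumptions made for this example, and the exact byte counts vary by platform and Python version.

```python
import os
import tempfile
import tracemalloc


def peak_bytes(func, path):
    """Return the peak traced allocation (in bytes) while func(path) runs."""
    tracemalloc.start()
    try:
        func(path)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def load_all(path):
    # Reads every line into a list at once.
    with open(path, 'r') as f:
        lines = f.readlines()
    return len(lines)


def stream(path):
    # Holds only one line at a time.
    count = 0
    with open(path, 'r') as f:
        for _ in f:
            count += 1
    return count


# Hypothetical sample file, created only for the measurement.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as f:
    for i in range(50_000):
        f.write(f'line number {i}\n')

try:
    peak_all = peak_bytes(load_all, path)
    peak_stream = peak_bytes(stream, path)
finally:
    os.remove(path)

print(f'readlines(): {peak_all} bytes at peak, streaming: {peak_stream} bytes at peak')
```

tracemalloc only tracks allocations made through Python's memory allocator, so the figures are a guide to relative cost rather than an exact measure of process memory.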

Alternatives

Python's built-in file object is already an iterator, so you can iterate over it directly with a for loop. Generators can also be used to achieve similar results with more concise code (see the Generator Alternative section below).
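
As a brief illustration of the first alternative, the sketch below loops over a file object directly and checks that iter() returns the file object itself. The temporary file is a stand-in created only for this example.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as f:
    f.write('Line 1\nLine 2\nLine 3\n')

try:
    with open(path, 'r') as f:
        # A file object is its own iterator: iter(f) returns f.
        is_own_iterator = iter(f) is f
        # Each iteration yields one line, including its trailing newline.
        lines = [line.rstrip('\n') for line in f]
finally:
    os.remove(path)

print(is_own_iterator)  # True
print(lines)            # ['Line 1', 'Line 2', 'Line 3']
```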

Generator Alternative

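This generator function achieves the same result as the FileLineIterator class with significantly less code. It uses Python's with statement to close the file automatically and the yield keyword to produce lines on demand. The file is opened on the first call to next() and closed when the generator is exhausted, when its close() method is called, or when the generator object is garbage collected.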

def file_line_generator(filename):
    with open(filename, 'r') as f:
        for line in f:
            yield line.strip()

# Example usage:
# for line in file_line_generator('my_file.txt'):
#   print(line)

Pros

  • Memory efficiency for large files.
  • Simple and readable code.

Cons

  • Requires careful handling of file closing and potential exceptions.

FAQ

  • Why is it important to close the file in the __next__ method?

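    Closing the file releases the underlying file handle. Because this iterator only reads, no buffered writes are at stake; the risk of not closing is a resource leak, since operating systems limit the number of files a process may hold open, and on Windows an open handle can also prevent the file from being deleted or renamed. Note that __next__ closes the file only when iteration runs to the end, so a loop that stops early leaves the file open until the object is garbage collected.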
  • Can I use this approach to read binary files?

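    Yes, but you'll need to open the file in binary mode ('rb') and handle the resulting bytes objects appropriately. readline() still works in binary mode and splits on b'\n', but arbitrary binary data usually has no meaningful line boundaries, so read(size) is typically used to read a fixed number of bytes at a time.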
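
A minimal sketch of the binary case, assuming fixed-size chunks are acceptable for the data at hand. The function name read_chunks and the chunk size of 4 bytes are chosen only for illustration; real code would normally use a much larger size such as 64 KiB.

```python
import os
import tempfile
from functools import partial


def read_chunks(filename, chunk_size=4):
    """Yield successive chunks of at most chunk_size bytes from a binary file."""
    with open(filename, 'rb') as f:
        # iter(callable, sentinel) calls f.read(chunk_size) repeatedly
        # and stops when it returns b'', which signals end of file.
        for chunk in iter(partial(f.read, chunk_size), b''):
            yield chunk


# Hypothetical scratch file holding the ten bytes 0x00..0x09.
fd, path = tempfile.mkstemp(suffix='.bin')
with os.fdopen(fd, 'wb') as f:
    f.write(bytes(range(10)))

try:
    chunks = list(read_chunks(path))
finally:
    os.remove(path)

print(chunks)  # [b'\x00\x01\x02\x03', b'\x04\x05\x06\x07', b'\x08\t']
```

The last chunk may be shorter than chunk_size, so code that parses fixed-width records should check the length of each chunk.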