Python Generators and Iterators

Generators in Python

Sometimes we work with bulk data whose nature is somewhat unpredictable: a stream of events, data chunks, or anything that can be batched, as in ETL pipelines where data passes through several transformation stages. My premise is that the process must work well within a constrained environment, with limited RAM, processing power, or storage, such as Google Cloud Functions or AWS Lambda (PaaS), Kubernetes pods, or Cloud Compute / EC2 instances, where I have to be able to process portions of the information without overloading the runtime. Generators are a really cool construct that helps solve those tasks in Python (JavaScript has generators too, and Go has channels, so the principles here somewhat apply to those languages as well). To understand generators, let us first understand what generator iterators are.

Generator Iterator

A generator iterator is a particular Python object that behaves like a function, but instead of calling return to give back a value, we use the keyword yield. When we yield from within a generator iterator, it temporarily suspends execution of the code block, remembering the execution state (including local variables and pending try statements), and returns control to the context where the generator iterator was called. Subsequent calls to the generator iterator resume its execution, so it picks up where it left off. This behavior is in contrast to regular functions, which start fresh on every invocation.
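
A small example makes this suspend-and-resume behavior concrete. The following toy generator (countdown is my own example, not from the text above) prints a message before and after each yield, so we can see exactly where execution pauses and resumes,

def countdown(n):
    print("starting")
    while n > 0:
        yield n           # execution pauses here on every iteration...
        print("resumed")  # ...and resumes here on the next call to next()
        n -= 1

>>> it = countdown(2)
>>> next(it)
starting
2
>>> next(it)
resumed
1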

Generator function

In a few words, a generator function is a function that returns a generator iterator object. It looks like a regular function, but instead of plain return statements we use the yield statement. Each time we call next() on the generator – for instance, within a for loop – the generator resumes execution from where it left off after yield was called, just as described above.

We will understand this better by considering the following example of a simple generator function,

def get_powers(array):
    for element in array:
        yield element * element

This defines a generator function. The generator iterator can be obtained as

it = get_powers([2,3,4])

That is, the function that returns a generator iterator is the generator function: in our example, the generator iterator is it and the generator function is get_powers. We can step through the values by calling next() on our generator

>>> next(it)
4
>>> next(it)
9
>>> next(it)
16
>>> next(it)
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-29-bc1ab118995a> in <module>
----> 1 next(it)

StopIteration:

The same can also be achieved by running

>>> it.__next__()

Notice that once the generator runs out of values to yield, a StopIteration exception is raised. We can catch this exception to stop iterating over the iterator's values as needed.
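
For instance, here is a minimal sketch of consuming the same get_powers generator by hand, catching StopIteration ourselves,

it = get_powers([2,3,4])
while True:
    try:
        value = next(it)   # advance the generator one step
    except StopIteration:  # raised once the generator is exhausted
        break
    print(value)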

The usefulness of iterators is truly seen in loops, for instance

it = get_powers([5,6,7])
for el in it:
    print(el)

The for loop automatically handles the end of the iteration for us (since the iterator is compliant with the __next__ and StopIteration protocol that the for loop expects).
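
Since the generator complies with that same protocol, it also works with any built-in that consumes iterables, for example

>>> list(get_powers([5,6,7]))
[25, 36, 49]
>>> sum(get_powers([5,6,7]))
110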

Class Generators

Now that we have seen how generator iterators and functions work, let's try something more refined. We can use classes as generators too. Take a look at the following example

class Spell:
    def __init__(self, word):
        self.word = word
        self.char_size = len(word)
        self.index = -1

    def __iter__(self):
        # returning self makes the object its own iterator
        return self

    def __next__(self):
        if self.index == self.char_size - 1:
            raise StopIteration
        self.index += 1
        return self.word[self.index]

So when we run,

r = Spell("hello world!") 
for c in r:
    print(c)

We get as a result,

h
e
l
l
o
 
w
o
r
l
d
!

Thus we have defined a class that behaves as a generator too. This is thanks to defining the methods __iter__ and __next__, and to raising StopIteration when the generator runs out of elements. This makes the class compliant as a generator iterator.
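
One detail worth noting: since self.index is never reset, a Spell instance is exhausted after a single pass, just like the generator iterators above. Iterating over it a second time yields nothing,

>>> r = Spell("hi")
>>> list(r)
['h', 'i']
>>> list(r)  # the internal index is already at the end
[]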

Final remarks

You can build complex classes or generators to iterate through batches of data, or chunks from a data stream, safely, since you can control how much RAM is used and the pace at which the data is processed to completion. In future posts I will show examples of this within a constrained Docker environment.
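
As a small taste of that, here is a minimal sketch of a batching generator (read_in_batches and batch_size are names of my own choosing, and the source could be any iterable: a file, a database cursor, an event stream). Only one batch lives in memory at a time,

def read_in_batches(iterable, batch_size):
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch   # hand over one full batch and pause here
            batch = []
    if batch:             # don't lose the final, partial batch
        yield batch

for batch in read_in_batches(range(7), batch_size=3):
    print(batch)  # prints [0, 1, 2], then [3, 4, 5], then [6]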

Resources

  1. Official Python Docs on Generators
  2. An interesting StackOverflow answer