Tyler Poff

Python Generator Functions!


Today I want to introduce you to a cool concept in Python called Generator Functions!

So picture this: you are asked to optimize a data analysis program. The program needs to run against some very large datasets that have grown so big they can no longer be stored entirely in memory. Instead, you are tasked with writing some code that will "generate" a single element of the dataset at a time. This can easily be done using Python generators!

Or perhaps your company has discovered an amazing security algorithm, but it needs to continuously find new prime numbers in order to operate, and you have to write code to "generate" a potentially infinite number of prime numbers.
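Jumping ahead slightly, using the yield keyword explained later in this post, an endless prime generator could be sketched as below. This is a minimal sketch that uses simple trial division against the primes found so far (not a fast sieve), and the name prime_generator is just an illustrative choice.

```python
from itertools import islice

def prime_generator():
    """Yield prime numbers forever, using trial division by earlier primes."""
    primes = []  # every prime found so far; this list is kept between yields
    candidate = 2
    while True:
        # a candidate is prime if no earlier prime up to its square root divides it
        is_prime = True
        for p in primes:
            if p * p > candidate:
                break
            if candidate % p == 0:
                is_prime = False
                break
        if is_prime:
            primes.append(candidate)
            yield candidate
        candidate += 1

# take just the first 10 primes from the endless stream
print(list(islice(prime_generator(), 10)))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```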

Or say you just want to annoy your friend by sending him text messages every 30 seconds with new numbers in the Fibonacci sequence until he returns your copy of Pokemon: Let's Go Pikachu. Generators would be an awesome way to continually generate those Fibonacci numbers!

The idea behind Generators

In case you haven't guessed the idea yet, Generator Functions are a Python feature that lets a function produce a series of values over time, instead of a single return value. Their power is that, unlike regular functions, generators retain their local state between requests for the next value.

So take the function below:

def get_n_numbers_in_squared_sequence(n):
    result = []
    for i in range(n):
        # collect the square of each index 0..n-1
        result.append(i**2)
    return result

The get_n_numbers_in_squared_sequence function returns the first n numbers in the squared sequence (if the name didn't give it away). The issue is that once the function returns, it essentially resets. If, after your initial call, you wanted to continue processing more numbers, you couldn't just go back and ask for the next n. You would have to either change the code to accept a starting point, or ask for 2*n numbers and use only the second half.
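To make that drawback concrete, here is roughly what the "accept a starting point" workaround might look like. The function name and its start parameter are hypothetical; the point is that the caller, not the function, has to remember where the previous batch ended.

```python
def get_squares_starting_at(start, n):
    """Return n squared numbers, beginning with start**2."""
    return [i**2 for i in range(start, start + n)]

first_batch = get_squares_starting_at(0, 5)   # [0, 1, 4, 9, 16]
# the caller must keep track of the fact that the next batch begins at 5
second_batch = get_squares_starting_at(5, 5)  # [25, 36, 49, 64, 81]
print(first_batch, second_batch)
```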

This is where generators come in. Unlike regular functions, when a generator hands back a value it RETAINS ITS STATE. So when we get a value back from a generator, we can ask it for another one, and it will remember all of its local variables and keep processing from where it left off. So rather than writing our function above, we can write a generator that produces just the next number in the sequence, and keep asking it for more to process the sequence FOREVER!

Cool stuff, right? All you need to do is keep asking the generator for values, and it will give you the next values in whatever sequence it's programmed to produce. Because it remembers where it was, you can start and stop processing the sequence on the fly and pick right back up where you left off. That's the idea behind generators: an object that can give you the next value in a sequence, for as long as the sequence has values left.


Creating a Generator Function

So, in order to rewrite our squared-numbers sequence as a generator, we need a new keyword: yield. Using yield anywhere in a function tells Python that the function is a generator, so it will be treated differently from other functions. Every time execution hits a yield statement, the generator hands the value that follows it back to the caller, much like a return statement. The next time the generator is activated, it resumes executing immediately after that yield statement. All local variables keep their values as if the generator had never returned at all. In fact, you can have multiple yield statements in a generator and they all behave the same way: whenever a yield is hit, the generator passes its value back to the caller, and when activated again it picks up exactly where it left off.

Basically, the yield keyword says: I am temporarily giving up control and handing back a value, but if I ever get control back, I will start up again right here. Make sense?
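One way to watch this pause-and-resume behavior is to put print calls between several yield statements and advance the generator with the built-in next() function (covered in the next section). The generator name traced_generator is made up for this illustration.

```python
def traced_generator():
    print("started")
    yield "first"
    print("resumed after first yield")
    yield "second"
    print("resumed after second yield")
    yield "third"

gen = traced_generator()   # nothing is printed yet; the body has not started running
print(next(gen))           # prints "started", then "first"
print(next(gen))           # prints "resumed after first yield", then "second"
print(next(gen))           # prints "resumed after second yield", then "third"
```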

Rewriting the squared sequence function as a generator could look something like this:

def squared_number_sequence_generator():
    i = 0
    while True:
        yield i**2
        i += 1

Because we included the yield keyword, Python will now treat our new function as a generator. So how do we use it? Unlike other functions, when we call squared_number_sequence_generator() we don't get a squared number back. Instead, we get what's called a generator iterator: a unique instance of the generator that retains its own state, and the object we use to get the next values from the generator. Once we have this generator iterator, we get the next value by passing it to the built-in next() function. Note: next() is used to advance any kind of iterator in Python. When you write a for loop over a list, under the hood Python calls iter() on the list to get an iterator, then calls next() on that iterator to assign each new value to the loop variable.
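Before the walkthrough, two claims from this paragraph can be checked directly: calling the generator function hands back a generator object rather than a number, and a for loop behaves roughly like calling iter() once and then next() until StopIteration is raised. The while loop below is a simplified sketch of that protocol, not CPython's exact implementation.

```python
import types

def squared_number_sequence_generator():
    i = 0
    while True:
        yield i**2
        i += 1

gen = squared_number_sequence_generator()
print(type(gen))                              # <class 'generator'>
print(isinstance(gen, types.GeneratorType))   # True

# roughly what `for value in [10, 20, 30]: print(value)` does under the hood
numbers = [10, 20, 30]
iterator = iter(numbers)        # a for loop first asks the list for an iterator
while True:
    try:
        value = next(iterator)  # then repeatedly advances it with next()
    except StopIteration:       # and stops once the iterator is exhausted
        break
    print(value)
```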

So let's see this in action.

# generator is the same as above
def squared_number_sequence_generator():
    i = 0
    while True:
        yield i**2
        i += 1
# call squared_number_sequence_generator() which will give us an instance of our generator.
squared_sequence_generator_instance = squared_number_sequence_generator()

# now in order to get values from the generator we call next() on it, like so
new_value = next(squared_sequence_generator_instance)  # new_value is 0

# and again
new_value = next(squared_sequence_generator_instance)  # new_value is 1

# and again...
new_value = next(squared_sequence_generator_instance)  # new_value is 4

# for as many times as we want.

So let's step through the example to make sure we understand. We define squared_number_sequence_generator and call it like a normal function, but what we get back is a generator iterator. Calling next() on this iterator activates the function, which runs until it hits a yield statement and hands back the value attached to that yield. At that point the generator pauses: it stops executing code, but keeps in memory all of its local variable values and the spot in its code where it yielded. When next() is called on the iterator again, the generator starts back up from where it left off and continues until it hits another yield statement, and the process repeats for as long as you need.
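The standard library's inspect module can report which of these stages a generator instance is in (inspect.getgeneratorstate) and show its saved local variables (inspect.getgeneratorlocals), which makes the step-through above easy to observe.

```python
import inspect

def squared_number_sequence_generator():
    i = 0
    while True:
        yield i**2
        i += 1

gen = squared_number_sequence_generator()
print(inspect.getgeneratorstate(gen))   # GEN_CREATED: the body has not started yet
print(next(gen))                        # 0
print(inspect.getgeneratorstate(gen))   # GEN_SUSPENDED: paused at the yield
print(next(gen))                        # 1
print(inspect.getgeneratorlocals(gen))  # {'i': 1}: local variables survive the pause
gen.close()                             # stop the generator for good
print(inspect.getgeneratorstate(gen))   # GEN_CLOSED
```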

And that's the basics of how to make a generator!
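Tying this back to the Fibonacci scenario from the introduction, a generator for that sequence might look like the sketch below. The function name is just an illustrative choice, and the text-message part is left out.

```python
def fibonacci_generator():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b  # both values are remembered between calls to next()

fib = fibonacci_generator()
first_ten = [next(fib) for _ in range(10)]
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
print(next(fib))  # 55: picks up right where the list comprehension stopped
```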

Generators in a for loop

Because generator instances are iterators in Python, you can plug them into loops just like you would a list.

Take this sample for instance:

# a simple generator that will only yield 3 values
def one_two_three_generator():
    yield 1
    yield 2
    yield 3

# just like lists, you can use generators in a loop
# this loop will print out 1, 2, and 3 and then exit
for i in one_two_three_generator():
    print(i)

Just like the code comment says, the above generator yields only 3 values and then finishes. When you plug it into a loop, Python treats this as "the end" of the generator and exits the loop. Note that if you used a continuous generator, like our squared_number_sequence_generator from above, the loop would never exit on its own, because the generator always produces new values.
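What actually signals "the end" is the StopIteration exception: once a generator's body finishes, the next call to next() raises it, and a for loop catches it quietly. For an endless generator, itertools.islice (or a break inside the loop) is one way to take only a bounded number of values.

```python
from itertools import islice

def one_two_three_generator():
    yield 1
    yield 2
    yield 3

def squared_number_sequence_generator():
    i = 0
    while True:
        yield i**2
        i += 1

gen = one_two_three_generator()
print(next(gen), next(gen), next(gen))  # 1 2 3
try:
    next(gen)  # the body has finished, so there is nothing left to yield
except StopIteration:
    print("generator is exhausted")

# islice stops after 5 values, so this loop ends even though the generator never does
for square in islice(squared_number_sequence_generator(), 5):
    print(square)  # 0, 1, 4, 9, 16 on separate lines
```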

Multiple generator instances

As you may have guessed, it is possible to have multiple instances of the same generator.

Take this example: we create 2 instances of our squared_number_sequence_generator from above, but advance them at different rates. Running this code shows that each generator instance keeps its own values, independent of the other.

def squared_number_sequence_generator():
    i = 0
    while True:
        yield i**2
        i += 1

instance_1 = squared_number_sequence_generator()
instance_2 = squared_number_sequence_generator()

# we will loop 5 times,
for i in range(5):
    # for every loop we will call next on instance_1 3 times, and only call next
    # on instance_2 once, this will demonstrate that both instances are independent
    # and can advance at different rates
    print("next(instance_1) = ", next(instance_1))
    print("next(instance_1) = ", next(instance_1))
    print("next(instance_1) = ", next(instance_1))
    print("next(instance_2) = ", next(instance_2))

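Running the above code shows that, yes, generator instances are separate and can advance at different rates. After the loop, instance_1 has yielded the first 15 squares (0 through 196), while instance_2 has yielded only the first 5 (0 through 16).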
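A related consequence of each instance keeping its own state: once an instance has been used up, it stays used up. Looping over the same instance a second time produces nothing, so you would create a fresh instance to start over.

```python
def one_two_three_generator():
    yield 1
    yield 2
    yield 3

gen = one_two_three_generator()
print(list(gen))                        # [1, 2, 3]
print(list(gen))                        # []: this instance is already exhausted
print(list(one_two_three_generator()))  # [1, 2, 3]: a new instance starts fresh
```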

Passing Values Into Active Generators

So that's cool, but what if you wanted to pass a value into an active generator? Python gives us a way to do that too.

To pass values into an active generator, Python gives generator objects a send() method. Calling send() resumes the generator just like next() does, and returns the next yielded value. Inside the generator, to receive the value passed in by send(), we treat yield as an expression and assign its result to a variable: the sent value becomes the value of that yield expression. This may look a little strange when you first see it.
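Here is a minimal sketch of the send() round trip, using a made-up echo_generator: the value passed to send() becomes the result of the paused yield expression, and send() returns whatever the generator yields next. A plain next() call resumes the generator as if send(None) had been called.

```python
def echo_generator():
    received = None
    while True:
        # yield hands `received` out to the caller and pauses;
        # when resumed, the value given to send() is stored back into `received`
        received = yield received

gen = echo_generator()
print(next(gen))       # None: runs up to the first yield
print(gen.send("hi"))  # hi
print(gen.send(42))    # 42
print(next(gen))       # None: next() behaves like send(None)
```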

The following counting_generator sample demonstrates how you can use send() to pass values into a running generator, and how you can use those values within the generator itself.

# define a simple generator that yields the numbers from 0 to max_n,
# which also demonstrates that you can pass in arguments when first constructing a generator
def counting_generator(max_n):
    for i in range(max_n+1):
        # here's the important line: we yield control back to the caller, handing
        # back the value i, but we also expect that when the generator is activated
        # again we're going to receive a value, so we treat yield as an expression
        # that produces a value. Looks a little strange, but it works.
        passed_in_value = yield i

        # and now we just demonstrate that we got the new value by printing it to the console.
        print("in counting_generator, %s was passed in" % passed_in_value)

# create our generator
generator_instance = counting_generator(5)
# an important note: a fresh generator hasn't reached a yield yet, so nothing can receive
# a value. Calling send() with anything other than None on it raises a TypeError,
# so we first advance it to its first yield with next().
value = next(generator_instance)  # value is 0

# after our generator has yielded once, we can use send() to send a value into
# the generator and get the next value, as seen here.
next_value = generator_instance.send(8)   # prints "in counting_generator, 8 was passed in"; next_value is 1
next_value = generator_instance.send(9)   # next_value is 2
next_value = generator_instance.send(10)  # next_value is 3
next_value = generator_instance.send(11)  # next_value is 4

This is a rather strange addition to generators, but it has its uses, so it's good to be aware of it.
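One common kind of use is a generator that accumulates state across sends. The sketch below (the name running_average is hypothetical) keeps its count and total as local variables, which survive between send() calls, and yields the current average each time a new number arrives.

```python
def running_average():
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average  # hand back the latest average, then wait for a new number
        total += value
        count += 1
        average = total / count

averager = running_average()
next(averager)            # advance to the first yield so it can receive values
print(averager.send(10))  # 10.0
print(averager.send(20))  # 15.0
print(averager.send(60))  # 30.0
```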

Final Thoughts

So that's basically generators. They're a handy tool in Python, and while you can achieve the same effect with other designs (such as a class that stores its state explicitly), they're worth keeping in mind, and in my opinion they can be a bit simpler than implementing the same thing another way.
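For comparison, here is roughly what the squared-number sequence looks like as a class implementing Python's iterator protocol (__iter__ and __next__). The class name is made up for illustration; the generator version gets similar behavior without the class scaffolding and without storing state on self by hand.

```python
class SquaredNumberSequence:
    """Class-based iterator that behaves like squared_number_sequence_generator."""

    def __init__(self):
        self.i = 0  # the state must be stored on the instance explicitly

    def __iter__(self):
        return self  # an iterator returns itself from __iter__

    def __next__(self):
        value = self.i**2
        self.i += 1
        return value

squares = SquaredNumberSequence()
print(next(squares), next(squares), next(squares))  # 0 1 4
```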

These can really come in handy when you're working with something that can't feasibly fit in memory, such as an infinite sequence, an extremely large dataset, or a host of other cool things.
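As a sketch of the large-dataset case from the introduction: iterating over a file object already reads one line at a time, so a generator can wrap that and hand back one parsed record at a time without loading the whole file. The one-number-per-line file format and the function name read_numbers are assumptions for this example, and a small temporary file stands in for the real dataset.

```python
import os
import tempfile

def read_numbers(path):
    """Yield one integer per line from the file at path, without reading it all at once."""
    with open(path) as f:
        for line in f:  # file objects are themselves lazy line iterators
            line = line.strip()
            if line:    # skip blank lines
                yield int(line)

# create a small stand-in "dataset" file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("3\n5\n\n7\n")
    path = tmp.name

try:
    # sum() pulls values from the generator one at a time
    total = sum(read_numbers(path))
    print(total)  # 15
finally:
    os.remove(path)
```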

I hope you learned something useful in this post, happy coding!
