Introduction to PHP Generators

Published on September 15, 2013 by Bo Andersen

One of the key features of PHP 5.5 is the implementation of so-called generators with the yield keyword. The purpose of this article is to give you an overview of PHP generators and why they are an important addition to the PHP language.

What is a Generator?

Before getting into the details, I will start by explaining what generators can do and why they matter.

A generator looks much like the regular functions that we already know. It produces data to be iterated over, but it does not return this data with the return keyword as would normally be the case. Instead of building the entire data set and then returning it (at which point the execution of the function ends), a generator gradually makes data available to whatever code uses it. For instance, consider PHP’s range function, which builds an array of integers, such as from 1 to 100, and then returns it. A generator would instead yield each number one by one as it is requested, perhaps by a foreach loop. Values are therefore produced only when they are needed, which is commonly known as lazy evaluation. The generator is resumed every time the foreach loop needs more data, and every time the generator yields a value, its state is saved internally so that it can pick up where it left off when additional values are needed.

Once there are no more values to be yielded from the generator, then the generator function can simply exit and stop execution. The calling code (e.g. a foreach loop) then continues just as if it were using an array that has run out of values.

The key to understanding generators is thus that the generator function hands one piece of data at a time to the calling code. When that code requests more data, the execution of the generator continues from where it stopped, which is why the state of the generator is maintained internally.

Do not worry if all of this sounds confusing at first; this is all easier to understand with code examples. Below I have made an implementation of a function that is quite similar to PHP’s range function.

/**
 * Returns an array of integers between $from and $to (similar to range())
 *
 * @param int $from
 * @param int $to
 *
 * @return int[]
 */
function getNumberRange($from, $to) {
	// Parameter validation excluded for simplicity
	
	$results = array();

	while ($from <= $to) {
		$results[] = $from;
		$from++;
	}
	
	return $results;
}

The function above is indeed quite simple; it builds an array of integers and returns it. To use it, we could simply call it within a foreach loop to print out the numbers.

foreach (getNumberRange(1, 10) as $number) {
	echo $number . '...';
}

// Output: 1...2...3...4...5...6...7...8...9...10...

What happens is that the getNumberRange function generates all of its results and returns them, and only then does the foreach loop begin to iterate over this result set, which is an array.

If we were to implement the same functionality with a generator, which has been available since PHP 5.5, we could rewrite getNumberRange as shown below.

/**
 * Yields integers between $from and $to
 *
 * @param int $from
 * @param int $to
 */
function getNumberRangeGenerator($from, $to) {
	while ($from <= $to) {
		yield $from;
		$from++;
	}
}

The first thing to note is that no return keyword is being used to return the results. Instead, the yield keyword comes into play. For each iteration of the while loop, an integer is yielded or "made available" to the code that uses the generator, such that data is provided without having to build the entire array first. Below is an example of how this generator can be used.

foreach (getNumberRangeGenerator(1, 10) as $number) {
	echo $number . '...';
}

// Output: 1...2...3...4...5...6...7...8...9...10...

As you can see in the above example, using the generator looks exactly the same as using a regular function that returns an array. The difference is that instead of waiting for the entire array to be returned, the foreach loop begins to iterate as soon as the first value is yielded. When the loop is ready to continue iterating, the generator is prompted for another value, and so forth.
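
To make this interleaving visible, the sketch below drives a generator by hand using the valid(), current() and next() methods of the Generator object, which are the same Iterator methods that foreach calls behind the scenes. The tracedRange function and the $log array are hypothetical names introduced only for this illustration.

```php
<?php
// Hypothetical helper: yields $from..$to and records when its body runs.
function tracedRange($from, $to, array &$log) {
	$log[] = 'generator started';

	while ($from <= $to) {
		$log[] = 'yielding ' . $from;
		yield $from;
		$from++;
	}

	$log[] = 'generator finished';
}

$log = array();

// Calling the function only creates a Generator object; the body has not run yet.
$generator = tracedRange(1, 2, $log);
$log[] = 'generator created';

// The first valid() call runs the body up to the first yield.
while ($generator->valid()) {
	$log[] = 'loop received ' . $generator->current();

	// Resumes the body until the next yield (or until the function ends).
	$generator->next();
}

echo implode("\n", $log), "\n";

// Output:
// generator created
// generator started
// yielding 1
// loop received 1
// yielding 2
// loop received 2
// generator finished
```

Note that nothing inside the generator runs until the loop asks for the first value, and that each value is handed over before the next one is produced.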

Why Generators are Important

Initially, this may not seem like a big deal. However, imagine that, for whatever reason, we have to iterate over the integers between 1 and 1,000,000. With a traditional function, the entire array would have to be constructed before it could be used. The important thing to note here is the memory footprint this causes; a lot of memory is required just for holding these values! More often than not, each value is used only once, after which it could be removed from memory, so holding all of them at once wastes the server's resources. With a generator, each value is instead made available individually to the foreach loop, which means that the server no longer needs to hold all of the values in memory, but merely a single value at a time. On PHP 5.5, this reduces the memory usage from over 100 MB to less than 1 KB! The exact figures depend on the PHP version and platform, but the difference remains large.
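
If you would like to see the difference on your own setup, a rough measurement could look like the sketch below. The function names (buildRange, yieldRange, extraMemoryWhileIterating) are hypothetical, the range is limited to 100,000 values to keep it quick, and the numbers printed will vary with the PHP version and platform, so treat it as an experiment rather than a benchmark.

```php
<?php
// Builds the whole array up front (same idea as getNumberRange()).
function buildRange($from, $to) {
	$results = array();

	while ($from <= $to) {
		$results[] = $from;
		$from++;
	}

	return $results;
}

// Yields the values one by one (same idea as getNumberRangeGenerator()).
function yieldRange($from, $to) {
	while ($from <= $to) {
		yield $from;
		$from++;
	}
}

// Returns the extra memory (in bytes) in use while iterating over
// whatever the $producer callable returns.
function extraMemoryWhileIterating($producer) {
	$before = memory_get_usage();
	$peak = $before;

	foreach ($producer() as $number) {
		$peak = max($peak, memory_get_usage());
	}

	return $peak - $before;
}

$arrayBytes = extraMemoryWhileIterating(function () {
	return buildRange(1, 100000);
});

$generatorBytes = extraMemoryWhileIterating(function () {
	return yieldRange(1, 100000);
});

printf("Array:     %d bytes\nGenerator: %d bytes\n", $arrayBytes, $generatorBytes);
```

The array figure grows with the size of the range, whereas the generator figure should stay roughly constant no matter how many values are yielded.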

Can't I Just Implement the Iterator Interface?

By now you might be asking yourself whether you could not just implement the Iterator interface. You can, but you will find that you end up writing a lot of boilerplate code every time. No such code is necessary with generators, because it all happens behind the scenes and is completely abstracted away from the developer. Internally, a Generator object is created for us when a generator function is called. This object takes care of managing the state of the generator, and since the Generator class implements the Iterator interface, we do not have to implement it ourselves. In the end, there is nothing stopping you from implementing the interface on your own, but it is less convenient.
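
To give an impression of the boilerplate involved, here is a sketch of what a hand-written equivalent of the getNumberRangeGenerator function from earlier might look like. The class name NumberRangeIterator is hypothetical, and the return type declarations assume PHP 7.1 or later (remove them on older versions).

```php
<?php
// Hypothetical class: iterates over the integers between $from and $to.
class NumberRangeIterator implements Iterator
{
	private $from;
	private $to;
	private $current;
	private $position = 0;

	public function __construct($from, $to)
	{
		$this->from = $from;
		$this->to = $to;
		$this->current = $from;
	}

	// Resets the iteration to the first value
	public function rewind(): void
	{
		$this->current = $this->from;
		$this->position = 0;
	}

	// Returns true as long as there are values left
	public function valid(): bool
	{
		return $this->current <= $this->to;
	}

	// Returns the value for the current iteration
	public function current(): int
	{
		return $this->current;
	}

	// Returns the key for the current iteration (0, 1, 2, ...)
	public function key(): int
	{
		return $this->position;
	}

	// Moves on to the next value
	public function next(): void
	{
		$this->current++;
		$this->position++;
	}
}

$output = '';

foreach (new NumberRangeIterator(1, 10) as $number) {
	$output .= $number . '...';
}

echo $output;

// Output: 1...2...3...4...5...6...7...8...9...10...
```

That is five methods and four properties to express what the generator version does with a while loop and a single yield statement.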

Yielding Key-Value Pairs

If you need to return a key-value pair, you will be happy to know that this is possible from within a generator, too. The syntax can be seen in the example below.

/**
 * Yields key-value pairs. Key: first name, value: last name
 *
 * @param array $users
 */
function getUserNames(array $users) {
	$i = 0;
	$userCount = count($users);
	
	while ($i < $userCount) {
		yield $users[$i]['firstName'] => $users[$i]['lastName'];
	
		$i++;
	}
}

$users = array(
	0 => array(
		'firstName' => 'Antoine',
		'lastName' => 'Cryan',
	),
	1 => array(
		'firstName' => 'Krystina',
		'lastName' => 'Peasley',
	),
	2 => array(
		'firstName' => 'Jenice',
		'lastName' => 'Sepeda',
	),
);

// Print out the names of the users
foreach (getUserNames($users) as $firstName => $lastName) {
	echo $firstName . ' ' . $lastName . '<br />';
}

// Output (as rendered in a browser):
// Antoine Cryan
// Krystina Peasley
// Jenice Sepeda

Of course the above could be accomplished solely with the foreach loop, so it is merely an example of how key-value pairs can be yielded from a generator.

Stopping Execution of Generators

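I previously mentioned that generators do not return data. In PHP 5.5, attempting to return a value from a generator results in a compile-time fatal error. (Since PHP 7.0, a generator may return a final value, which the calling code can retrieve with the Generator object's getReturn() method, but that value is still not part of the iteration.) However, the return keyword on its own can be used to stop the execution of a generator, exactly as is possible in a regular function.

/**
 * Yields the integers from 0 up to (but not including) a given number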
 *
 * @param int $stop
 *
 */
function rangeStop($stop) {
	if (!is_int($stop) || $stop <= 0) {
		return;
	}

	$i = 0;
	
	while ($i < $stop) {
		yield $i;

		$i++;
	}
}

The above example stops the generator if it is called with an invalid parameter. In practice, you would, however, probably want to throw an exception instead.
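
A sketch of the exception-based approach is shown below; the function name rangeStopStrict and the error message are hypothetical. One thing to be aware of is that calling a generator function only creates the Generator object, so the validation, and thereby the exception, does not happen until the calling code starts iterating.

```php
<?php
// Hypothetical variant of rangeStop() that rejects invalid input with an exception.
function rangeStopStrict($stop) {
	if (!is_int($stop) || $stop <= 0) {
		throw new InvalidArgumentException('$stop must be a positive integer');
	}

	for ($i = 0; $i < $stop; $i++) {
		yield $i;
	}
}

// No exception is thrown here; the body of the generator has not run yet.
$generator = rangeStopStrict(-1);

$message = null;

try {
	// The body, including the validation, runs when the iteration starts.
	foreach ($generator as $number) {
		echo $number . '...';
	}
} catch (InvalidArgumentException $e) {
	$message = $e->getMessage();
}

echo $message;

// Output: $stop must be a positive integer
```

If the error should surface at the time of the call instead, one option is to validate in a regular function that then returns the generator.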

Yielding by Reference

In the generators we have discussed so far, we have been yielding by value. This means that when the values are iterated over, one cannot change the values within the generator from which they originated. The same principle applies to regular functions; parameters are passed by value by default, meaning that a copy of the data is used within the function. As such, the parameters are local variables that are automatically cleared from memory when they go out of scope (which in this case means when the function ends). If one wants a change to a parameter to be reflected outside of the function, the parameter can be passed by reference. To do so, simply prepend an ampersand & to the parameter name in the function declaration. An ampersand before the function name means something different, namely that the function returns a reference. Explaining this in detail is outside the scope of this article, but it is important to understand, as it is the basis for yielding by reference. If you need to refresh your memory, the official documentation is an excellent choice.

Applying this to generators is simple. All one has to do is to prepend an ampersand to the name of the generator and to the variable used in the iteration. That is all.

/**
 * Loops while $from > 0 and yields $from on each iteration. Yields by reference.
 *
 * @param int $from
 */
function &countdown($from) {
	while ($from > 0) {
		yield $from;
	}
}

foreach (countdown(10) as &$value) {
	$value--;
	echo $value . '...';
}

// Output: 9...8...7...6...5...4...3...2...1...0...

The above example shows how changing the iterated values within the foreach loop changes the $from variable within the generator. This is because $from is yielded by reference due to the ampersand before the generator name. As such, the $value variable within the foreach loop is a reference to the $from variable within the generator function.
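
For contrast, here is a sketch of a by-value counterpart, using the hypothetical name countdownByValue. Without the ampersands, the loop only receives a copy of each value, so the generator has to decrement $from itself, and assignments made in the loop have no effect on it.

```php
<?php
// Hypothetical by-value counterpart of countdown(): no ampersand before the name.
function countdownByValue($from) {
	while ($from > 0) {
		yield $from;
		$from--; // The generator has to decrement its own variable
	}
}

$seen = array();

foreach (countdownByValue(3) as $value) {
	$seen[] = $value;
	$value = 0; // Changes only the loop's copy, not $from within the generator
}

echo implode('...', $seen);

// Output: 3...2...1
```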

Yielding NULL Values

In functions you can return NULL, and in generators you can yield NULL. The latter is done either by leaving out the value after the yield keyword or by writing NULL explicitly, as demonstrated below.

function yieldNullImplicitly() {
	yield; // Yields NULL
}

function yieldNullExplicitly() {
	yield null;
}

The generators above are equivalent.
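
A quick way to check this is to collect the yielded values with PHP's iterator_to_array function and compare the results. The function names below are hypothetical stand-ins for the two generators above.

```php
<?php
// Hypothetical names mirroring the two generators above.
function implicitNull() {
	yield; // Yields NULL
}

function explicitNull() {
	yield null;
}

$implicit = iterator_to_array(implicitNull());
$explicit = iterator_to_array(explicitNull());

var_dump($implicit === $explicit);         // bool(true)
var_dump($implicit === array(0 => null));  // bool(true)
```

In both cases the result is a single NULL value with the automatically assigned key 0.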

Conclusion

There is surely more to generators than what has been covered in this article. However, you should now have a good understanding of how generators work and what the advantages of using them are.

I began by explaining how generators yield values to the calling code one by one rather than building an entire result set at once as would normally be the case. This means that to iterate on a sequence of 100 values, for instance, the web server only needs to store one value in memory at a time. Without a generator function, all 100 would have to be stored in memory, leading to high memory consumption.

We have also seen how implementing a generator is simpler than implementing the Iterator interface as this would require a lot of boilerplate code. Instead, generators handle the details of this behind the scenes, making developers' jobs easier. We have also seen how generators can yield key-value pairs instead of only single values and how this can be paired with a foreach loop. Lastly, I demonstrated how values can be yielded by reference, meaning that the values used within generators can be changed from the calling code - exactly as with regular functions.

I hope that this article has given you a good understanding of how generators work in PHP 5.5. Generators are powerful once you get the hang of them. Thank you for reading!

Bo Andersen

About the Author

I am a back-end web developer with a passion for open source technologies. I have been a PHP developer for many years, and also have experience with Java and Spring Framework. I currently work full time as a lead developer. Apart from that, I also spend time on making online courses, so be sure to check those out!

One comment on »Introduction to PHP Generators«

  1. Juan Carrillo

    This, helped a lot. Great article, I understood the generators concept. Thank you.
