The penny game considered harmful

The lean/agile penny game has become a standard way to demonstrate the benefits of small batch sizes in a production flow. However, as a demonstration of the effects of agile transformation on a software development team it is woefully inadequate. In this article we describe the classic penny game as played in much agile training, critique its accuracy as a model for agile software development, and provide a simulation that you can run yourself, showing what will happen if the software developers are not supported in adopting the practices of Extreme Programming.

Penny game mechanics

The penny game works like this: We set up a "production line" consisting of a sequence of workers. Batches of work are given to the first worker in the line as a set of pennies. Each penny represents a unit of work in a project. The worker takes a batch of pennies, flips each of them once to represent completing that work item, and then passes the batch to the next worker downstream. When the final worker completes a batch, these are considered to be "delivered" to the customer.

The game is usually run first with a large batch size, typically 20 coins. Then it is run again, this time with a batch size of 5 coins. Metrics are calculated for each run, and these usually demonstrate the superior performance of working in small batches:

Value delivered:
The total value of the coins delivered through the whole process in each run. In the simulations below this is simply the total number of coins processed by each production line. In manual penny game demonstrations, which usually last only a few minutes, this metric shows a large difference in value delivered. However, if the simulations are run for long enough, which you can do for yourself below, the marginal increase in value delivered by small-batch working becomes negligible.
Cycle time:
The time it takes one coin to pass through the entire production process. In software development this could be the time it takes us to fix a production defect, for example. In the classic penny game, cycle time is much smaller for small-batch working.
Time to first value:
The period elapsed before the process delivers any usable value to the customer. Again, this is always lower (i.e. better) for small-batch working.
Work in process:
The total amount of work that has been started, but not yet delivered. This represents the investment in the work that is currently in flight, and is also often representative of the levels of frustration felt by those outside of the team. Yet again, in the classic penny game there is much less work in process in the small-batch variant.
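
The mechanics and metrics above can be sketched in a few lines of Python. This is only an illustrative model, not the simulation that accompanies this article: the function name penny_game, the choice of four workers, a total of 20 coins and one tick per flip are all assumptions made for the example.

```python
def penny_game(total_coins, batch_size, workers=4, ticks_per_flip=1):
    """Return (start, delivery) ticks for each batch in a classic penny game.

    Assumed model: every worker flips one coin at a time, handles one batch
    at a time, and passes the whole batch downstream when it is done.
    """
    n_batches = total_coins // batch_size
    batch_ticks = batch_size * ticks_per_flip
    started = [0] * n_batches    # tick at which the first worker picks up each batch
    finished = [0] * n_batches   # tick at which the most recent worker finished each batch
    for w in range(workers):
        free_at = 0              # tick at which this worker is next idle
        for b in range(n_batches):
            # A batch starts once the worker is idle and the batch has arrived.
            start = max(free_at, finished[b])
            if w == 0:
                started[b] = start
            free_at = start + batch_ticks
            finished[b] = free_at
    return started, finished


for batch_size in (20, 5):
    started, delivered = penny_game(total_coins=20, batch_size=batch_size)
    cycle_times = [d - s for s, d in zip(started, delivered)]
    print(f"batch size {batch_size}: first value at tick {delivered[0]}, "
          f"all delivered at tick {delivered[-1]}, cycle time {max(cycle_times)}")
```

Under these assumptions the single 20-coin batch is delivered at tick 80, while the 5-coin batches deliver their first value at tick 20 and the last coin at tick 35, with a cycle time of 20 ticks rather than 80.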

Criticisms

The penny game originated in manufacturing as a demonstration of the power of working in small batches. However, the game is often used to make the case for adopting agile working in software development projects. Unfortunately the game, as played in manual demonstrations, lacks many of the true features of software development:

Task size
In the penny game every worker takes roughly the same amount of time to flip each coin. But in real software development, there can be a huge difference between the amount of time required to write a story and the amount of time required to develop it. This means that, regardless of the batch size, analysed work is likely to pile up waiting to be developed. One antidote to this effect is to give the developers training and support in modern development practices, including the XP practices of YAGNI, Merciless Refactoring, Continuous Delivery etc. Another is to create effective pull systems, so that work is only done (by anyone) when there is a customer demand for it.
Coupling
As the codebase grows, coupling between logically unrelated areas also grows. This coupling exerts a braking effect on the team, causing stories to take longer and longer over the course of the project. Usually this also means that many stories cease to be releasable on their own -- the effective batch size grows continually.
Defects
Any defects found by users or in downstream testing will disrupt later development tasks. The cost of interruption is high, and usually has a larger negative effect on development work than on analysis work.
Multi-tasking
Developers often work on several tasks "simultaneously", or can be taken off development to work on the analysis of future tasks, or to attend meetings. All of these distractions increase the cost of developing any task, due to the high spin-up time required to get back in the zone after any interruption of more than a few minutes.

None of these factors is modelled in the classic penny game. The result is that the game as performed in most "agile" demonstrations gives a false impression as to the benefits to be gained by merely reducing the size of planning / release batches. In fact, if the developers are not trained and supported in the XP practices, but instead continue working as they did before, the resulting "agile transformation" will eventually under-perform compared to the team's previous processes.
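
The task size criticism is the easiest of these to illustrate in isolation. The sketch below is a hypothetical two-stage model (one analyst, one developer, batch size of one); the function name queue_growth and the tick values are assumptions chosen for the example, not measurements.

```python
def queue_growth(ticks, analysis_ticks=1, development_ticks=3):
    """Return the number of analysed-but-undeveloped stories after each tick."""
    waiting = 0          # analysed stories not yet picked up by development
    dev_busy_until = 0   # tick at which the developer finishes the current story
    history = []
    for tick in range(1, ticks + 1):
        if tick % analysis_ticks == 0:
            waiting += 1                      # the analyst finishes another story
        if waiting and tick >= dev_busy_until:
            waiting -= 1                      # the developer pulls the next story
            dev_busy_until = tick + development_ticks
        history.append(waiting)
    return history


print(queue_growth(12))         # development three times slower than analysis
print(queue_growth(12, 1, 1))   # equal task sizes, as in the classic penny game
```

With equal task sizes the queue stays empty, which is the situation the classic penny game models. When development takes three ticks per story, eight of the twelve analysed stories are still waiting after twelve ticks, and the queue keeps growing however small the batches are.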

You can see these effects at work using the simulation below. Leave it running for around 350-400 ticks to see how a development team without XP training and support will completely negate the positive effects of small batch sizes.

The simulation

The simulation below consists of three production lines or teams. Each team executes the same process in exactly the same way, but each has been configured with parameter values that represent different software development processes:

"Waterfall"
This is the classic penny game played with a batch size of 20 coins. It is usually intended to represent a "waterfall" project lifecycle, although as a representation of software development projects it lacks verisimilitude.
"Agile"
This is the classic penny game played with a batch size of 5 coins. It is usually intended to represent an "agile" project lifecycle. The Criticisms above show why I believe this to be naive.
"Scrum"
This production line is configured to reflect what often happens in "agile adoption" projects. If the developers are not given the same amount of training and long-term support that the managers and analysts are given, they will continue to use software development practices that are most appropriate to large-batch processes.

The production line configuration parameters are:

Batch size
The batch size used by each worker in the team. The larger initial batch size used by Development in the "Scrum" simulation is intended to reflect the fact that many development teams find it difficult to work in small increments, without the up-front design to which they are accustomed.
Batch size increment
The amount by which the worker's batch size grows after delivering each iteration. Intended to reflect the effects of coupling and legacy code. Paradoxically, this effect is often magnified by asking the developers to forgo up-front design without supporting them in adoption of something like the XP practices.
Task size
The number of clock ticks required to complete any task. The larger task size for Development in the "Scrum" simulation reflects the fact that creating production software often takes longer than writing requirements. It also (for now) reflects the time lost due to interruptions such as defects.
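
One way to picture how these three parameters interact is the following Python sketch. It is not the code behind the simulation on this page: the Worker class, the run_line function, the four-worker line and every parameter value (including the assumption that the second worker stands in for Development) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    batch_size: int            # coins taken per batch
    batch_size_increment: int  # growth in batch size after each delivered batch
    task_size: int             # clock ticks needed per coin
    inbox: int = 0             # coins waiting for this worker
    in_hand: int = 0           # coins in the batch currently being worked on
    busy_until: int = 0        # tick at which the current batch is finished


def run_line(workers, ticks, backlog=10_000):
    """Advance a production line tick by tick; return the coins delivered."""
    workers[0].inbox = backlog
    delivered = 0
    for tick in range(ticks + 1):
        for i, w in enumerate(workers):
            if w.in_hand and tick >= w.busy_until:
                # Batch complete: pass it downstream, or deliver from the last worker.
                if i + 1 < len(workers):
                    workers[i + 1].inbox += w.in_hand
                else:
                    delivered += w.in_hand
                w.in_hand = 0
                w.batch_size += w.batch_size_increment
            if not w.in_hand and w.inbox >= w.batch_size:
                # Idle with enough coins waiting: start the next batch.
                w.inbox -= w.batch_size
                w.in_hand = w.batch_size
                w.busy_until = tick + w.batch_size * w.task_size
    return delivered


def uniform_line(batch_size, workers=4):
    return [Worker(batch_size, 0, 1) for _ in range(workers)]


def scrum_line():
    # Hypothetical values: Development starts with a larger batch that keeps
    # growing, and needs two ticks per coin instead of one.
    return [Worker(5, 0, 1), Worker(10, 1, 2), Worker(5, 0, 1), Worker(5, 0, 1)]


for name, line in [("Waterfall", uniform_line(20)),
                   ("Agile", uniform_line(5)),
                   ("Scrum", scrum_line())]:
    print(name, run_line(line, ticks=400))
```

In this sketch the "Scrum" line cannot keep up with either of the others over a long run, because Development can complete at most one coin every two ticks, and its growing batch size delays each successive delivery further.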

To run all simulations, press the play button below. You can pause at any time, or use the reset button to start again from an empty project. Use the sliders button to see the configuration parameters used for each team, and how these values have changed during the simulation run.

Enjoy!