First of all, we should ask ourselves: “Why do we estimate?”

In many cases we estimate for the wrong reasons, but let us assume we are in a context where estimates are genuinely needed.

What would that context look like?

A large-scale setting where other parts of the organization (sales, marketing, heck, even customers) rely on a Rolling Wave Plan, which increases the complexity IT teams work in. Generally, an organization where IT is considered an enabler for the business rather than the business itself. Examples: banking, utilities, insurance…

In an organization where IT is the business itself, there is little or no good reason to estimate. Check out #noestimates on Twitter for more information (or Xing’s case study, or…). I’m a fan: I do consider context and push the boundaries, but for some there still seems to be a need for estimates. One could still question that need and offer alternative solutions to the “why” behind it. But in large-scale settings you might want to accept that you cannot change the entire organization at once.

So to clarify, we’re talking here about a situation where the Scrum team will need to provide estimates in order to continue.

The problem with estimating development efforts

Oxford researchers Alexander Budzier and Bent Flyvbjerg published an academic paper that investigated several hypotheses related to IT project failures, including the impact of choosing an Agile approach. They found that Agile methods significantly decrease the average schedule overrun, yet neither the cost nor the schedule fat tails were reduced. Across more than 4,000 surveyed IT projects, they found that 12.8% of projects were twice as expensive as expected, and 12.6% were more than 90% behind schedule. These fat tails cannot be ignored!

We need to get better at estimation (at least where it matters; see the context given at the beginning of this article), even though we understand that the human brain’s forecasting capabilities are limited.
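To see why a lower average does not remove the fat tail, here is a small illustration. The overrun figures are purely hypothetical, not data from the paper, and the 90% threshold simply mirrors the number quoted above: the mean overrun drops sharply while the share of projects more than 90% behind schedule stays exactly the same.

```python
from statistics import mean

# Hypothetical schedule overruns as fractions (0.10 = 10% late). NOT data from the paper.
before = [0.40, 0.50, 0.60, 0.45, 0.55, 0.35, 0.50, 0.40, 1.20, 1.50]
after = [0.05, 0.10, 0.15, 0.10, 0.05, 0.10, 0.15, 0.10, 1.20, 1.50]

def tail_share(overruns, threshold=0.90):
    """Fraction of projects whose overrun exceeds the threshold (the 'fat tail')."""
    return sum(1 for o in overruns if o > threshold) / len(overruns)

print(f"mean overrun:     {mean(before):.2f} -> {mean(after):.2f}")
print(f"share > 90% late: {tail_share(before):.2f} -> {tail_share(after):.2f}")
```

Improving the typical case moves the average, but the two disaster projects are untouched, which is exactly the pattern the paper warns about.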

Why is estimating that difficult?

Many cognitive biases enter into estimation and lead to systematic errors of judgment. These biases tend to make us overly optimistic rather than pessimistic about the amount of work required to deliver a certain value (see Kahneman later in this article).

The use of “story points” sometimes does little more than foster overconfidence in an accuracy that is, in reality, an illusion. How can a CxO, software manager or developer take estimation metrics seriously if they are based on intuitive predictions?!

Scrum does not prescribe a single way to estimate effort. Scrum tells us that Product Backlog items should have an estimate alongside a description, order and value. Nothing more, nothing less.

So, let’s see what’s out there and how we can use it to fix bad estimates and forecasts.

More details

One strategy for overcoming the estimation challenge relies on breaking the work to be estimated down into detail and summing the parts into an overall number. We know this from the phase-gate model, where analysts spend hours, days, weeks, or even months and years detailing the needs and specifications for the solution, assuming that with these details at hand we can estimate better.

Similar strategies are used in Scrum teams, where teams make stories as small as possible to reduce complexity and variability in the effort needed. Some even estimate a larger piece by asking how many stories fit into it.

The problem with detailing is that you never arrive at two descriptions that are equal, no matter how small the items are. They all have varying degrees of complexity, dependencies and unpredictable events affecting the actual effort required. Besides that, detailing takes a lot of effort (and thus cost) while ignoring future learning, a well-known waste in Lean software development.

What does it mean for the Scrum team?

To avoid the immense cost of going into detail for the purpose of forecasting, we refine bigger items into more detail Just in Time (JIT) and continuously take into account what the Scrum team has learned (empiricism).

A rule of thumb is to spend about 10% of the Sprint on refinement, or about 1 day for a 2-week Sprint. We see lots of Scrum teams out there that use only 2% to 5% of the Sprint, ½ day or less per 2-week Sprint. In a large-scale Scrum setting like the one described earlier, this is not good enough, and it is probably a sign of pressure being put on the system. This kind of pressure leads to a product built on misunderstandings, and to lots of rework to fix them.

So use the 10% refinement time wisely: it is not only about creating backlog items but also about architecture, testability, maintainability… but that is perhaps for another post. Let’s stick to estimating and forecasting for now.

The Delphi Method

The Delphi method was developed at the beginning of the Cold War to forecast the impact of technology on warfare. Different forecasting strategies were tried, but the shortcomings of methods such as quantitative models and trend extrapolation quickly became apparent. To overcome these shortcomings, Olaf Helmer, Norman Dalkey and Nicholas Rescher developed the Delphi method.

The Delphi method is neither difficult nor complex: it simply asked experts to give their opinion on the probability, frequency and intensity of possible enemy attacks. Other experts could provide feedback, and this process was repeated several times until a consensus emerged.

In the Agile community, we see a similar approach in Planning Poker, where the team doing the work (= experts) is asked to give their opinion individually, and feedback is gathered when and after the numbers are revealed. From there, a consensus can emerge.

But no matter how large the team is, this approach still relies on intuition and is prone to biases such as groupthink. Moreover, the team (= experts) might end up in endless discussions based on assumed solutions, or on certain details of the solution (taking us back to the “More details” section above), so the cost of estimating increases while engagement decreases.
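The loop behind Delphi can be sketched in a few lines. This is a simulation under stated assumptions, not a prescribed procedure: the names `delphi`, `has_consensus` and `ask` are hypothetical, the "within 25% of each other" consensus rule is invented for illustration, and the toy `ask` (each expert moves halfway toward the previous round's median) merely stands in for real human feedback.

```python
from statistics import median
from typing import Callable, Dict, Iterable, Optional

Estimates = Dict[str, float]

def has_consensus(estimates: Estimates, tolerance: float = 0.25) -> bool:
    """Hypothetical rule: the highest estimate is at most `tolerance` above the lowest."""
    low, high = min(estimates.values()), max(estimates.values())
    return high <= low * (1 + tolerance)

def delphi(experts: Iterable[str],
           ask: Callable[[str, Optional[Estimates]], float],
           max_rounds: int = 5):
    """Repeat private estimation rounds until consensus or until max_rounds is reached."""
    experts = list(experts)
    previous: Optional[Estimates] = None
    for round_no in range(1, max_rounds + 1):
        # Each expert answers on their own; the only shared input is the previous round.
        current = {e: ask(e, previous) for e in experts}
        if has_consensus(current):
            return round_no, current
        previous = current
    return None, previous  # no consensus within max_rounds

# Toy stand-in for feedback: round 1 uses first opinions, later rounds drift to the median.
initial = {"ana": 3, "ben": 8, "chris": 20}

def ask(expert: str, previous: Optional[Estimates]) -> float:
    if previous is None:
        return initial[expert]
    return (previous[expert] + median(previous.values())) / 2

round_no, final = delphi(initial, ask, max_rounds=10)
print(round_no, final)
```

With these toy numbers the group converges in round 5; with real experts, the convergence comes from arguments, not arithmetic.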

What does it mean for the Scrum team?

The Delphi method gives us the theory and background to conclude that we need the experts to estimate. This doesn’t always mean the whole team; it means what it says: the experts.

This way you avoid having less informed, less experienced opinions in the group average out the extremes, and you run less risk of falling into the groupthink trap. Shocking, but not everybody in the team needs to be an expert on everything the team does.

Reference Class Forecasting

In 1979, Daniel Kahneman and Amos Tversky published a paper called “Intuitive Prediction: Biases and Corrective Procedures”. They observed that human judgment is generally optimistic due to overconfidence and insufficient consideration of distributional information about outcomes.

What does that mean? It means that people tend to underestimate the costs, time and risks of an action while overestimating its benefits. This happens because they take an “inside view”, focusing on the specific planned action instead of on the actual outcomes of similar actions that have already been completed.

Kahneman and Tversky suggested taking an “outside view” instead, using distributional information from previous actions similar to the one being estimated, an approach that became known as “Reference Class Forecasting”. This line of research on judgment under uncertainty contributed to Kahneman winning the Nobel Prize in Economics in 2002. In short, you compare the new action with similar actions completed previously and use their outcomes for forecasting.

A common misuse is comparing items at different levels of detail (epics with stories, stories with tasks…). You need to compare like with like.
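A minimal sketch of the outside view: instead of reasoning about the new item itself, read a forecast off the distribution of actual outcomes in its reference class. The durations are hypothetical, the reference class is assumed to hold only comparable items at the same level of detail, and the nearest-rank percentile and the choice of P50/P80 are illustrative assumptions rather than part of Kahneman and Tversky’s paper.

```python
import math
from typing import List

def percentile(values: List[int], p: int) -> int:
    """Nearest-rank percentile: smallest value with at least p% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical reference class: actual calendar days taken by past, comparable items
# (same level of detail, so we compare like with like).
reference_class = [4, 5, 5, 6, 6, 7, 8, 9, 12, 15]

p50 = percentile(reference_class, 50)  # a "typical" outcome
p80 = percentile(reference_class, 80)  # a more cautious outcome
print(f"P50: {p50} days, P80: {p80} days")
```

The inside view might say “this one is easy, 4 days”; the reference class says half of such items took 6 days or more.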

What does this mean for our Scrum teams?

Besides having multiple experts provide an estimate (Delphi), we need to take the past into consideration. We need to build reference classes that we can then use in collaborative planning sessions with the experts.

Some might argue that you need to build reference classes with attributes attached to all items. It is claimed that doing so further increases predictability, and I’m pretty sure it does. In my experience, however, adding attributes such as dependencies, complexity… to each item for the purpose of estimating does not justify the resulting TCO (Total Cost of Ownership) of the reference catalog.

Scrum teams need a database of comparable cases (backlog items) that is big enough to provide large sample sizes and narrow enough to be relevant to the experts estimating the item. It is important that the reference cases help the experts provide feedback and challenge each other on a given estimate.
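A reference catalog does not need to be fancy. The sketch below assumes each case records only its title, its level of detail and the column it was placed in; the `level` field and the `MIN_CLASS_SIZE` threshold are hypothetical choices for illustration, not something the approach prescribes.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class ReferenceCase:
    title: str
    level: str     # level of detail, e.g. "epic" or "story" (hypothetical field)
    estimate: int  # the column (number in the sequence) the item was placed in

MIN_CLASS_SIZE = 5  # hypothetical threshold for "big enough"; tune to your context

def cases_at_level(catalog: List[ReferenceCase], level: str) -> List[ReferenceCase]:
    """Narrow the catalog to one level of detail, so experts compare like with like."""
    return [case for case in catalog if case.level == level]

def big_enough(cases: List[ReferenceCase]) -> bool:
    """Is the narrowed reference class large enough to be a useful sample?"""
    return len(cases) >= MIN_CLASS_SIZE

catalog = [
    ReferenceCase("Login page", "story", 3),
    ReferenceCase("Password reset", "story", 2),
    ReferenceCase("Online payments", "epic", 300),
]
stories = cases_at_level(catalog, "story")
print([c.title for c in stories], big_enough(stories))
```

The two functions mirror the tension in the text: narrowing keeps the cases relevant, while the size check reminds you when the class is still too small to lean on.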

How do you build the capability to make better estimates?

·      Step 1 is to build a reference catalog of estimates.

·      Step 2 is to have experts use that reference catalog for an estimate & forecast.

·      Step 3 is to keep the reference catalog up-to-date

Step 1: Build a reference catalog


–       Backlog of items that need to be processed

–       Multiple experts that know their job


Give the items to the experts and let them create a reference catalog by simply comparing the items to each other and deciding whether they would call each one Small, Medium or Large. Watch out! You cannot have experts do this collaboratively; remember the Delphi method? We need feedback without influence.

You avoid this by giving the items to each expert individually and letting them privately sort the items into three columns: L, M and S. Once that is done, you bring the experts together and compare. Items that everyone placed in the same column are OK; the ones that differ require a feedback moment and a re-estimate.

Once this is done, take the “Small” column and repeat the exercise, but now with the columns 1, 2 and 3 (optionally adding 5). Then take the “Medium” column and continue the exercise, adding the columns 5, 8 and 21 (optionally adding 50) to the sequence. Next, take the “Large” column and continue, adding the columns 50, 100, 200, 300, 500 and “Not Now”. For the “Not Now” column, keep adding columns using the sequence 1, 2, 3, 5, 8, 12, 21, 50 with enough zeros attached. In the example above, you would first replace “Not Now” with 800 and ask whether that is OK for everyone. If not, you add 1200, ask the experts to distribute the items, and so on, until you have a starter database of reference cases.

From that moment on, you work with those numbers. As soon as items are “done”, you have historical data to use, which we call the “Reference Catalog”. Keep in mind that you need to be able to compare like with like (Kahneman), so also keep track of bigger pieces reaching “done”, even though you refine them along the way.

Never change the estimate given to an item when you discover more information; otherwise that item becomes irrelevant for the reference catalog, because you would no longer be comparing like with like. You have acquired more insight, so it is a different level of “like”.
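The comparison after the private sort is mechanical, so here is a minimal sketch of it. The function name `split_agreement` and the example items are hypothetical; the point is only to show which items can be accepted as-is and which ones go into a feedback moment.

```python
from typing import Dict, Tuple

# Each expert's private sort: item -> "S", "M" or "L" (hypothetical example data).
Sorting = Dict[str, str]

def split_agreement(sortings: Dict[str, Sorting]) -> Tuple[Dict[str, str], Dict[str, Dict[str, str]]]:
    """Split items into agreed (same column for every expert) and disputed ones."""
    items = set().union(*(s.keys() for s in sortings.values()))
    agreed, disputed = {}, {}
    for item in sorted(items):
        # An expert who did not sort the item shows up as None, which counts as a difference.
        columns = {expert: s.get(item) for expert, s in sortings.items()}
        if len(set(columns.values())) == 1:
            agreed[item] = next(iter(columns.values()))
        else:
            disputed[item] = columns  # needs feedback and a re-estimate
    return agreed, disputed

sortings = {
    "ana": {"login": "S", "search": "M", "billing": "L"},
    "ben": {"login": "S", "search": "L", "billing": "L"},
}
agreed, disputed = split_agreement(sortings)
print(agreed)    # {'billing': 'L', 'login': 'S'}
print(disputed)  # {'search': {'ana': 'M', 'ben': 'L'}}
```

Only “search” needs a conversation; everything else goes straight into the catalog.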
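One way to read the column scheme, sketched under an assumption: the “Not Now” extension repeats the tail of the sequence 1, 2, 3, 5, 8, 12, 21, 50 (that is, 8, 12, 21, 50) with one more zero each time, which gives 800 and then 1200 after 500, as in the example above. The `COLUMNS` layout and the `not_now_columns` name are hypothetical.

```python
from itertools import count, islice

# Starting columns per T-shirt size, as described above.
COLUMNS = {
    "S": [1, 2, 3],                 # optionally add 5
    "M": [5, 8, 21],                # optionally add 50
    "L": [50, 100, 200, 300, 500],  # plus a "Not Now" column
}

def not_now_columns(largest: int):
    """Yield ever larger columns to replace "Not Now": 8, 12, 21, 50 with zeros attached."""
    for k in count():
        for base in (8, 12, 21, 50):
            value = base * 10 ** k
            if value > largest:
                yield value

# Take new columns one at a time, only when the experts say the current ones don't fit.
print(list(islice(not_now_columns(max(COLUMNS["L"])), 4)))  # [800, 1200, 2100, 5000]
```

The generator is lazy on purpose: you only add the next column when an item doesn’t fit in the existing ones.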
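The “never change an estimate” rule is easy to enforce in a tool by making catalog entries immutable. A minimal sketch, assuming a hypothetical `CatalogItem` record with an optional `parent` link: refinement adds new, separately estimated items and leaves the bigger piece with its original estimate.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Optional

@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class CatalogItem:
    title: str
    estimate: int                 # the estimate given before refinement
    parent: Optional[str] = None  # hypothetical link from a refined item to its bigger piece

epic = CatalogItem("Online payments", 300)

# Refinement creates new items; the epic keeps its 300 so it stays comparable to other epics.
stories = [
    CatalogItem("Card payments", 50, parent=epic.title),
    CatalogItem("Refunds", 21, parent=epic.title),
]

try:
    epic.estimate = 500  # a "re-estimate" after learning more
except FrozenInstanceError:
    print("Rejected: the original estimate stays in the catalog.")
```

When the epic reaches “done”, it joins the catalog at the level of “like” it was estimated at, while its stories join at theirs.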

Step 2: Use the reference catalog for an estimate & forecast


–       New items that need to be estimated

–       Reference catalog of items

–       Multiple experts that know their job


Basically, we are talking about organizing a Planning Poker session with reference classes instead of Planning Poker cards.

Make sure the reference classes are highly visible to the Scrum team members during an estimation session. A new item that needs an estimate is brought into the group, and the Scrum team decides who the experts are for it (Delphi). This is not done by consensus or any other decision process, but simply by volunteering (“I will estimate”, hands raised…), extended by nomination (“I want you to estimate as well”). No discussion allowed: if you think you are an expert, you are; if somebody else thinks you are an expert, you are.

All experts physically move a copy of the item to be estimated through their personal reference catalog, which is not visible to others. Both aspects are crucial: moving a copy through the catalog means truly comparing the item with the cases in it, and keeping it invisible to others avoids bias, so that we get genuine feedback on the estimates.

When all experts have their estimates ready, they reveal them to each other and the feedback session starts, but not in the form of “that is not a 12 but an 8”. Experts should phrase their feedback as “I considered this an 8 as it relates more to (reference case in the 8 column) because…”. Feedback sessions are collaborative, held as a group, and end with the items being added to the reference catalog.

Once you have estimated a complete Product Backlog, it is easy to create a forecast. See the formulas in this handy Excel sheet to manage to-dos.
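The selection rule is deliberately simple: no vote, just volunteers plus nominees. A minimal sketch, with `select_experts` and the names being hypothetical:

```python
from typing import Dict, Iterable, List, Set

def select_experts(volunteers: Iterable[str], nominations: Dict[str, List[str]]) -> Set[str]:
    """Experts = every volunteer plus everyone nominated by a team member. No vote, no discussion."""
    nominated = {name for nominees in nominations.values() for name in nominees}
    return set(volunteers) | nominated

# Hypothetical session: Ana raises her hand; Ben nominates Chris, Dana nominates Ana.
experts = select_experts(["ana"], {"ben": ["chris"], "dana": ["ana"]})
print(sorted(experts))  # ['ana', 'chris']
```

It is a set union, so nobody can be voted out, and being nominated twice changes nothing.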
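The reveal step can be sketched as follows. This is an illustration under assumptions: the `PrivateEstimate` record and the `reveal` function are hypothetical, and the wording of the generated statements is only one way to keep feedback anchored to reference cases rather than to bare numbers.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PrivateEstimate:
    expert: str
    column: int        # column of the private catalog the copy ended up in
    closest_case: str  # reference case the expert compared the item with
    because: str       # the expert's reasoning

def reveal(estimates: List[PrivateEstimate]) -> List[str]:
    """Reveal all private estimates at once and return the feedback statements needed."""
    if len({e.column for e in estimates}) == 1:
        return []  # everyone chose the same column: add the item to the catalog
    # Feedback refers to a reference case, never "that is not a 12 but an 8".
    return [
        f"{e.expert}: I put this in {e.column} because it relates more to "
        f"'{e.closest_case}': {e.because}"
        for e in estimates
    ]

feedback = reveal([
    PrivateEstimate("ana", 8, "Password reset", "same validation rules"),
    PrivateEstimate("ben", 12, "Login page", "needs a new UI flow"),
])
for line in feedback:
    print(line)
```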
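The Excel sheet mentioned above is not reproduced here. As an assumption, and not necessarily what that sheet’s formulas do, a common approach is to walk the ordered backlog, sum the estimates, and divide by the range of velocities observed in past Sprints (in the same units as the estimates, and all greater than zero). The `forecast` function and the numbers are hypothetical.

```python
import math
from itertools import accumulate
from typing import List, Tuple

def forecast(ordered_estimates: List[int], past_velocities: List[int]) -> List[Tuple[int, int]]:
    """For each item, in backlog order: (earliest, latest) Sprint in which it is likely done."""
    slow, fast = min(past_velocities), max(past_velocities)
    return [
        (math.ceil(total / fast), math.ceil(total / slow))
        for total in accumulate(ordered_estimates)  # work up to and including this item
    ]

# Hypothetical ordered backlog (catalog columns) and velocities from past Sprints.
print(forecast([8, 5, 21, 3], [15, 25]))  # [(1, 1), (1, 1), (2, 3), (2, 3)]
```

Using a range rather than a single average velocity keeps the forecast honest about its uncertainty, and re-ordering the backlog immediately re-ordering the forecast.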

Step 3: Keep the reference catalog up-to-date


–       Reference catalog of items

–       Multiple experts that know their job


Considering the limitations of human memory, we can almost automate this step. When a certain class (= number in the sequence) has more than 20 entries, you simply ask the experts to remove the least relevant ones. It is important that experts do this themselves, so that they keep the information that is relevant to them and that helps them provide context and information on the items to estimate. Remember Delphi? Experts need to be able to provide feedback. If you completely automated the cleanup, you might end up with cases that are irrelevant to the experts, and with lousy estimates again. Even worse, experts might end up giving feedback based on false assumptions because of the information gap.
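The part that can be automated is the trigger, not the deletion. A minimal sketch, assuming the catalog maps item titles to their class; `classes_to_prune` is a hypothetical helper that only tells the experts how many cases to drop per over-full class.

```python
from collections import Counter
from typing import Dict

MAX_PER_CLASS = 20  # "more than 20 entries" triggers a cleanup request

def classes_to_prune(catalog: Dict[str, int]) -> Dict[int, int]:
    """Map each over-full class to how many entries the experts should remove.

    Nothing is deleted here: the experts themselves pick the least relevant cases.
    """
    counts = Counter(catalog.values())
    return {cls: n - MAX_PER_CLASS for cls, n in counts.items() if n > MAX_PER_CLASS}

# Hypothetical catalog: item title -> class (number in the sequence).
catalog = {f"story {i}": 8 for i in range(23)}
catalog.update({f"small {i}": 2 for i in range(5)})
print(classes_to_prune(catalog))  # {8: 3}
```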


Agile is what Agile is: it is not a process, it is not governance, it is a mindset! Four Values, Twelve Principles. As such, there is no Agile Estimation or Forecasting. There is Estimation and Forecasting with an Agile mindset: collaboration between experts, reference cases based on working software, and not going into detail up front, so that you can respond to change.

Scrum is what Scrum is: it is not a process, it is not governance, it is a framework for developing and sustaining complex products. The Scrum framework helps you get to a forecast and improve it. The Scrum Team (or its experts) is involved in estimating Product Backlog items. Therefore, with an ordered Product Backlog, we end up with a forecast that is continuously refined and re-ordered according to the learning achieved.

How do we fix Scrum teams that are terrible at forecasting? Look at historic successes, learn from them and improve. Empirical process control is a basic principle within Scrum. With that in mind, I’m sure that in the future we’ll find even better ways to estimate and forecast. Practices with a lower TCO, higher predictability, and a lot more fun…

That’s it, folks.

There was nothing new in this article, but I hope you learned something anyway. I’m looking forward to hearing how you address estimating and forecasting in this fast-changing world.


>> Sidenote: experiment with some of the #noestimates alternatives to the problems you address today with estimating and forecasting.