Story pointing can feel a little wishy-washy, especially on teams that are new to scrum. While the inherent ambiguity of point estimation belies its power, folks need some place to hang their hat when grasping this concept.
Time estimation feels so much safer to people because of its universality. Time zones be damned, an hour in New York City is the same span of time as an hour in Paris, Melbourne, or Bangkok. Even though hours, minutes, and seconds are just human constructs, they are an everyday shared experience, so guessing the time it will take to do something feels more precise than using the Fibonacci sequence. While we have mostly overcome the variance of a mechanically slow timepiece, if you’ve ever felt the panic on the morning of daylight saving when your car’s clock is an hour off, you know how tenuous a concept time truly is.
Story pointing denies the placating certitude of time and embraces the inexactness of estimation with open arms. With Fibonacci estimation, the difference between a 2 and a 3 isn’t math. It’s a subjective judgment call made by each member of the team. And just like how hours work globally because we all take our cues from the World Clock, teams need to base their decisions on the same benchmarks to unlock the true value of story points and team velocity. This can be accomplished by using reference stories.
Reference stories are documented work samples that provide Fibonacci guidance for a team or organization. Note that last part: for a team or organization. Unlike other concepts, you can’t Google for a universally applicable 5. Instead, a reference story is a shared benchmark that one group agrees on and sticks to.
Reference stories can be localized to a team based on past work. A team can pluck a resolved task from a past sprint and decree “This is an ideal 3!” Or, in other cases, reference stories can be purely hypothetical tasks, couched in a team’s industry and readily understood by everyone, to forgo the need for context.
I keep using the plural “reference stories.” Quite often, when level-setting estimation via references, teams feel they need examples for all Fibonacci numbers. In reality, a single, objectively clear reference story will achieve the same end as a whole library of references: consistent pointing across all team or organizational members. It’s not a matching game—it’s about relative association.
What’s more, if a reference story mentions anything numeric, teams may bust out their arithmetic skills and turn the whole thing into a word problem. “Ok so our 13 reference involves a six-hour meeting involving three people with two hours of prep work and this task is a three-hour meeting for four people, so that makes it a...carry the 1...except after c...”
Reference stories are not about math (at least, not solely). Two 5s don’t make a 10, so doing some kind of algebraic equation won’t work. Rather, teams need to analyze their unpointed story and compare it against their reference story in the following ways (in no particular order...though the last one is least useful):
Teams need to view the new task through the lens of their reference story. If the reference story is an 8 and the new task is a little more risky, consider pointing it one step up the Fibonacci scale (which would make it a 13). If the team as a whole has gained enough skills so the task will take a lot less effort to complete, maybe step down two steps and go with a 3.
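For anyone who likes to see the stepping spelled out, here is a minimal Python sketch of moving up or down the scale from a reference story. The `SCALE` values and the `shift_points` helper are assumptions made up for illustration; your team's scale may differ, and the number of steps is still a gut call, not something the code decides for you.

```python
# Hypothetical pointing scale (an assumption; use whatever values your team points with).
SCALE = [1, 2, 3, 5, 8, 13, 21]


def shift_points(reference: int, steps: int) -> int:
    """Move `steps` positions along SCALE from a reference value.

    Positive steps go up the scale, negative steps go down.
    The result is clamped to the lowest and highest values on the scale.
    """
    if reference not in SCALE:
        raise ValueError(f"{reference} is not on the pointing scale")
    index = SCALE.index(reference) + steps
    index = max(0, min(index, len(SCALE) - 1))  # stay within the scale
    return SCALE[index]


# A bit riskier than the 8-point reference: one step up.
riskier = shift_points(8, 1)
# Much less effort than the 8-point reference: two steps down.
easier = shift_points(8, -2)
```

Note that the steps follow positions on the scale, not arithmetic on the point values, which is the whole idea: an 8 plus "a little more" lands on 13, not 9.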
“But what if the complexity is the same, the effort is more, and the risk is less?” Well, like I said, this isn’t math. There’s no exact right answer. The team members each need to point what their guts tell them to and see if there is enough agreement (less than two standard deviations across all votes) to unequivocally lock in a point value.
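One rough way to check for "enough agreement" is to measure how far apart the votes sit on the scale. The sketch below is an assumption on my part: it reads "deviations" as steps on the Fibonacci scale (the same sense used when stepping up or down from a reference story) and treats votes that span fewer than two steps as agreement. The `SCALE` values, the function names, and the `max_spread` threshold are all hypothetical.

```python
# Hypothetical pointing scale (an assumption; use whatever values your team points with).
SCALE = [1, 2, 3, 5, 8, 13, 21]


def vote_spread(votes: list[int]) -> int:
    """Return how many scale steps separate the lowest and highest vote."""
    if not votes:
        raise ValueError("need at least one vote")
    # SCALE.index raises ValueError for a vote that is not on the scale.
    positions = [SCALE.index(vote) for vote in votes]
    return max(positions) - min(positions)


def has_consensus(votes: list[int], max_spread: int = 1) -> bool:
    """True when the votes span at most `max_spread` steps (assumed threshold)."""
    return vote_spread(votes) <= max_spread


close_votes = [5, 8, 8, 5]   # neighbors on the scale
split_votes = [3, 8, 5]      # lowest and highest are two steps apart
```

With these assumptions, `close_votes` counts as agreement and `split_votes` sends the team back to talk it over. The check only tells you whether to keep discussing; it does not pick the point value.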
As time passes, teams might outgrow their reference stories by learning new skills or developing better tools to get the job done. Does that mean that their reference stories are now worthless?
Consider the concept of par in golf. Par is a number from 3-5 (generally) that is applied to each hole on a golf course: the number of strokes a skilled golfer is expected to need to finish it. The higher the number, the more strokes it will likely take golfers to finish.
Just like an overly risky user story, a par-four may treat players more like a par-five on days that are windy or that are early/late in the golf season. Heck, difficulty also increases if you slept like crap the night before. It’s all situational, same as with pointing. Reference stories set the standard for a set context, then teams decide on how new tasks deviate from that concrete example.
Well then, does it mean that the reference stories should be regularly repointed?
Maybe. But maybe not.
Teams increase their velocity by becoming more efficient. Some of that efficiency is tied to completing the same complex task with less effort and risk. To go back to our golf analogy, a par-four is always a par-four. Just because you get better at golf or buy a more advanced sand wedge, golf courses aren't going to change their par-fours into par-threes. Scrum is a team sport. Even if a single team member improves significantly, user stories may not end up with lower point values. However, if the entire team feels that the present day is substantially different from when the reference story was originally pointed, a reevaluation may be in order.
In the end, teams should estimate tasks based on the present state of things, with each individual making a judgment call through the criteria of complexity, effort, and risk. Reference stories can help guide team members to suggest a point value that makes sense to them.
No guarantee that reference stories will help you avoid a contentious sand trap now and again, but hopefully, you stay out of the rough more often than not.