Category Voices
Author Amanda Neitzel, PhD

Something I share with my husband is a love of The Great British Bake Off. If you aren’t familiar with this show, it is a baking competition in which contestants bake three things each week, someone is named star baker, and someone is sent home. Because it is a competition, there are judges, and this show has two. Our favorite part is watching Paul Hollywood carefully examine each and every bake. He is always looking for what went wrong, and he is tough. He cuts right into the middle of a loaf of bread, he pokes slices of cake, he flips pastries over to see what the bottom looks like. Breads are often under-proofed, or the crème pat (crème patissière) is lumpy. My husband and I have taken to jokingly doing this to each other’s baked goods. We tap the bottom of a just-baked loaf of bread to listen for the hollow sound, while channeling our inner Paul Hollywood.

This type of judging is relevant in educational research. There are always people proclaiming that a new study showed that program X helps students learn better. However, we need to channel our inner Paul Hollywood and dig into that study to see if it really is as good as it looks. Paul would never just accept that a loaf of bread was good simply because someone said it tasted great, and we shouldn’t just accept a splashy research study either. Bob Slavin (2007) talked about this as accessing your inner Gremlin, a grumpy creature named Norm, who is always looking for ways that studies are flawed. This is something that everyone can do with a little practice. Just as you look for a few key things in a loaf of bread (crust, texture, flavor), you can look for a few key factors in education research that help you decide whether this is a blue ribbon study or one that you should be more cautious about accepting. These factors can be grouped into three main questions: What? Who? and How?

What are you comparing?

First you need to understand what the “thing” is that is being researched. This might be a particular reading textbook, a math tutoring program, or a whole-school SEL intervention. It could be a leadership training for principals or even a reorganization in the school schedule. But you need to have the program or product clearly defined and know whether it is being implemented under fair conditions – with the types of resources and supports normally available to schools. If not, that’s a hint that this “bake” is flawed.

Once you know what you are testing, then you need to understand what it is being compared to. Researchers call this the counterfactual or control condition, which is often described as business as usual or what schools typically do. How different is that typical practice from what you are testing? Are you comparing a new tutoring model with an existing tutoring model? Or are you comparing a new tutoring model with no tutoring at all? Knowing the comparison condition will help you understand how big of an impact to expect – results would be very different if you were comparing two tutoring approaches than if you were comparing tutoring and no tutoring.

Who is in the groups?

Next you need to understand your participants. Researchers spend a lot of time talking about sample selection, but it really just means knowing who is included and who is not. The biggest clue that something is wrong with the “bake” is if the people in the treatment group (those getting the “thing” you are testing) and the comparison group are very different. So look at the comparisons of the groups at baseline (before the “thing” started). Are they very similar? If not, this “bake” isn’t quite right. You should also understand who is being excluded from the sample. It is quite common in some studies to remove students who didn’t get enough of the treatment. For example, a study of a supplementary education technology program might drop students who didn’t complete enough lessons. That study tells you how well the program works for someone who used it enough, but not how it works on average for all students. When you see this, your inner Paul Hollywood should be very concerned.
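For readers who want to see what “very similar at baseline” can look like in numbers, here is a minimal sketch in Python. It expresses the gap between the two groups’ pretest averages in standard deviation units, which is one common way to judge baseline similarity. The function name and the pretest scores are hypothetical, invented purely for illustration, and a real study would report this for you rather than require you to compute it.

```python
from math import sqrt
from statistics import mean, stdev


def standardized_baseline_difference(treatment, comparison):
    """Return the baseline mean difference in pooled standard deviation units."""
    n_t, n_c = len(treatment), len(comparison)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_variance = (
        (n_t - 1) * stdev(treatment) ** 2 + (n_c - 1) * stdev(comparison) ** 2
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(comparison)) / sqrt(pooled_variance)


# Hypothetical pretest scores, made up for this example only.
treatment_pretest = [52, 48, 50, 55, 47, 51]
comparison_pretest = [50, 49, 53, 46, 52, 48]

diff = standardized_baseline_difference(treatment_pretest, comparison_pretest)
print(f"Baseline difference: {diff:.2f} standard deviations")
```

The closer this number is to zero, the more alike the groups were before the program started. A large gap means any difference at the end of the study may simply reflect who was in each group.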

How is success being measured?

Finally, you need to ask how the outcomes are being measured. Are they using a fair assessment, like a standardized test, or are they using a test that is aligned to the treatment, which is biased toward the treatment group? For example, perhaps there is a study testing a particular vocabulary program that is meant to improve reading. The outcome measure is a test of 40 of the words taught in the program. That lets you know whether those students learned those particular words better than students who didn’t participate in the program, but it doesn’t tell you whether that program actually helps students learn to read. Using these types of unfair measures, often designed by the people who made the program or intervention, is a signal that the “bake” is off.

The good news is that there are resources that can help you with your judging. Clearinghouses such as the What Works Clearinghouse and Evidence for ESSA can help by identifying programs and studies that have good bakes. So, the next time you hear about a new research study that makes big claims about what helps students, tap into your inner Paul Hollywood. Don’t be afraid to cut that study apart and really look to see if it passes muster. We shouldn’t be making educational policy decisions based on studies with soggy bottoms or under-baked loaves. Instead, look for those studies that are perfect bakes.


Slavin, R. E. (2007). Educational Research in an Age of Accountability. United States: Pearson.
