Evidence Viewpoint: Compared to What? Why Comparison Groups Matter

Category: Voices

Published April 28, 2026

Author: Amanda Neitzel, Center for Research and Reform in Education

Is a 25-minute 5K time good?

It depends. Compared to an elite runner, it’s slow. Compared to the average adult, it’s quite fast. Compared to your own previous time, it might represent real progress.

We intuitively understand this in everyday life. But in education research, we often forget it.

When people ask whether a tutoring program or curriculum “works,” what they often mean is: Is the effect size big enough? Is 0.10 meaningful? Is 0.20 strong?

But that’s the wrong first question.

The better question is: Compared to what?

What an Effect Size Actually Tells Us

In any evaluation—whether a randomized controlled trial or a quasi-experimental study—the estimated impact reflects a difference between two groups:

Students who received the program
Students who did not—the control group in randomized studies, or more generally, the comparison group

That sounds straightforward. In practice, however, what the comparison group experiences can vary enormously, and that variation fundamentally shapes the results.

In other words, effect sizes are not just about how well something works; they are about how much better it works than the alternative.
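
To make this concrete, here is a minimal Python sketch of one common kind of effect size, the standardized mean difference (Cohen's d): the difference between the two group means divided by their pooled standard deviation. It assumes the study reports this kind of effect size (many education studies report Hedges' g, a small-sample-corrected version of it), and every number in it is invented. The same hypothetical tutored group is compared against two hypothetical comparison groups.

```python
import math

def cohens_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference: (treatment mean - comparison mean) / pooled SD."""
    # Pool the two standard deviations, weighting each group's variance by n - 1.
    pooled_var = ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / math.sqrt(pooled_var)

# Hypothetical summary statistics (mean scale score, SD, sample size); not real data.
tutored = (505, 50, 100)
no_extra_support = (495, 50, 100)
other_intervention = (502, 50, 100)

print(f"Tutoring vs. no additional support: d = {cohens_d(*tutored, *no_extra_support):.2f}")
print(f"Tutoring vs. another intervention:  d = {cohens_d(*tutored, *other_intervention):.2f}")
```

Nothing about the tutored students changes between the two calls. Only the comparison group changes, and the effect size drops from 0.20 to 0.06.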

Tutoring: Not One Comparison, but Many

Consider tutoring, one of the most studied and widely implemented strategies in education today.

A study of tutoring might compare:

Tutoring vs. no additional academic support
Tutoring vs. another tutoring program
Tutoring vs. other supplemental services (e.g., small groups, intervention blocks)
Tutoring that replaces core instruction vs. tutoring that adds to it

These are not small differences. They are entirely different questions.

If tutoring is compared to no additional support, we might expect relatively large effects. If it is compared to another structured intervention, effects will likely be smaller. If tutoring replaces core instruction, pulling students out of reading or math blocks, the net benefit may be reduced or even negligible.

Yet these distinctions are often collapsed into a single headline: “Tutoring produced an effect size of 0.20.”

Without context, that number is almost meaningless.

A tutoring program showing an effect size of 0.20 against no additional support may be less impressive than a program showing 0.10 against another high-quality tutoring model. But unless we understand the comparison, we can’t interpret the result.
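
A rough calculation shows how this could happen. Suppose, purely hypothetically, that the high-quality tutoring model used as the comparison (call it C) would itself show an effect of 0.15 against no additional support, and call the program with 0.20 against no additional support A. If effects on the same outcome, in the same population, roughly add, then program B's 0.10 against C implies something closer to 0.25 against no support:

```latex
d_{B,\text{none}} \approx d_{B,C} + d_{C,\text{none}} = 0.10 + 0.15 = 0.25 > 0.20 = d_{A,\text{none}}
```

The additivity assumption is a simplification. Real studies differ in samples, measures, and implementation, so chaining comparisons this way only illustrates why the comparison matters; it does not replace a direct head-to-head study.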

Curriculum Studies: The “Business as Usual” Problem

The same issue arises in evaluations of curriculum and instructional programs.

We are never comparing a reading curriculum to “no reading instruction.” Instead, studies typically compare:

Curriculum A vs. Curriculum B
A new program vs. “business as usual”

But “business as usual” is rarely well-defined. It might include:

A different structured curriculum
Teacher-developed materials
A mix of approaches that vary across classrooms or schools

In reality, many curricula share common features. They align to similar standards, include overlapping instructional practices, and are implemented under similar constraints.

As a result, even meaningful improvements may produce modest effect sizes.

This is not a sign that “nothing works.” It is a reflection of the fact that we are comparing one reasonable approach to another, not replacing something with nothing.
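
A toy model shows how a mixed “business as usual” group can shrink an effect without the program changing at all. The Python sketch below assumes, hypothetically, that the new curriculum raises scores by 0.30 standard deviations relative to teacher-developed materials, that a structured alternative raises them by 0.20, and that these effects add on a common scale. All numbers are invented, and the function name is illustrative.

```python
# Hypothetical effects, in standard-deviation units, measured relative to
# classrooms using teacher-developed materials. Invented for illustration.
NEW_CURRICULUM = 0.30
STRUCTURED_ALTERNATIVE = 0.20
TEACHER_DEVELOPED = 0.0

def effect_vs_business_as_usual(share_structured):
    """Effect of the new curriculum against a mixed 'business as usual' group.

    Assumes effects add on a common SD scale and that the comparison group is a
    simple mix: a share of classrooms using a structured curriculum, with the
    rest using teacher-developed materials.
    """
    bau = (share_structured * STRUCTURED_ALTERNATIVE
           + (1 - share_structured) * TEACHER_DEVELOPED)
    return NEW_CURRICULUM - bau

for share in (0.0, 0.5, 1.0):
    effect = effect_vs_business_as_usual(share)
    print(f"{share:.0%} structured in comparison group: effect = {effect:.2f}")
```

The new curriculum is equally effective in every scenario. The reported effect falls from 0.30 to 0.10 only because more comparison classrooms already use a structured curriculum.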

Why “It Depends” Is the Right Answer

When someone asks, “Is an effect size of 0.10 good?” the honest answer is:

It depends.

And not just on one thing.

It depends on:

What the program is being compared to
The strength of the research design (e.g., randomized vs. quasi-experimental)
What students in the comparison group actually experienced
Whether the program adds to or replaces existing instruction
The outcome being measured and the time frame

Among these, the comparison condition is often the least visible, but one of the most important.

A small effect against a strong comparison can be more impressive than a larger effect against a weak one. But without understanding the comparison, it is difficult to know which is which.

What This Means for Evidence Users

For practitioners and policymakers, this leads to a simple but powerful habit:

Always ask: What is this being compared to?

Before interpreting results, look for:

A clear description of the comparison condition
Whether students received alternative supports or services
Whether the program replaced or added to existing instruction

These details are sometimes buried in technical sections—or missing altogether—but they are essential for making sense of the findings.

Developing the habit of asking this question is one of the easiest ways to become a more sophisticated consumer of evidence.

What This Means for Researchers

Researchers can strengthen the field by improving how we describe comparison conditions.

This is not easy. “Business as usual” is inherently messy. It varies across schools, classrooms, and even individual teachers.

But even partial transparency helps:

Describing typical instruction in comparison classrooms
Reporting access to supplemental supports
Clarifying whether interventions replace or supplement core instruction

Better descriptions of comparison conditions make findings more interpretable—and more useful for decision-making.

Conclusion

To return to the 5K analogy: a time makes sense only in context.

Education research is no different.

Effect sizes do not tell us how well something works in absolute terms. They tell us how much better—or worse—it performs relative to an alternative.

And until we understand that alternative, we do not really understand the evidence at all.

So the next time you see an effect size, resist the urge to ask whether it is “good.”

Instead, start with a better question:

Compared to what?

