When the Theory Was a Compromise: Rethinking Theory-Based Evaluation for Frameworks Forged in Political Processes

Imagine you are asked to evaluate a framework (a regional cooperation model, an international standard, a multilateral protocol). Your first task, as for any good evaluator, is to reconstruct the theory that underpins it. You interview stakeholders. You review the founding documents. You trace the programme logic. And what you find is not a theory. What you find is a negotiation that never ended.

The language is purposefully vague. Key provisions carry multiple interpretations, each defended by a different constituency. The outcomes were left undefined not because no one thought about them, but because getting agreement on definitions would have broken the consensus needed to adopt the framework at all. Different member states signed on for different reasons. Some sought binding standards; others wanted a soft reference point with no enforcement teeth; a third group simply needed the political optics of participation.

This is not a design failure. This is how normative frameworks work.

And yet Theory-Based Evaluation (TBE), arguably the dominant paradigm for serious programme and policy evaluation, was built on a different assumption: that there is a theory to surface, test, and refine. When that assumption breaks down, so does a significant part of our evaluative toolkit. This article argues that TBE can still add considerable value in these contexts, but only if we consciously adapt it around three pivots: from attribution to pathways of influence, from intervention mechanics to legitimacy as a mechanism, and from outcome measurement to institutional uptake.

The Standard Logic of TBE — And Why It Struggles Here

Theory-Based Evaluation, developed most influentially through the work of Carol Weiss and Huey Chen and later extended by John Mayne's contribution analysis, proceeds from a deceptively simple premise: every intervention rests on a theory, explicit or implicit, of how change happens. The evaluator's job is to surface that theory, trace it, and assess whether the causal logic holds.

This works beautifully when the evaluand was designed with intentionality. When someone sat in a room, drew a results chain, and decided that "if we train 200 health workers, then community health indicators will improve within 18 months", there is a testable hypothesis. You can examine whether the training happened, whether it changed practice, and whether practice influenced outcomes. You can attribute, assess contribution, and explain.

The problem is that most normative frameworks at the international or multilateral level were not designed. They were negotiated. The difference is not semantic.

In design, you optimize for effectiveness. In negotiation, you optimize for consensus. These are fundamentally different objective functions. The resulting frameworks often contain:

Purposeful ambiguity — language flexible enough to let parties with incompatible positions both claim victory
Parallel theories — each signatory holding a different mental model of what the framework is supposed to do and how
Satisficed rather than calibrated targets — benchmarks set at the level of what was politically achievable, not what evidence suggested was needed
Deferred operationalization — the "how" was kicked down the road to implementing bodies, future sessions, or member state discretion

Applying standard TBE logic to this kind of evaluand (looking for the causal theory and testing it) is epistemologically misaligned. You are not testing a hypothesis. You are reconstructing a negotiation and then asking whether the ghost of that negotiation produced change in the world.

Pivot One: From Attribution to Pathways of Influence

The first adaptation is conceptual but consequential. Rather than asking "Did this framework cause that outcome?", we should ask "Through what pathways did this framework shape behaviour, decisions, and institutional practice?"

The distinction matters because normative frameworks do not produce effects the way a direct service intervention does. They do not deliver widgets. They shape the landscape in which actors make decisions. They function as:

Reference points — giving legitimacy to claims that would otherwise be contested
Accountability anchors — enabling civil society, oversight bodies, or member states to demand performance against agreed standards
Norm carriers — introducing language and categories that gradually enter the operating vocabulary of institutions
Coordination devices — making it easier for actors with overlapping interests to act in concert without explicit agreement each time

None of these pathways is linear. All of them are contextually contingent. Some will be dominant in some member states and absent in others. The evaluator's task is to map the active pathways in specific contexts, assess the conditions that enable or block each, and build an evidenced account of how (and where, and for whom) influence is flowing — without pretending that a single causal chain connects the framework to any measurable end-state.

Process tracing, contribution analysis adapted for multi-actor settings, and realist evaluation logic (mechanism + context = outcome) are all useful here. What does not help is a logframe with a straight vertical arrow running from the framework text to a population-level indicator.

Pivot Two: Legitimacy Is Not Background Noise — It Is the Mechanism

In standard TBE, legitimacy is often treated as a contextual factor: the intervention is more or less likely to work depending on how legitimate stakeholders consider it to be. In normative frameworks, legitimacy is not the context. It is the active ingredient.

Frameworks forged through diplomatic processes function almost entirely through legitimacy dynamics. They lack the coercive force of legislation or the incentive power of conditioned funding. What they have is the authority that comes from consensual adoption, procedural credibility, and alignment with recognized values. When an institution adjusts its practices because a framework says it should, it is not responding to an enforcement mechanism. It is responding to the normative pull of something it has accepted as legitimate.

Thomas Franck called this "compliance pull" — the inherent capacity of a rule perceived as legitimate to draw actors toward compliance without external compulsion. Martha Finnemore and Kathryn Sikkink mapped how international norms cascade through the international system once they achieve a tipping point of acceptance. Both are pointing to the same phenomenon: in normative work, legitimacy is causal.

This means evaluation must assess legitimacy not as a satisfaction metric but as an explanatory variable and an outcome in its own right. Was the framework adopted through a process that key actors regard as procedurally fair? Does its substantive content align with the values of the constituencies it addresses? Is it performing well enough to sustain confidence among those expected to implement it? These are evaluative questions, not background conditions.

An evaluand with low legitimacy among the very actors it was designed to mobilize is not a functioning framework, regardless of what the mandate document says. And an evaluand with high legitimacy may be generating substantial normative influence even in the absence of measurable outputs.

Pivot Three: Institutional Uptake as the Evaluable Outcome

The third pivot addresses the question of what, precisely, we should be measuring.

One of the recurring frustrations in evaluating normative frameworks is the gap between the level at which the framework operates (typically language, standards, and policy orientation) and the level at which impact is ultimately experienced (communities, individuals, resource allocations). That gap is often too wide, too complex, and too confounded to close through any evaluation design with a reasonable scope and budget.

The evaluable space lies in between — in institutional uptake.

Institutional uptake refers to the processes through which relevant actors have internalized the framework, translated it into operational terms, and embedded it into their structures, procedures, and practices. It asks not whether the framework has changed the world but whether the world's institutions have changed in relation to the framework.

This is tractable. It can be investigated through document review, institutional mapping, stakeholder engagement, and comparative case analysis. It yields evaluative findings with genuine utility for the governance and oversight of the framework, and it is honest about the limits of causal inference across complex, multi-actor, multi-context systems.

Key evaluation questions in this register include: Which categories of actors have taken up the framework, and which have not — and what explains the difference? How have implementing institutions translated framework intent into operational guidance? Where translation has occurred, has the spirit or only the letter of the framework been absorbed? What organisational, political, or capacity conditions have enabled or impeded uptake?
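To make these questions concrete, one way an evaluator might code uptake evidence per actor and roll it up by actor category is sketched below. It is a minimal illustration, assuming a hypothetical four-level ordinal scale (none, internalized, translated, embedded) that paraphrases the definition of institutional uptake given earlier, plus a separate evaluator judgement on whether the spirit or only the letter of the framework was absorbed. The class names, actor categories, scale, and example records are all illustrative assumptions, not an established coding standard.

```python
from collections import Counter
from dataclasses import dataclass
from enum import IntEnum


class UptakeStage(IntEnum):
    """Hypothetical ordinal scale paraphrasing the definition of institutional uptake."""
    NONE = 0          # no evidence the actor has engaged with the framework
    INTERNALIZED = 1  # actor accepts or references the framework
    TRANSLATED = 2    # framework turned into operational guidance
    EMBEDDED = 3      # built into structures, procedures, and practice


@dataclass
class UptakeRecord:
    actor: str
    category: str          # e.g. "ministry", "regulator", "civil society" (illustrative)
    stage: UptakeStage
    spirit_absorbed: bool  # evaluator judgement: spirit (True) or only the letter (False)


def summarize(records: list[UptakeRecord]):
    """Tally uptake stages per actor category, list letter-only translators and non-adopters."""
    by_category: dict[str, Counter] = {}
    letter_only = []
    for r in records:
        by_category.setdefault(r.category, Counter())[r.stage.name] += 1
        # The letter/spirit distinction only applies once translation has occurred.
        if r.stage >= UptakeStage.TRANSLATED and not r.spirit_absorbed:
            letter_only.append(r.actor)
    # Actors with no uptake at all (the "which have not" question).
    non_adopters = [r.actor for r in records if r.stage == UptakeStage.NONE]
    tallies = {cat: dict(counts) for cat, counts in by_category.items()}
    return tallies, letter_only, non_adopters


# Hypothetical records for illustration only.
records = [
    UptakeRecord("Ministry A", "ministry", UptakeStage.EMBEDDED, True),
    UptakeRecord("Ministry B", "ministry", UptakeStage.TRANSLATED, False),
    UptakeRecord("NGO C", "civil society", UptakeStage.INTERNALIZED, True),
    UptakeRecord("Regulator D", "regulator", UptakeStage.NONE, False),
]
tallies, letter_only, non_adopters = summarize(records)
print(tallies)
print(letter_only)   # → ['Ministry B']
print(non_adopters)  # → ['Regulator D']
```

Keeping the letter-versus-spirit judgement separate from the stage lets the evaluation report, for example, that an actor has formally translated the framework into guidance while absorbing only its letter, which is the distinction the questions above ask about.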

These questions do not tell us whether the framework is solving the problem it was created to address. But they tell us something arguably more actionable for those responsible for its stewardship: whether the framework is gaining institutional traction, and where the leverage points for strengthening that traction lie.

Practical Implications for Evaluators

Working in this adapted TBE mode requires some discipline:

Reconstruct the plural theories, not the singular one. Use stakeholder interviews and document analysis to surface the different theories of change that different constituencies brought to the negotiating table. Present them as a landscape of intent, not a consensus model. This is itself an important finding.

Be epistemologically honest with commissioners. Managing the expectations of evaluation commissioners who want causal attribution — especially in contexts where governing bodies expect evidence of "what works" — is part of the evaluator's professional responsibility. That conversation is better had at design stage than at reporting stage.

Invest in comparative contextual analysis. Because uptake varies by context, the comparative dimension is analytically rich. What enables uptake in one setting is often the variable that blocks it in another. Those contrasts carry real learning value.
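One simple way to surface such contrasts across case studies is to tabulate, for each enabling or blocking condition, the settings where it co-occurred with strong versus weak uptake. The sketch below is a hypothetical illustration only: the condition labels, case names, and binary uptake judgement are assumptions, and it is a descriptive cross-case tally, not a formal method such as Qualitative Comparative Analysis. Conditions that appear on both sides are flagged as candidates for closer process tracing, since their effect seems to depend on the wider context.

```python
from dataclasses import dataclass


@dataclass
class Case:
    name: str
    conditions: set[str]  # conditions judged present in this setting (hypothetical labels)
    high_uptake: bool     # evaluator's overall uptake judgement for the case


def contrast_conditions(cases: list[Case]):
    """For each condition, list the cases where it co-occurs with high vs low uptake.

    Conditions appearing on both sides are returned as 'mixed': their effect
    seems to hinge on something else in the setting and merits closer tracing.
    """
    table: dict[str, dict[str, list[str]]] = {}
    for case in cases:
        side = "high" if case.high_uptake else "low"
        for cond in case.conditions:
            table.setdefault(cond, {"high": [], "low": []})[side].append(case.name)
    mixed = sorted(c for c, sides in table.items() if sides["high"] and sides["low"])
    return table, mixed


# Hypothetical cases for illustration only.
cases = [
    Case("Country A", {"ministerial champion", "donor funding"}, True),
    Case("Country B", {"donor funding"}, False),
    Case("Country C", {"ministerial champion"}, True),
]
table, mixed = contrast_conditions(cases)
print(mixed)  # → ['donor funding']
```

A tally like this cannot show why a condition helps in one setting and not in another; it only points the evaluator to where the contrasts lie, which the comparative case work then has to explain.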

Let legitimacy be visible in findings. If your data indicate that the framework is perceived as procedurally flawed, substantively contested, or strategically irrelevant by key constituencies, say so clearly. Legitimacy deficits are among the most important findings an evaluation of a normative framework can produce.

The Evaluator as Institutional Anthropologist

Working at the intersection of diplomacy, governance, and evaluation, we sometimes need to inhabit a role closer to an institutional anthropologist or a diplomatic historian than a programme evaluator. We are reconstructing meaning, tracing influence, mapping uptake, and assessing legitimacy — not testing a hypothesis that was never actually formulated.

Theory-Based Evaluation remains the right framework. Its insistence on making causal logic explicit, its demand that evaluators understand how change is supposed to happen before asking whether it did, and its orientation toward learning rather than mere accountability are as valuable in normative settings as anywhere. What we must resist is importing the form of TBE (the linear results chain, the attribution logic, the single theory of change) into contexts where the evaluand was built from negotiated intentions and diplomatic language rather than from evidence and design science.

Frameworks born of compromise deserve evaluation methods built for complexity. When we adapt our tools to match the nature of what we are evaluating, we produce findings that are more honest, more useful, and more credible — to the institutions that commissioned the work, and to the profession we represent.


Monitoring, Operations Research and Evaluation for Adaptive Programmes (MORE-AP)