Let’s Be Reasonable: Teaching AI When to Think Harder
How RRC helps AI systems make fairer decisions under real-world constraints like limited time, compute, and conflicting values.
In recent years, artificial intelligence (AI) has become a decision-maker in scenarios that impact real people, from moderating online speech to determining who gets approved for a loan. These decisions often involve competing interests: one person’s right to privacy versus another’s safety, or one customer’s need for speed versus another’s fair treatment. While these systems are expected to be objective, fair, and aligned with human values, the reality is more complicated.
A recent research paper on Resource Rational Contractualism (RRC), from researchers at Google, Stanford, MIT, and Harvard, zooms in on this very problem. The authors argue that one of the most urgent challenges in AI today isn’t just making decisions, but making decisions fairly when stakeholder interests conflict, and doing so under real-world constraints like limited time, information, or computational resources.
Imagine a lending platform approving thousands of loans a day. An ideal decision-making system would consider each applicant’s context, debate what fairness looks like for their situation, and ensure all stakeholders (lenders, borrowers, regulators) would reasonably accept the outcome. In theory, that’s the gold standard. But in practice, no system has the time or capacity to simulate this kind of deliberation for every single decision.
Today’s AI systems typically follow one of two extremes: they either apply rigid rules (“reject all applicants below a certain credit score”) or try to mimic ideal deliberation by pouring resources into deep, nuanced reasoning for every case. The former is efficient but brittle, quick to misjudge edge cases or produce unfair outcomes. The latter is idealistic but expensive, slow, and often impractical at scale.
So the question is: How do we design AI systems that can make decisions stakeholders would endorse (without pretending we have infinite resources to reason through every one)?
To tackle this, the authors introduce a novel framework, RRC. The name may sound academic, but its basic idea is surprisingly intuitive.
Start with the concept of contractualism, a philosophy that says a decision is fair if no one affected by it could reasonably reject it. Think of it like a thought experiment where everyone impacted by a decision gets a say, and the AI chooses the option they could all agree on under ideal, honest conditions.
RRC adds a practical twist: in real life, we don’t have the luxury of “ideal” conditions. So instead of aiming for perfection every time, RRC helps an AI decide how much reasoning effort to invest based on the complexity of the situation and the resources available. In other words, the AI asks itself: Is this decision straightforward enough to apply a known rule, or do I need to simulate a negotiation among stakeholders to figure out what’s fair here?
This approach mirrors how humans often make decisions. For simple calls (like stopping at a stop sign), we follow rules. For tricky trade-offs (like whether to prioritize one client over another with overlapping needs), we think harder, weigh values, and sometimes even consult others. RRC formalizes this into a strategy for AI: use the simplest possible method that still gets a fair answer.
The core of RRC is its toolbox of decision-making strategies, each with different costs and benefits. The AI picks from this toolbox dynamically, guided by how “hard” the decision is and how much it’s worth investing to get it right. This allows the system to act efficiently where possible and deliberatively where necessary, an approach that’s more scalable and aligned with human expectations of fairness.
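To make the idea concrete, here is a minimal Python sketch of what an RRC-style strategy selector might look like. The strategies, difficulty estimate, costs, and thresholds are illustrative assumptions, not the authors’ implementation.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Strategy:
    name: str
    cost: float                   # rough compute cost (e.g., expected tokens)
    decide: Callable[[str], str]  # takes a case description, returns a decision

def rule_based(case: str) -> str:
    # Cheap heuristic: apply a fixed rule (placeholder logic for illustration).
    return "approve" if "meets_policy" in case else "reject"

def simulated_bargaining(case: str) -> str:
    # Expensive path: imagine prompting a language model to simulate what each
    # stakeholder could reasonably accept or reject (placeholder).
    return "negotiated_outcome"

STRATEGIES = [
    Strategy("rule", cost=1.0, decide=rule_based),
    Strategy("bargain", cost=10.0, decide=simulated_bargaining),
]

def estimate_difficulty(case: str) -> float:
    """Illustrative difficulty score in [0, 1]; a real system might ask the
    model itself, or use features like stakeholder count and value conflict."""
    return min(1.0, case.count("conflict") * 0.5)

def rrc_decide(case: str, budget: float) -> str:
    """Pick the cheapest strategy that is adequate for the case's difficulty,
    subject to the available compute budget."""
    difficulty = estimate_difficulty(case)
    # Easy cases get the cheap rule; hard cases justify paying for bargaining,
    # but only if the budget allows it.
    if difficulty < 0.5 or STRATEGIES[1].cost > budget:
        chosen = STRATEGIES[0]
    else:
        chosen = STRATEGIES[1]
    return chosen.decide(case)

# A routine application uses the rule; a value conflict triggers bargaining.
print(rrc_decide("meets_policy, routine application", budget=5.0))
print(rrc_decide("conflict: privacy vs. safety, conflict: lender vs. borrower", budget=20.0))
```

The ordering reflects the RRC intuition: the expensive bargaining path only runs when both the estimated difficulty and the available budget justify it.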
To understand whether RRC could actually improve AI decision-making in practice, the researchers tested the framework using language models (the same kind of models that power conversational AI tools and decision-support assistants). Their goal was to see if RRC could help these models make better moral and social decisions across a wide range of scenarios, especially when different people might have different stakes or perspectives.
The team designed two challenge sets of decision problems. These weren’t just abstract puzzles; they were intentionally built to simulate realistic moral dilemmas that AI might encounter in fields like healthcare, criminal justice, or digital content moderation. In some cases, the decision was fairly obvious; in others, getting it “right” meant balancing nuanced trade-offs that wouldn’t be captured by simple rules.
Here’s how the researchers set up the experiment: they gave the AI model different ways to reason through each case. One approach used minimal guidance (essentially letting the model answer however it wanted, with no special prompts). Another approach instructed the model to think strictly in terms of rules or heuristics (for example, “do not cause harm” or “follow the law”). A third approach simulated a form of bargaining among stakeholders, where the AI tried to consider what different parties might reasonably want or reject in each situation.
But the real innovation came with the RRC approach. Here, the model was prompted to first evaluate how hard the case was, and then select the most appropriate reasoning strategy: use a rule if the case was easy, simulate bargaining if the case was complex, or choose a hybrid approach if needed. This “adaptive reasoning” is what set RRC apart.
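To illustrate how those four conditions differ, here is a hedged sketch of prompt templates, one per condition. The wording is invented for the example; the paper’s actual prompts are not reproduced here, and `query_model` is a hypothetical placeholder for whatever language-model API is in use.

```python
# Illustrative prompt templates for the four experimental conditions described
# above. The exact phrasing is an assumption, meant only to show the structure
# of each condition.

CASE = "A lender must decide whether to approve a borderline loan application..."

PROMPTS = {
    "baseline": CASE + "\n\nWhat should be done?",
    "rules_only": CASE + "\n\nDecide strictly by applying general rules "
                         "(e.g., 'do not cause harm', 'follow the law').",
    "bargaining": CASE + "\n\nSimulate a negotiation: list each affected party, "
                         "what they could reasonably accept or reject, and the "
                         "outcome none of them could reasonably reject.",
    "rrc_adaptive": CASE + "\n\nFirst rate how difficult this case is (easy or hard). "
                           "If easy, apply the most relevant rule. If hard, simulate "
                           "stakeholder bargaining. Then state your decision.",
}

def query_model(prompt: str) -> str:
    """Placeholder for a call to a language model (assumed interface)."""
    raise NotImplementedError

# for name, prompt in PROMPTS.items():
#     answer = query_model(prompt)
```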
What did the researchers find?
The RRC-guided models consistently made better decisions, not just in terms of outcomes that matched human moral intuitions, but also in terms of computational efficiency. In cases where a quick rule-based answer was good enough, the model saved time and resources. But when a case demanded deeper thinking, the RRC approach nudged the model to invest that effort. This adaptability allowed the system to scale its decision-making gracefully—allocating its limited resources where they mattered most.
Importantly, the researchers weren’t just measuring whether the AI got the “right answer.” They also looked at how the model made its decision. Did it follow a process that mirrored how reasonable people might think through the problem? Did it avoid using more resources than necessary? Did it balance speed and fairness appropriately?
Success was evaluated by comparing the model’s decisions to a set of expert-generated labels, the kind of “gold standard” outcomes you’d expect if a team of thoughtful humans had debated each case under ideal conditions. The researchers also measured the computational cost of generating each decision—using output length as a proxy for resource use.
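As a rough illustration of that evaluation setup, the sketch below scores a batch of model decisions against expert labels and uses whitespace-tokenized output length as the cost proxy. The data structures and toy cases are assumptions made for the example.

```python
def evaluate(results, gold_labels):
    """results: list of (case_id, decision, output_text) tuples;
    gold_labels: dict mapping case_id -> expert decision.
    Returns accuracy against the expert labels and average output length,
    a crude stand-in for compute spent per decision."""
    correct = 0
    total_tokens = 0
    for case_id, decision, output_text in results:
        if decision == gold_labels[case_id]:
            correct += 1
        # Whitespace token count stands in for the model's true token usage.
        total_tokens += len(output_text.split())
    accuracy = correct / len(results)
    avg_cost = total_tokens / len(results)
    return accuracy, avg_cost

# Example with toy data:
results = [
    ("case-1", "approve", "Easy case; rule applies. Approve."),
    ("case-2", "deny", "Multiple stakeholders disagree... (long deliberation) Deny."),
]
gold = {"case-1": "approve", "case-2": "deny"}
print(evaluate(results, gold))  # -> (accuracy, average words per answer)
```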
This dual focus on quality and cost reflects a central concern in the real world: it’s not just about making the best decision, but doing so in a way that’s affordable, scalable, and explainable. A system that always uses maximum resources might make excellent decisions, but it won’t be practical for companies managing millions of cases per day. Conversely, a system that’s cheap but brittle could cause real harm when it mishandles a high-stakes case.
What the experiments ultimately showed is that RRC isn’t just a philosophical idea; it’s also a practical strategy for balancing fairness and efficiency in AI decision-making. The system learns to reserve its most sophisticated reasoning tools for when they’re really needed, while still delivering acceptable performance on simpler problems. That’s a powerful upgrade over today’s one-size-fits-all AI logic.
The researchers didn’t stop at confirming that RRC-guided AI models could deliver smarter decisions; they also scrutinized how well the solution held up under pressure and where it might fall short in real-world applications. The overarching question was: Can this framework scale responsibly while still aligning with human expectations of fairness and reasoning?
To answer this, the team looked beyond simple accuracy. They asked whether the AI’s decisions aligned with human-reasoned decisions in situations where the stakes were high and disagreement was likely. In essence, they weren’t just interested in whether the AI got to the “right” answer, but whether it got there in a way that mirrored thoughtful human deliberation (what a reasonable group of stakeholders might conclude after discussion).
But success wasn’t measured solely by human agreement. The researchers also assessed efficiency: did the AI spend just the right amount of effort (not too little, not too much) given the complexity of the decision? That meant measuring how much computation was used (roughly tracked by how many tokens or words the model generated to reach an answer) and comparing it to the difficulty of the case. An ideal system, under the RRC philosophy, should reason quickly on easy problems and invest more resources on hard ones. This performance profile became a key benchmark.
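One simple way to express that benchmark is to check whether effort actually grows with difficulty, for example by correlating the two across cases, as in the sketch below (the difficulty scores and token counts are invented for illustration).

```python
from statistics import correlation  # available in Python 3.10+

# Hypothetical per-case records: (difficulty score, tokens the model generated).
cases = [
    (0.1, 40), (0.2, 55), (0.3, 60),     # easy cases: short, rule-like answers
    (0.7, 210), (0.8, 260), (0.9, 300),  # hard cases: longer, bargaining-style answers
]

difficulties = [d for d, _ in cases]
tokens = [t for _, t in cases]

# A strongly positive correlation means the system spends effort where it is
# needed, which is the profile RRC aims for.
print(f"difficulty-effort correlation: {correlation(difficulties, tokens):.2f}")
```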
However, the authors were candid about the limitations of their work. For one, the test scenarios were synthetic (that is, they were designed to simulate moral dilemmas but weren’t drawn from live, messy real-world data). That means while the framework worked well in controlled settings, its effectiveness in real-life applications (like content moderation or healthcare triage) still needs to be proven. Stakeholder dynamics in the wild are more complex, less predictable, and often come with incomplete or conflicting data.
Another limitation lies in the range of decision strategies tested. The RRC framework was demonstrated with just a few reasoning modes—basic rule-following, simulated stakeholder bargaining, and a hybrid strategy. But many other mechanisms could eventually be added, such as deliberative debate, public policy proxies, or culturally grounded norms. As it stands, the system is powerful but somewhat narrow in scope.
That’s where the future direction becomes exciting. The authors outline several clear paths for advancing the framework. First, richer datasets drawn from real-life decision-making contexts, including high-stakes environments like healthcare, criminal justice, and finance. Second, more nuanced reasoning tools, especially ones that can learn from community input or simulate democratic deliberation. And third, process-level supervision: training models not just on outcomes, but also on the reasoning pathways humans use to arrive at those outcomes.
As for the broader impact, RRC offers something sorely lacking in many current AI systems: a structured way to make trade-offs between fairness, accuracy, speed, and cost. That matters enormously for organizations deploying AI at scale. Whether you’re approving mortgages, managing traffic flows, or assigning school placements, the stakes are real (and so are the resource constraints).
By teaching AI to reason like a bounded human (smart, but aware of its limits), RRC enables a form of decision-making that is not only more aligned with human values but also far more sustainable. It provides a roadmap for building AI systems that can be trusted not because they are perfect, but because they know when to think harder (and when not to).
Further Readings
- Levine, S., Franklin, M., Zhi-Xuan, T., Guyot, S. Y., Wong, L., Kilov, D., Choi, Y., Tenenbaum, J. B., Goodman, N., Lazar, S., & Gabriel, I. (2025, June 20). Resource rational contractualism should guide AI alignment. arXiv.org. https://arxiv.org/abs/2506.17434
- Mallari, M. (2025, June 22). The resume screener who learned to think. AI-First Product Management by Michael Mallari. https://michaelmallari.bitbucket.io/case-study/the-resume-screener-who-learned-to-think/