Let Me Defer to Someone Smarter
How new AI training methods help models decide when to act and when to pass decisions to experts.
In a world increasingly shaped by automation, AI systems are being trusted to make critical decisions: whether it’s diagnosing a disease, approving a loan, or moderating online content. Yet as powerful as these systems are, they’re not infallible. Sometimes, the smarter choice is for the machine to not decide at all.
That’s the premise behind a concept called learning to defer, the idea that AI systems should be able to recognize when they’re likely to make a mistake and instead hand off the decision to a more reliable expert (whether that’s a human or a higher-powered system). It’s a deceptively simple idea, but one that’s surprisingly hard to implement in practice.
A recent AI research paper (from Google and NYU) takes this problem head-on. The research addresses a critical gap in the way AI systems are trained to defer to experts. Prior efforts often used training objectives that were mathematically convenient but didn’t align well with real-world performance. In other words, AI systems were being trained to optimize a stand-in metric (a surrogate) that didn’t actually guarantee better decisions or smarter deferrals. That’s a big deal, especially in high-stakes domains like healthcare, finance, and autonomous vehicles, where a bad deferral strategy can mean higher costs, worse outcomes, or even physical harm.
So, what exactly is the problem the researchers are solving? At its core, the paper is about how to teach AI systems when to decide and when to defer in a way that is provably effective (not just in theory, but also in real-world conditions). The system should minimize overall error while managing the cost of invoking an expert. This cost could be financial (paying for a radiologist’s time), operational (slowing down a user experience), or even reputational (mishandling sensitive content).
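One standard way to write that objective down (the notation here follows the common learning-to-defer formulation rather than the paper’s exact definitions) is to score a predictor h and a deferral rule r together on each example:

```latex
% Deferral loss on an example (x, y): if r(x) = 0 the model acts and pays 1 when wrong;
% if r(x) = 1 the expert is consulted and the system pays a cost c(x, y).
L_{\mathrm{def}}(h, r; x, y) =
  \mathbf{1}\{h(x) \neq y\} \, \mathbf{1}\{r(x) = 0\}
  \; + \; c(x, y) \, \mathbf{1}\{r(x) = 1\}
```

Training then means choosing h and r to minimize the expected value of this loss over the data; the cost term c can fold in the expert’s own error rate, a fixed consultation fee, or both.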
To solve this, the researchers introduce a new family of training objectives, or surrogate losses, that are specifically designed to match the decision-making goals we actually care about. Think of these surrogate losses like a training curriculum: you want it to be smooth, measurable, and aligned with the real-world tasks your AI is expected to perform. The new surrogates proposed in this paper are not only easy to optimize using standard machine learning (ML) techniques, but they also come with mathematical guarantees: if the AI gets better at the surrogate, it will get better at the real task of smart deferrals.
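As a concrete (and deliberately simplified) illustration, here is one well-known style of surrogate in code, where deferral is treated as an extra output score and the defer option is rewarded only to the extent the expert is accurate and cheap. This is a minimal sketch of the general idea, not the specific loss family proposed by Mao et al.:

```python
import numpy as np

def defer_surrogate_loss(scores, label, expert_correct, expert_cost=0.1):
    """Cross-entropy-style surrogate with deferral as an extra output score.

    scores:         array of shape (K + 1,) -- K class scores plus one 'defer' score
    label:          integer in [0, K)       -- the true class
    expert_correct: bool                    -- would the expert get this example right?
    expert_cost:    float                   -- penalty for consulting the expert at all
    """
    scores = np.asarray(scores, dtype=float)
    log_probs = scores - np.logaddexp.reduce(scores)    # log-softmax over K + 1 outputs
    loss = -log_probs[label]                            # reward predicting the true class...
    defer_weight = max(0.0, float(expert_correct) - expert_cost)
    loss -= defer_weight * log_probs[-1]                # ...and reward deferring when the expert helps
    return loss

# Example: 3-class problem where the expert is right and cheap, so deferring is encouraged.
print(defer_surrogate_loss(scores=[2.0, 0.1, -1.0, 0.5], label=0, expert_correct=True))
```

Because every term is differentiable in the model’s scores, standard gradient-based training applies; the paper’s contribution is establishing when driving such a surrogate down is guaranteed to drive the true deferral loss down as well.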
The paper tackles two major real-world scenarios. In the first, the AI system and its deferral strategy are trained together from scratch (think of this as building an end-to-end autonomous agent). In the second, the AI system is already trained (maybe by another team or from a public dataset), and you only have the flexibility to build the deferral layer on top. This is often how things play out in corporate settings where models are pre-approved, but business units want to customize risk strategies.
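For that second setting, a rough sketch of where a deferral layer sits on top of a frozen, pre-approved classifier might look like the following. The simple confidence-versus-cost threshold below is an illustrative baseline, not the learned deferral rule from the paper:

```python
import numpy as np

def posthoc_defer(frozen_probs, expert_accuracy, expert_cost):
    """Per-example defer/act decision on top of a frozen classifier's outputs.

    frozen_probs:    array of shape (n, K) -- softmax outputs of the pre-approved model
    expert_accuracy: float                 -- estimated probability the expert is correct
    expert_cost:     float                 -- cost charged whenever the expert is consulted

    Returns a boolean array where True means 'defer to the expert'.
    """
    frozen_probs = np.asarray(frozen_probs, dtype=float)
    model_confidence = frozen_probs.max(axis=1)                   # model's estimated chance of being right
    expected_model_loss = 1.0 - model_confidence                  # expected error if the model acts
    expected_defer_loss = (1.0 - expert_accuracy) + expert_cost   # expected error plus fee if we defer
    return expected_defer_loss < expected_model_loss

# Example: the system acts on the confident first row and defers on the uncertain second row.
print(posthoc_defer([[0.9, 0.1], [0.55, 0.45]], expert_accuracy=0.95, expert_cost=0.05))
```

In practice the deferral rule is learned from data rather than thresholded on raw confidences, which are often miscalibrated, but the sketch shows the division of labor: the frozen model’s outputs stay untouched, and only the defer-or-act decision is customized.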
In both settings, the researchers prove that their surrogates lead to smarter, more accurate deferrals under fairly broad and realistic conditions. They also provide different levels of performance guarantees—from basic “this won’t make things worse” assurances to stronger results that approach the best possible decision strategy over time.
In plain terms, this research shows us that it’s possible to build AI systems that know what they don’t know—and that we can train them in ways that are both rigorous and practical. By aligning the training goals with the business and operational realities of cost, risk, and accuracy, Mao et al. are laying the foundation for more trustworthy AI systems that can be confidently deployed in settings where the stakes are high and the margin for error is thin.
To understand whether their approach to smarter deferral actually works, the researchers behind this paper didn’t stop at theory—they put their methods to the test using a blend of synthetic setups and real-world classification tasks. Their goal wasn’t just to validate abstract concepts, but to see how well their proposed deferral strategies performed in practice compared to the status quo.
One of the first steps in their experimentation was to simulate a controlled environment where the behavior of different deferral strategies could be closely monitored. In this setup, they created simple data scenarios with known optimal outcomes: essentially test beds where it is known exactly when the AI should act and when it should defer. This kind of clean experimental sandbox is useful because it lets researchers isolate what’s working and what’s not without confounding variables.
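A toy version of such a sandbox might look like the snippet below: two overlapping Gaussian classes whose posteriors are known in closed form, plus a simulated expert with a fixed accuracy and consultation cost (all parameter values here are hypothetical, chosen only to make the oracle deferral rule computable):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two Gaussian classes (means -1 and +1, unit variance) whose posteriors are
# known in closed form, so the ideal decision at every x is known exactly.
n = 10_000
y = rng.integers(0, 2, size=n)
x = rng.normal(loc=np.where(y == 1, 1.0, -1.0), scale=1.0)

# A simulated expert: right 95% of the time, at a fixed consultation cost of 0.1.
expert_accuracy, expert_cost = 0.95, 0.10
expert_pred = np.where(rng.random(n) < expert_accuracy, y, 1 - y)

# With known posteriors, the oracle rule is explicit: defer exactly when the
# best achievable model confidence falls below the expert's accuracy minus cost.
posterior_class1 = 1.0 / (1.0 + np.exp(-2.0 * x))            # P(y = 1 | x) for this mixture
bayes_confidence = np.maximum(posterior_class1, 1.0 - posterior_class1)
oracle_defer = bayes_confidence < (expert_accuracy - expert_cost)

# Realized loss of the oracle policy on this sample.
model_pred = (posterior_class1 > 0.5).astype(int)
model_err = (model_pred != y) & ~oracle_defer
defer_loss = ((expert_pred != y) + expert_cost) * oracle_defer
print(f"oracle deferral rate: {oracle_defer.mean():.2%}")
print(f"realized system loss: {(model_err + defer_loss).mean():.3f}")
```

Because the optimal behavior is known exactly in a setup like this, any learned deferral strategy can be scored against the oracle’s deferral rate and realized loss.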
The takeaway from these simulations? Their new learning strategies succeeded where older methods fell short. Previous models often deferred too much or too little, simply because the optimization process they followed didn’t align tightly enough with the real objective of making smart, cost-effective decisions. The new surrogates proved more reliable in learning when to defer in a way that mirrors human-like judgment: act when confident, defer when uncertain.
But the researchers didn’t stop at controlled data. They moved on to more practical benchmarks—real-world datasets used for classification tasks, like categorizing images or interpreting customer data. These are closer analogs to the types of systems used in business, healthcare, or finance. Here, they layered in simulated “expert costs” to mimic real-world decision trade-offs: for instance, the idea that sending a case to a human expert takes time, money, or labor. This allowed them to evaluate how well their AI systems could balance accuracy with cost in a realistic decision-making environment.
What set their results apart wasn’t just better raw performance, but how reliably the models made the right kind of mistakes. In many real applications, the issue isn’t about eliminating all errors (that’s impossible), but about making fewer high-cost mistakes. A model that confidently makes a bad call in a high-risk situation is far worse than one that admits uncertainty and defers. The research team’s approach led to systems that were better calibrated in that regard, more aware of when to trust themselves and when to hand off the decision.
So how did they measure success? In technical terms, they looked at how closely the AI’s performance on the new surrogate training goals matched the real-world objective: minimizing a combination of error and deferral cost. But beyond that, they also asked: Are these systems making better decisions that reflect business and ethical priorities? That means fewer unnecessary deferrals (which bloat operational costs), and fewer catastrophic misclassifications (which can damage trust or even harm people).
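That combined objective is easy to state as a single evaluation metric. The helper below is a hypothetical summary function, not the paper’s evaluation code:

```python
import numpy as np

def system_risk(model_pred, expert_pred, defer, y, expert_cost):
    """Combined risk of a deferral system: model error where it acts, expert
    error plus the consultation cost where it defers. Lower is better.

    defer is a boolean array, True where the system hands off to the expert.
    """
    model_pred, expert_pred, defer, y = map(np.asarray, (model_pred, expert_pred, defer, y))
    loss = (model_pred != y) * ~defer + ((expert_pred != y) + expert_cost) * defer
    return loss.mean(), defer.mean()   # (combined risk, deferral rate)
```

Reporting the deferral rate alongside the combined risk makes it clear whether improvements come from genuinely better decisions or simply from deferring nearly everything.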
Another important aspect of their evaluation was robustness. The researchers explored how the models behaved under different assumptions (like noisier data or varying levels of expert accuracy) to ensure the approach wasn’t fragile. In some tests, they added complexity to the input data to simulate messier real-world conditions, and still found that the new learning approach held up better than older methods.
Perhaps most meaningfully, they didn’t rely on one-off improvements. The researchers provided theoretical guarantees: mathematical proof that, under broad conditions, improving on their surrogate losses also improves real decision performance. These guarantees give organizations more confidence that the training methods are not only effective in today’s use case, but also generalize well to future applications or different datasets.
In short, success here wasn’t just defined by accuracy on a spreadsheet, but by how well the system learned to balance risk, cost, and confidence… qualities that align closely with real-world decision-making in enterprise settings.
What truly distinguishes this research is not just that it works better than previous methods, but that it provides clear, principled criteria for what success looks like in a learning-to-defer system. Rather than relying on loose empirical benchmarks or narrow case-by-case validations, the authors back their solution with rigorous consistency guarantees. These are formal conditions that tell us: “If the model improves on this training goal, then it is guaranteed to get better at the real-world decision-making task we care about.” That’s a powerful form of accountability for ML systems, especially those operating in environments where small errors can cascade into big consequences.
To break that down, the researchers define several layers of success. The first is called realizable consistency, which says that if a perfect deferral rule exists within the model’s capacity, their training method will find it. This matters because in real-world systems you often don’t know whether a perfect strategy is possible; the guarantee tells you that if one exists, you won’t miss it. The second is a consistency bound (what the authors call an H-consistency bound), which provides a more nuanced view: even if perfection isn’t attainable, it tells you how far you are from the best possible solution within the model class, and under what conditions you can close that gap.
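In schematic form, the stronger guarantee has the shape of a consistency bound: the gap between the system’s true deferral loss and the best achievable one is controlled by the corresponding gap on the surrogate. The expression below captures that general shape, not the paper’s exact theorem:

```latex
% Schematic consistency bound: excess true deferral risk is bounded by a
% non-decreasing function Gamma (with Gamma(0) = 0) of the excess surrogate risk.
\mathcal{L}_{\mathrm{def}}(h, r) - \inf_{h', r'} \mathcal{L}_{\mathrm{def}}(h', r')
  \;\le\;
  \Gamma\!\left( \mathcal{L}_{\mathrm{surr}}(h, r) - \inf_{h', r'} \mathcal{L}_{\mathrm{surr}}(h', r') \right)
```

Realizable consistency is, roughly, the special case where some rule in the class attains the best possible deferral loss exactly; the bound then says that minimizing the surrogate recovers it.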
These benchmarks aren’t just academic; they translate into tangible operational insights. For instance, if you’re deploying an AI model in a hospital triage setting, you want to know: “How likely is this model to get critical decisions wrong, and how often will it smartly defer to a human clinician?” These guarantees allow engineers and decision-makers to understand the system’s behavior even before real-world deployment.
That said, no system is perfect, and the researchers are upfront about the limitations of their approach. One of the main constraints is that their methods assume a certain level of flexibility or completeness in the model class (that is, the set of possible decision rules must be rich enough to contain near-optimal strategies). In practical terms, this usually holds for modern AI models like neural networks, but not always for legacy systems or highly constrained environments.
Another limitation is that the deferral cost functions used in their experiments are relatively clean and well-defined (usually tied to error rates of expert decision-makers). But in practice, expert costs can be more complex and context-dependent. For instance, deferring a customer support case might not just cost labor; it could also affect customer churn, brand trust, or compliance obligations. Adapting the methods to learn or estimate these richer cost structures is a clear next step for future research.
They also acknowledge the need for real-world testing beyond synthetic and benchmark datasets. While the theory is solid and the simulations are promising, scaling this to live environments (such as hospitals, banks, or industrial control systems) will surface new challenges around explainability, trust, and integration with human workflows.
Still, the impact of this work could be far-reaching. By offering a general-purpose framework for training AI systems to defer intelligently, it has the potential to improve decision quality across a wide range of industries. Organizations could use these methods to reduce costly errors, allocate human expertise more efficiently, and create AI tools that are not only smarter but also more self-aware.
In the bigger picture, this is a move toward a more humble, collaborative form of AI… one that doesn’t aim to replace humans at every turn, but knows when to step back and say, “I’m not sure—someone else should take this one.” And in many high-stakes situations, that may be the most intelligent move of all.
Further Readings
- Mallari, M. (2025, June 27). The resume stops here (unless it shouldn’t). AI-First Product Management by Michael Mallari. https://michaelmallari.bitbucket.io/case-study/the-resume-stops-here-unless-it-shouldnt/
- Mao, A., Mohri, M., & Zhong, Y. (2025, June 25). Mastering multiple-expert routing: realizable H-consistency and strong guarantees for learning to defer. arXiv.org. https://arxiv.org/abs/2506.20650