A Case Study on Applied AI Research in the Financials Sector

Signal Strength: Getting Agents on the Same Wavelength

Why structured communication and well-designed feedback loops turn fragmented AI workflows into trustworthy multi-agent solutions.

At the fictional insurer PolySure Mutual, claims transformation leader Maya (also fictional) found herself confronting an uncomfortable truth: her department’s biggest problem wasn’t the sophistication of its AI tools, but the way those tools behaved when asked to work together. Customers were growing increasingly frustrated with claims experiences that swung unpredictably between impressively fast and painfully slow. Employees quietly admitted that the “AI helpers” meant to streamline their jobs often generated more confusion than clarity. And Maya, who had championed agentic AI from the start, realized she was now responsible for knitting together a system that had grown organically, unevenly, and with too little coordination across functions.

PolySure’s claims operation had started with a simple idea—add specialized AI assistants to speed up every major step of the journey. A triage bot greeted new claims, a fraud bot scanned for anomalies, a coverage bot interpreted policy language, and a settlement bot reviewed estimates. Each was built or bought for a different reason, at a different moment, by a different team. The result resembled a well-meaning committee meeting: everyone talking, everyone capable, yet nobody holding the full picture. Customer complaints reflected this reality: policyholders didn’t understand why their neighbors’ claims were resolved in hours while theirs stalled. And in internal reviews, adjusters admitted they often needed to “untangle” differing bot recommendations just to get a claim back on track.

Maya’s frustration wasn’t rooted in technology fatigue—it came from her professional identity. She prided herself on delivering systems that balanced customer empathy with operational discipline. The promise of AI, to her, was never cold automation; it was consistency, fairness, and a smoother path to human judgment. But she could no longer ignore the symptoms of fragmentation. The triage bot interpreted events at a high level. The fraud bot approached everything as a pattern-recognition problem. The coverage bot lived in its legal dictionary. The settlement bot obsessed over numbers. Each had a partial view, and none had a structured way to reconcile it with the others.

The Forces Turning Pressure Into Urgency

External expectations only magnified the cracks. Competitors—fictional ones like SwiftGuard Insurance—boasted one-click resolution experiences and AI-driven service that supposedly “just worked.” Policyholders, accustomed to instant decisions in banking and retail, carried those expectations into their insurance interactions. Regulators, meanwhile, sharpened their focus on explainability and fairness, especially when algorithms influenced claim outcomes. Internally, leaders pressed for automation that reduced cycle times and headcount reliance, not automation that shifted work into manual reconciliation.

Maya understood that if PolySure continued layering AI agents without improving how they coordinated, the company was building a house with an unstable foundation. The pressure wasn’t merely operational; it was existential. The gap between what PolySure promised and what it delivered was widening, and stakeholders were noticing.

When Agents Don’t Coordinate, the Ground Starts to Shake

If Maya ignored these dynamics, the consequences would extend far beyond a few frustrated adjusters. Claims outcomes would grow more uneven, feeding customer distrust and increasing churn. Operational firefighting would intensify as humans scrambled to translate conflicting agent outputs into coherent decisions. Audit trails would fracture, leaving the company exposed to complaints and regulatory challenges. And perhaps most dangerously, leadership would lose faith in AI entirely—not because AI lacked potential, but because PolySure failed to develop the ability to make its agents collaborate with clarity, rigor, and shared understanding.

Choosing a New Strategic Path

Maya realized the real competitive advantage wouldn’t come from adding more agents or upgrading each model independently. It would come from teaching her existing agents to work together with the same professionalism and shared clarity she expected from human teams. The shift in thinking was subtle but profound: PolySure didn’t have an AI problem—it had a coordination problem. And coordination, she knew, could be engineered, measured, and improved.

Her strategy centered on a simple proposition: before AI agents touched real claims, they would need to demonstrate they could collaborate effectively under controlled, intentionally constrained conditions. Instead of evaluating each agent in isolation, she reframed the goal around collective reliability. This is where Dartmouth’s AsymPuzl research became catalytic. The insight was not to replicate the technology itself but to adopt its philosophy: test collaboration using structured, partial information; understand how feedback shapes joint reasoning; and build a repeatable mechanism to verify that agents can converge on a consistent answer.

Maya articulated her strategy through clear, business-aligned objectives. Claims decisions should not only be faster—they should be explainable and consistent across agents. Customers should feel the benefit in smoother experiences. And the entire organization should gain a reusable testbed that brought discipline to agent design, integration, and evaluation. She set her team on a path where every AI tool had to earn its place by demonstrating collaborative competence, not just smart-sounding output.

Bringing Strategy to Life Through Deliberate Action

To make this vision tangible, Maya began by assembling a small task force across claims, data science, engineering, and compliance. Their mandate was to construct a simplified but realistic environment where agents interacted much like they would in production, yet with the clarity and controllability needed for scientific evaluation. They created synthetic claims with carefully partitioned data, giving each agent only the slice of information it would have in its real workflow. Every agent had to form an internal hypothesis and refine it through structured messages, mirroring the patterns highlighted in the research while staying grounded in insurance logic.
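
To make the shape of such a testbed concrete, here is a minimal Python sketch of the pattern: synthetic claims are split into per-agent slices, and each agent refines a hypothesis over rounds of structured messages. All names here (ClaimSlice, Message, run_exchange) are hypothetical illustrations of the idea, not PolySure’s actual implementation.

```python
from dataclasses import dataclass

@dataclass
class ClaimSlice:
    """The partial view of one synthetic claim that a single agent may see."""
    claim_id: str
    fields: dict

@dataclass
class Message:
    """A structured message an agent broadcasts after forming a hypothesis."""
    sender: str
    claim_id: str
    hypothesis: str    # e.g. "approve", "investigate", "deny"
    confidence: float

class Agent:
    """Wraps an assessment function: (slice, inbox of Messages) -> (hypothesis, confidence)."""
    def __init__(self, name, assess):
        self.name = name
        self.assess = assess

    def step(self, claim_slice, inbox):
        hypothesis, confidence = self.assess(claim_slice, inbox)
        return Message(self.name, claim_slice.claim_id, hypothesis, confidence)

def run_exchange(agents, slices, rounds=3):
    """Let agents revise hypotheses over several rounds of structured messages."""
    inbox = []
    for _ in range(rounds):
        inbox = [agent.step(slices[agent.name], inbox) for agent in agents]
    return inbox
```

In this setup, a triage agent and a coverage agent would each receive a different ClaimSlice for the same claim and see one another’s reasoning only through the messages exchanged in run_exchange, mirroring the information asymmetry of the real workflow.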

Maya insisted that feedback be treated as a design variable, not an afterthought. They experimented with different forms of evaluative signals—sometimes letting agents know only if their individual assessments aligned with the underlying claim, other times revealing whether the collective recommendation held together. They varied how detailed the signals were, paying close attention to how each change influenced the agents’ ability to converge on a shared conclusion. It wasn’t about copying the AsymPuzl puzzle mechanics; it was about instilling a disciplined understanding of how information asymmetry and feedback design shaped the reliability of complex, multi-step decisions.
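
One way to treat feedback as a design variable is to make the evaluative signal itself swappable. Continuing the sketch above, the functions below contrast three hypothetical signal designs: a per-agent signal, a convergence-only signal, and a graded signal. The shapes and names are assumptions for illustration, not a prescribed interface.

```python
def individual_feedback(messages, ground_truth):
    """Tell each agent only whether its own hypothesis matched the claim's truth."""
    return {m.sender: m.hypothesis == ground_truth for m in messages}

def collective_feedback(messages):
    """Reveal only whether the group's recommendations held together."""
    return {"converged": len({m.hypothesis for m in messages}) == 1}

def graded_feedback(messages, ground_truth):
    """A richer signal: the fraction of agents aligned with the truth."""
    aligned = sum(m.hypothesis == ground_truth for m in messages)
    return {"alignment": aligned / len(messages)}
```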

She also pushed for telemetry that traced the entire chain of agent interactions in real claims flows. This allowed her team to observe not just outcomes, but behaviors: which agents hesitated, which overcorrected, and which never really engaged in shared reasoning. With richer visibility came the ability to refine prompts, restructure orchestration, and eliminate brittle interactions that created uncertainty for customers and adjusters.
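
Such telemetry can begin as a structured, append-only event log per claim. A minimal sketch follows, with hypothetical event names and a deliberately crude heuristic for spotting hesitation; a production system would use far richer signals.

```python
import json
import time

class InteractionTrace:
    """Append-only log of agent events for one claim, replayable after the fact."""
    def __init__(self, claim_id):
        self.claim_id = claim_id
        self.events = []

    def record(self, agent, event, payload):
        """Record one event, e.g. a hypothesis, a revision, or a non-response."""
        self.events.append({
            "ts": time.time(),
            "agent": agent,
            "event": event,
            "payload": payload,
        })

    def hesitant_agents(self, threshold=0.5):
        """Toy heuristic: agents whose recorded confidence stayed below a threshold."""
        return sorted({e["agent"] for e in self.events
                       if e["event"] == "hypothesis"
                       and e["payload"].get("confidence", 1.0) < threshold})

    def dump(self):
        """Serialize the trace for storage or audit review."""
        return json.dumps({"claim_id": self.claim_id, "events": self.events})
```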

Finally, Maya wove these insights into procurement and system governance. Any new vendor had to participate in the coordination benchmark before integration. Internal teams proposing new agents had to justify how their design improved—not weakened—the cooperative fabric of the claims ecosystem. Through these actions, the company began treating coordination not as a byproduct of AI but as a core competency that deserved focus, investment, and standards.
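
As a sketch of how such a benchmark could gate integration, a candidate agent’s aggregate results might be checked against convergence and contradiction thresholds before it ever touches production claims. The metric names and threshold defaults below are assumptions for illustration, not PolySure policy.

```python
def passes_coordination_gate(results, min_convergence=0.90, max_contradiction=0.05):
    """Gate a candidate agent on benchmark aggregates before integration.

    `results` is assumed to look like:
    {"convergence_rate": 0.93, "contradiction_rate": 0.02}
    The threshold defaults are illustrative only.
    """
    return (results["convergence_rate"] >= min_convergence
            and results["contradiction_rate"] <= max_contradiction)
```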

Unlocking the Tangible Wins

The first signs that Maya’s strategy was working appeared not in dashboards, but in the tone of conversations among frontline adjusters. The usual grumbling about “dueling bots” began to fade. Agents that once issued conflicting recommendations now produced assessments that aligned more naturally, requiring less human arbitration. What mattered most to Maya was not just that decisions were faster, but that they were increasingly coherent—the product of systems that reasoned together rather than in isolation.

Customers began experiencing smoother journeys as well. A previously erratic claims process started to feel predictable: fewer unexplained pauses, fewer reversals, fewer “We’re re-evaluating your file” messages. The emotional burden of uncertainty lifted slightly, and small but noticeable improvements in satisfaction quietly validated the effort. To Maya, these weren’t just operational wins—they were proof that careful engineering of multi-agent coordination could produce outcomes people could actually feel. Faster answers mattered, yes, but consistent, explainable answers mattered even more.

Internally, PolySure developed a growing sense of confidence. The coordination testbed became a fixture in the transformation program, the kind of capability that outlives individual leaders or product cycles. As new AI vendors entered the market and internal teams proposed additional copilots, they were now assessed through a shared lens—could the new agent integrate cleanly into the reasoning fabric PolySure had built? This discipline transformed the company’s posture from reactive experimentation to proactive governance.

Defining Success With Clarity and Ambition

Maya refused to let the initiative drift into the vague territory where many AI programs land. She worked with her teams to distinguish between acceptable, excellent, and industry-leading outcomes. “Good” meant a reduction in obvious contradictions between agents—an outcome that would reduce frustration but still leave room for improvement. “Better” meant reliable coordination across most claims and a measurable reduction in human intervention to reconcile competing recommendations. The gold standard—the “best”—was a claims ecosystem with such strong internal coherence that PolySure could confidently describe it as a differentiating capability.
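
To keep those tiers measurable rather than rhetorical, a team could score them against simple aggregates such as a contradiction rate across claims. The sketch below uses illustrative cut lines; in practice the thresholds would be set by the business, not hard-coded.

```python
def contradiction_rate(final_hypotheses):
    """Share of claims on which agents ended with conflicting hypotheses.

    `final_hypotheses` is assumed to map claim_id -> list of each agent's
    final hypothesis for that claim.
    """
    conflicting = sum(1 for hyps in final_hypotheses.values() if len(set(hyps)) > 1)
    return conflicting / max(len(final_hypotheses), 1)

def outcome_tier(rate_before, rate_now, intervention_drop):
    """Map measured gains onto the good / better / best ladder (illustrative cut lines)."""
    if rate_now < 0.02 and intervention_drop >= 0.50:
        return "best"
    if rate_now <= 0.5 * rate_before and intervention_drop >= 0.25:
        return "better"
    if rate_now < rate_before:
        return "good"
    return "not yet"
```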

Reaching that level required more than just cleaner performance metrics. It meant that the system’s reasoning paths had become clear enough for auditors to follow without special tools. It meant agents could consistently justify their contributions to a decision in ways humans understood. Ultimately, it meant that Maya’s coordination framework had become woven into the insurer’s identity—a subtle but powerful advantage over rivals still wrestling with fragmented automation.

Carrying Forward What the Journey Taught

When Maya reflected on the transformation, several lessons crystallized. First, agentic AI cannot be treated as modular widgets that operate independently. Their true value emerges only when they are evaluated and improved as a system. Second, transparency and structure are antidotes to uncertainty. When agents communicate through well-defined channels and receive thoughtfully designed feedback, their behaviors become more stable, predictable, and trustworthy.

Perhaps the most profound lesson was this: innovation in multi-agent AI isn’t about assembling the flashiest models—it’s about creating an environment where partial perspectives come together to form reliable conclusions. The ethos of the work is rigor; the pathos lives in the relief customers feel when their experience is smooth and fair; and the logos rests in a disciplined operational model that scales with confidence rather than chaos.

Through Maya’s approach, PolySure learned that the real frontier in AI isn’t intelligence—it’s coordination. And mastering that is what turns automation from a collection of tools into a strategic advantage.

