Multimodal Mayhem, Meet Your Match
InternVL3 enables smarter, faster decisions by combining vision and language processing—unlocking scalable efficiency.
Tammy had always believed that insurance was about peace of mind. But lately, her role as lead of operations at Y’allstate Insurance Co. (a fictional, well-established player in the industry) had started to feel less like a mission of protection and more like a daily battle against inefficiency. Nestled in a corner office filled with sticky notes, flowcharts, and the faint smell of reheated coffee, she found herself facing a crisis that no spreadsheet or legacy AI dashboard could seem to fix.
Every morning brought in a flood of new claims. A typical one? A customer submits a photo of a dented fender, accompanied by a PDF police report, an email thread with the local adjuster, and sometimes, even a hand-scrawled note scanned in by the client. Tammy’s team would then spend hours toggling between systems: image analysis tools, document readers, and old-school internal claim forms.
And despite the investments in AI, the systems they’d integrated were more like disjointed patches than a cohesive solution. One tool could scan and extract text from police reports. Another flagged potential fraud based on policy history. Yet none of them could understand the story across the visuals and the words. They were working harder, not smarter … and the customers could feel it.
When Smarter Tools Still Fall Short
The friction wasn’t just internal. Externally, customers were growing increasingly vocal about their dissatisfaction. Claims took too long. Communication was confusing. Tammy could almost recite the common complaints by heart … like the woman who uploaded photos of her damaged SUV and then spent three weeks explaining to different agents that her case wasn’t “standard procedure.” Or the man who submitted his paperwork twice, only to be told that “the image data doesn’t match our records.”
The internal challenges were deeply tied to the way their AI systems processed information. Most of these tools had been developed to handle either structured text or visual evidence, but not both. They were born in silos and stayed there. That’s when Tammy realized she was up against more than outdated tools; she was facing a more systemic issue. Y’allstate’s AI wasn’t failing because it was underpowered; it was failing because it was fragmented.
And the pressure was mounting.
New regulations were being introduced that required greater transparency in claim decisions (auditable, explainable outcomes that would hold up in court or under regulatory scrutiny). Customers were demanding faster, fairer assessments. And whispers about Calamansi.ai (a fictional, fast-rising competitor) were growing louder. Word around the industry was that they were testing a system that could evaluate visual evidence and documentation together—dramatically reducing claim time and increasing consistency.
Tammy didn’t need to see a press release to know what that meant: if they didn’t act soon, Y’allstate wouldn’t just be behind—they’d be irrelevant.
The Real Cost of Doing Nothing
Leadership had been hesitant to adopt any more AI tools. After all, they’d already invested millions in automation, and were still neck-deep in backlog. But Tammy knew this wasn’t just another tech upgrade; this was a turning point. The insurance business had always been data-driven, but for the first time, understanding the connection between data types (how a photo corroborates a report, how a timestamp in an email aligns with a vehicle’s damage pattern) was becoming a competitive differentiator.
Ignoring the growing need for integrated AI would mean more than just slow claims; it would mean:
- Higher customer churn, as clients looked for providers who “got it right the first time.”
- Increased exposure to compliance risks, with audit trails that couldn’t clearly explain decisions.
- Employee burnout, as frontline teams compensated for what the systems couldn’t deliver.
But the worst consequence? Falling behind in an industry that was evolving from paperwork-heavy processing to cognitive, integrated decision-making—one where competitors no longer bragged about having AI but demonstrated it in real time (with claim decisions made in minutes, not weeks).
Tammy could already envision what that future looked like: fewer angry calls, more clarity in communication, and a workforce empowered by systems that actually helped them think, instead of just filtering data.
She didn’t need another dashboard. She needed a model that could truly understand … one trained to see and read like a human, capable of recognizing not just information, but also context.
And to make that leap, she’d have to push Y’allstate toward a radically new approach.
Reframe the Problem, Redefine the System
For Tammy, it wasn’t just about fixing a slow claims process; it was about reframing the entire role of AI in the organization. The tools in place had done what they were designed to do. The problem wasn’t that the software was broken; it was that the business had outgrown it.
The kind of intelligence she needed couldn’t be assembled by gluing together single-purpose models. It required a native, end-to-end system (one that was born to understand multiple forms of data at once). She needed AI that could read a police report and see the photo of a crumpled bumper, and instantly grasp their relationship. Not in separate steps, but simultaneously. Like a claims adjuster, just exponentially faster and more consistent.
That’s when she came across new research from the world of foundation models, specifically, something on InternVL3. Developed by researchers focused on bridging the divide between vision and language processing, the model was trained to interpret images, text, and even layout-aware documents in an integrated fashion. Not just looking at images or reading through paragraphs, but understanding the entire context they formed together.
This wasn’t a patch. It was a platform.
From a strategic perspective, Tammy knew the business case would need to speak to more than just technology. It had to align with measurable results. So she framed the shift around one clear, ambitious objective: transform Y’allstate’s claims pipeline into a seamless, intelligent, and explainable system … using multimodal AI.
The key results she outlined were bold, but grounded:
- Cut the average claims cycle time by 40%.
- Improve audit-reviewed claim accuracy by 30%.
- Automate 80% of standard claims with minimal human review.
But the results weren’t just about numbers. They were about restoring the trust between customer and carrier, and about giving her frontline team tools that didn’t drain their time and energy, but actually enhanced their judgment.
Build It Like It Matters—Because It Does
Translating this vision into reality meant more than rolling out a new model. Tammy needed to orchestrate a thoughtful, phased approach … one that balanced experimentation with accountability. InternVL3, while state-of-the-art, was an open-source model. That gave her team both flexibility and responsibility.
The first step was a pilot. They carved out a controlled sandbox environment using real, anonymized claim files. Dozens of cases with known outcomes were selected, including those with high complexity: damaged vehicles, medical reports, mixed-format documentation.
To teach the model how to handle the nuances of insurance claims, her team applied several of the techniques inspired by the InternVL3 research:
- Variable Visual Position Encoding (V2PE): Most AI models struggled to understand document layout; where something appeared on a form mattered just as much as what it said. V2PE allowed the model to grasp those spatial relationships, like how a date in the header might correlate with a timestamp in a photo.
- Mixed Preference Optimization (MPO): This trained the model not just on right or wrong answers, but on what better decisions looked like based on the historical preferences of expert adjusters. Instead of treating all data equally, MPO helped the model learn from patterns that led to faster resolutions and fewer disputes.
- Human-in-the-loop review: Tammy insisted that, during the pilot phase, AI decisions be reviewed by a senior claims team member, not only to ensure legal defensibility but also to fine-tune the model through feedback loops. The AI wasn’t just there to replace human judgment; it was there to sharpen it.
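To make the two techniques above concrete, here is a minimal, illustrative sketch of the ideas behind them. This is not InternVL3's actual implementation or API; the function names, the `visual_delta` value, and the loss weights are all assumptions chosen for clarity. The first function shows V2PE's core trick (visual tokens advance the position index by a smaller, fractional step than text tokens, so long image-token runs consume less positional "budget"), and the second shows MPO's core idea (blending a preference term, a quality term, and a standard generation term into one objective).

```python
# Illustrative sketch only -- names, deltas, and weights are assumptions,
# not InternVL3's real implementation.

def v2pe_position_ids(token_types, visual_delta=0.25):
    """V2PE-style positions: text tokens advance by 1.0, visual tokens
    by a smaller fractional step, compressing long visual sequences."""
    positions, pos = [], 0.0
    for t in token_types:
        positions.append(pos)
        pos += visual_delta if t == "visual" else 1.0
    return positions


def mpo_loss(preference_loss, quality_loss, generation_loss,
             weights=(0.8, 0.1, 0.1)):
    """MPO-style objective: a weighted mix of a preference term (DPO-like),
    a quality term, and a plain generation (SFT) term."""
    w_p, w_q, w_g = weights
    return w_p * preference_loss + w_q * quality_loss + w_g * generation_loss


# Example: a short sequence of one text token, two image tokens, one text token.
print(v2pe_position_ids(["text", "visual", "visual", "text"], visual_delta=0.5))
print(mpo_loss(1.0, 2.0, 3.0))
```

Note how, in the example, the two visual tokens only advance the position counter by 0.5 each, so the final text token sits at position 2.0 instead of 3.0; that is the mechanism that lets a layout-heavy claim document fit in a bounded context window.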
Throughout the pilot, her team monitored not only performance metrics, but also how the claims adjusters felt about the tool. Early skepticism gave way to cautious optimism as employees realized the system wasn’t trying to take away their roles; it was handling the repetitive, time-consuming steps so they could focus on complex, high-impact decisions.
Customer interactions improved too. With clearer documentation of how decisions were made, complaints dropped. More importantly, confidence grew inside and outside the organization.
As Tammy walked through her pilot results in front of the executive team six weeks later, she didn’t need to oversell them. The story told itself. Cases that had taken days now took hours. AI-generated notes were easier to understand and required fewer edits. And customer satisfaction scores for pilot participants were already trending upward.
The message was clear: this wasn’t just a tech initiative; it was also a business transformation. And it had already begun.
Turn Efficiency into Advantage
The early results from the pilot weren’t just promising; they were catalytic. Tammy had walked into the initiative looking for marginal gains. What she discovered was something closer to a paradigm shift.
By applying a native multimodal model, the claims pipeline began to function as a unified system, not a series of disconnected tasks. Image evidence, textual documentation, spatial context … all of it was processed in real time, in one pass, with the model surfacing the most relevant information for a decision. Suddenly, what used to take an adjuster 45 minutes (comparing photos to handwritten notes and cross-referencing them with internal policies) was reduced to a five-minute review and approval.
This wasn’t just speed; it was clarity. Decisions came with explanations generated by the model—helping agents walk customers through the rationale behind each resolution. That transparency helped restore credibility, and it defused tension in ways that were measurable: customer follow-up calls dropped, and resolution satisfaction surveys ticked steadily upward.
The original goal of reducing claims processing time by 40% began to look conservative. For standard, low-risk claims, the automation rate climbed close to 80%, with no significant increase in dispute rates. Tammy’s team had found that sweet spot where technology didn’t just replicate human performance; it extended it.
And the effects rippled beyond the claims department. With AI handling the brunt of high-volume casework, senior adjusters were redeployed toward more strategic roles: policy fraud detection, complex liability cases, and customer experience design. Employees reported feeling more empowered; turnover dropped. Even onboarding for new hires improved, since the AI now served as a sort of institutional memory—modeling high-quality judgment patterns for learners to observe and build upon.
In a way, the machine had become a mentor.
Redefine What Success Looks Like
With momentum building, the leadership team at Y’allstate Insurance Co. asked Tammy a question she hadn’t expected: “What does great look like now?”
Her answer was simple: great is what happens when every decision in our company is informed by context, not just data.
Good was where they had started: a functioning prototype, an internal buzz, some measurable improvements. That alone would’ve justified the pilot. Better was the phase they had just entered: accelerating ROI, broader adoption, and signs of competitive differentiation. Claims were being processed more quickly, yes … but with fewer errors, more consistency, and a noticeable uplift in customer sentiment.
But “best”? That was something more ambitious. It would mean:
- A fully integrated claims experience across all channels (web, mobile, in-person) powered by the same intelligent core.
- A model architecture that could be extended beyond auto claims into property, health, and specialty insurance products.
- A company culture that no longer saw AI as a black box, but as a co-pilot: transparent, trainable, and trusted.
Best meant Y’allstate becoming a reference point in the industry, not just for adopting AI early, but for doing it with care and strategic clarity.
It meant being the kind of company that Calamansi.ai would study, not just compete against.
The New Standard for Smart Decisions
As Tammy documented the journey for other departments considering their own AI transformation, she returned to a principle that had quietly guided her all along: Technology alone doesn’t change outcomes; alignment does.
InternVL3 and its multimodal architecture weren’t magic bullets. What made them powerful was how well they aligned with the real-world complexity of insurance work. They didn’t force simplification. They embraced nuance. And in doing so, they made human intelligence more scalable, more available, in more moments, with better outcomes.
The shift wasn’t just about adopting new tools. It was about adopting a new way of thinking … one where understanding is more valuable than information, and context is more powerful than speed.
Tammy’s work didn’t just fix a broken workflow. It revealed a new baseline for what competence (and excellence) could look like in a digital-first insurance company. Not a futuristic ideal, but a present reality. One built on systems that see as well as they read. One where employees are lifted by the intelligence they helped shape. And one where customers, for once, get the peace of mind they were promised.
Not because the process was automated. But because it finally made sense.
Further Reading
- Mallari, M. (2025, April 15). Now you see it, now you read it. AI-First Product Management by Michael Mallari. https://michaelmallari.bitbucket.io/research-paper/now-you-see-it-now-you-read-it/