Chain Reaction: When AI’s Train of Thought Builds Its Own Playbook
CoT-Self-Instruct generates and filters smarter prompts—enabling more reliable and business-ready large language models.
Demystifying newly published AI research—empowering you to act decisively on opportunities.
A new framework shows how frozen video models can be repurposed to predict motion, depth, and more—unlocking real-time foresight across multiple tasks.
How SPaRK helps large language models make better decisions by balancing accuracy with tool diversity through step-wise policy learning.
f-DP framework turns vague differential privacy parameters into actionable risk metrics for re-identification, attribute inference, and data reconstruction.
Movement foundation models aim to make AI smarter about the way we move—unlocking breakthroughs in diagnosis, interaction, and embodied intelligence.
SceneDiffuser++ tackles the challenge of simulating full-length urban trips with dynamic traffic, agent behavior, and real-time scene generation.
A look at MindCube’s breakthrough in teaching AI to infer spatial layouts and reason beyond the camera frame.
Understanding how AI language models weigh competing values like truth, politeness, and clarity (and why that matters for trustworthy deployment).
How new AI training methods help models decide when to act and when to pass decisions to experts.
How RRC helps AI systems make fairer decisions under real-world constraints like limited time, compute, and conflicting values.
A look at StorySage, the AI system helping users turn scattered memories into coherent autobiographies through multi-agent conversation.
AI debate prevents models from hiding errors in complexity—creating a more reliable path to scalable oversight and verifiable reasoning.
V-JEPA 2 uses predictive self-supervised learning to teach AI systems how to understand and act in physical environments.
SOP-Bench sets a new standard for evaluating whether AI agents can reliably execute long-form SOPs in enterprise settings.
A new benchmark called Orak tests LLMs in real-world video games to evaluate decision-making, planning, and adaptability.
Self-organizing flight paths help autonomous aircraft choose between direct routing and following traffic—cutting delays and increasing scalability.
This new reinforcement learning method helps language models discover novel reasoning strategies (not just repeat what they already know).
ATLAS introduces test-time memory optimization to help AI models understand and reason far beyond traditional context limits.
RenderFormer replaces traditional ray tracing with a learned transformer model—streamlining lighting, reflections, and realism.
What Frankentexts reveal about AI writing, content attribution, and the limitations of current detection technologies.
DSMentor shows how AI can mimic the way humans learn—using curriculum sequencing, long-term memory, and feedback loops.
A closer look at how Vaiage uses multi-agent LLMs to build dynamic, human-like planning systems for complex real-world tasks.
MegaBeam‑Mistral‑7B delivers end‑to‑end 512K‑token processing for enterprise document workflows.
SLOT bridges the gap between free-form AI text and the structured formats real-world software systems demand—without breaking your workflows.
A look at how Layered Safe MARL enables scalable, conflict-free coordination for autonomous fleets.
HalluMix reveals the strengths and weaknesses of today’s top hallucination detectors across tasks, domains, and contexts.
New research reveals the limits of fine-tuning and offers a smarter way to help LLMs generalize and adapt in real-world scenarios.
Redefining the front-end of AI innovation with a PSA—helping organizations move from vague ideas to viable project plans.
Leveraging autoencoder-based filters and KD models to safeguard wireless networks from model poisoning attacks.
How CLIMB transforms AI training by discovering optimal data mixtures that improve model accuracy, reduce costs, and scale across domains.
Multilingual LLM evaluation approach reveals how better benchmarking across languages can reduce AI risk and improve global model performance.
InternVL3 shows how next-gen AI can interpret complex inputs like scanned documents and visuals to drive faster, smarter business decisions.
Native multimodal models are emerging as a better alternative to cobbled-together systems—reshaping how multimodal AI gets built.
Leveraging Sparse Autoencoders, researchers reveal how AI-generated text can be detected through subtle language patterns.
SmolLM2 offers an alternative to oversized AI models—unlocking high-performing, cost-effective solutions for organizations with limited compute resources.
OmniHuman-1 redefines human animation with a scalable AI model that adapts to audio, text, and pose data.