
Meet our Advisors: Dr. Caiming Xiong

Dr. Caiming Xiong stands at the forefront of enterprise artificial intelligence worldwide. A world-class computer scientist and strategic executive, Dr. Xiong is widely celebrated for his unique ability to bridge the gap between abstract, foundational AI research and high-impact, commercial software products. Under his technical stewardship, Salesforce AI Research has evolved from an elite incubation lab into the primary engine powering the intelligence layer of the world’s leading CRM system. By converting bleeding-edge deep learning concepts into production-ready enterprise tools—spanning Large Language Models (LLMs), multimodal systems, and autonomous agentic workflows—Dr. Xiong is actively redefining how global businesses deploy AI to automate complex processes and elevate customer experiences.

8 min read·June 19, 2026

Most recently, Dr. Caiming Xiong co-founded Recursive AI alongside Richard Socher. The company is a frontier research laboratory built around a singular, ambitious hypothesis: that the scientific process of AI research can itself be automated. Backed by Google Ventures, Greycroft, Nvidia, and AMD, it has raised over $650 million. It is moving away from optimization hand-designed by human engineers toward an open-ended architecture that teaches AI to rewrite its own codebase continuously. His applied research at Salesforce laid the groundwork for Agentforce and the xLAM (Large Action Model) framework, systems designed for independent reasoning, long-context planning, and workflow execution in complex B2B pipelines.

"The model is not the product. The model trained as an integral part of the harness is the product. That is the design principle the industry needs to internalize, and healthcare will learn it first, because the cost of getting it wrong is not a degraded user experience. It is a missed diagnosis, a denied claim, or a compliance violation."
Dr. Caiming Xiong [suggested quote pending Dr. Xiong's approval]

We sat down with Dr. Xiong to talk about why general-purpose language models fall short inside healthcare agent workflows, what it actually means to co-design a model and a harness, and why the organizations that build closed learning loops will be the ones that win in regulated AI.

THE CONVERSATION

Why General-Purpose Models Are Not Enough for Healthcare Agent Workflows

actAVA: Healthcare and life sciences are two of the most regulated, operationally complex domains in enterprise AI. Where do most model deployments go wrong when they enter these environments?

The core issue is that professional agents in healthcare and life sciences must handle long, multi-step workflows that require specialized knowledge, policy interpretation, and strict procedural compliance. This is not the same problem as answering a question or summarizing a document. A small mistake, such as misreading a clinical note, missing a policy requirement, or applying the wrong prior authorization criteria, can block the entire process or create serious regulatory risk. The failure modes are not cosmetic.

What people often underestimate is that the challenge is not only whether the model understands medical or scientific knowledge. That is necessary but not sufficient. The model must also know how to operate correctly inside a specific agent system. That means using the right tools at the right moment, retrieving policies from the right source, interpreting structured records in the right format, following workflow sequencing rules, and knowing when to escalate uncertain cases rather than produce a confident but wrong answer. General-purpose LLMs are not trained for this. They are not trained on the actual tools, APIs, decision points, and failure modes of the production system they are being asked to run in. And that gap shows up, often expensively, in production.

The critical distinction: A model should not be treated as a generic API component that can simply be plugged into any workflow. The question is not "does this model know about healthcare?" The question is "has this model been trained on the actual agent harness it will operate inside?" Those are different questions with different answers.

Co-Designing Model and Harness: The Architecture the Industry Needs

actAVA: You've talked about the "harness-specific model" concept. What does that actually look like in practice, and why does the design relationship between the model and the harness matter so much?

The key idea is that model and harness should be co-designed and co-trained. The harness should structure tasks, constrain the action space, retrieve the right context, and validate intermediate results. The model should learn how to use these mechanisms reliably. When you treat them as independent components and drop a general-purpose model into a well-designed framework, you lose the stability that comes from training the model on the specific interfaces and decision points it will encounter in production.

The goal is to teach the model not only what answer to generate, but how to behave correctly throughout the process. Which tool to call at which step. How to interpret a retrieved policy document versus a structured clinical record. When a result is sufficiently confident to proceed versus when it should pause and escalate. A harness-specific model learns these behavioral patterns directly. A general-purpose model prompted at runtime has to infer them, and inference is where failures accumulate.
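A minimal Python sketch of the harness side of this division of labor: it constrains the action space to a fixed tool list, checks each proposed step, and escalates when confidence is low. Every name here (Step, route_step, run_workflow, the tool names, and the 0.8 threshold) is a hypothetical illustration, not part of any system described in this conversation.

```python
from dataclasses import dataclass

# Hypothetical tool names; a real harness would expose its own.
ALLOWED_TOOLS = {"fetch_policy", "read_clinical_record", "submit_decision"}
CONFIDENCE_THRESHOLD = 0.8  # assumed cutoff, chosen only for illustration


@dataclass
class Step:
    tool: str          # tool the model proposes to call
    confidence: float  # model's self-reported confidence in [0, 1]


def route_step(step: Step) -> str:
    """Return 'reject', 'escalate', or 'execute' for a proposed step."""
    if step.tool not in ALLOWED_TOOLS:
        return "reject"      # outside the constrained action space
    if step.confidence < CONFIDENCE_THRESHOLD:
        return "escalate"    # hand off to a human reviewer
    return "execute"


def run_workflow(steps: list[Step]) -> list[str]:
    """Route steps in order, stopping at the first one that does not execute."""
    outcomes = []
    for step in steps:
        outcome = route_step(step)
        outcomes.append(outcome)
        if outcome != "execute":
            break
    return outcomes
```

In a co-designed setup, the same reject and escalate outcomes would presumably also appear in the training signal, so the model learns to propose steps the harness accepts rather than discovering the rules at runtime.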

The Common Approach: General-purpose model + agent framework

  • Model selected for domain knowledge; harness designed separately

  • Model infers correct tool use, sequencing, and escalation from prompts at runtime

  • Failure modes discovered in production, not anticipated in training

  • Behavioral stability depends on prompt engineering, not learned patterns

The Co-Design Approach: Harness-specific model

  • Trained on the actual tools, APIs, decision points, and workflow steps of the production system

  • Model learns to use harness mechanisms for tool calls, retrieval, and validation as first-class behaviors

  • Failure modes, edge cases, and escalation patterns are part of the training signal

  • More stable behavior because correct patterns are internalized, not inferred

In healthcare workflows specifically (prior authorization, utilization management, claims adjudication, and clinical documentation), the number of decision points, the policy interdependencies, and the consequence of each step make behavioral stability non-negotiable. You cannot prompt your way to that level of reliability. You have to train for it.

The Closed Feedback Loop: How Production Data Becomes Competitive Advantage

actAVA: Healthcare policies, clinical guidelines, and payer requirements change constantly. How do you build a model that stays current, and how does real-world usage data factor in?

Compliance requirements and policies in healthcare continuously change. Clinical guidelines are updated, payer policies shift, regulations evolve, internal procedures get revised. A model that relies solely on static knowledge stored in its weights will degrade over time, not because the model itself changes, but because the environment it operates in does. The model must be trained to retrieve, interpret, and apply the latest source of truth at runtime, not to answer based on what it learned during training.
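One way to picture "apply the latest source of truth at runtime" is a lookup that selects the policy version in force on the date of the case, instead of relying on whatever the model memorized. The sketch below assumes a simple in-memory list of versions; PolicyVersion, current_policy, and the field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PolicyVersion:
    policy_id: str
    effective_from: date  # first day this version applies
    text: str


def current_policy(
    versions: list[PolicyVersion], policy_id: str, as_of: date
) -> Optional[PolicyVersion]:
    """Return the newest version of policy_id already in force on as_of."""
    candidates = [
        v for v in versions
        if v.policy_id == policy_id and v.effective_from <= as_of
    ]
    # default=None covers the case where no version is in force yet.
    return max(candidates, key=lambda v: v.effective_from, default=None)
```

A harness built this way can return None for a case that predates every known version, which is a natural trigger for escalation rather than a guessed answer.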

But the more compelling point concerns production data. Real workflow traces, human corrections, failed cases, compliance violations, and escalation patterns: these are extraordinarily valuable training signals. Every time a human reviewer catches a misclassification, every time a workflow fails at a particular decision point, every time an agent escalates a case that turns out to require a specific policy interpretation, that data can improve both the model and the harness over time. If you build the system to capture these signals and feed them back into training, you create a closed feedback loop that makes the system increasingly reliable and specialized with every production cycle.
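The capture step of such a loop can be sketched as follows, assuming each production trace records the model's output and any reviewer correction. TraceRecord, to_training_examples, and the "rejected"/"preferred" field names are hypothetical; a real pipeline would also need de-identification and governance controls that are out of scope here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TraceRecord:
    case_id: str
    decision_point: str                     # workflow step where the output was produced
    model_output: str
    human_correction: Optional[str] = None  # None means the reviewer accepted the output


def to_training_examples(traces: list[TraceRecord]) -> list[dict]:
    """Keep only traces where a reviewer changed the model's output."""
    examples = []
    for t in traces:
        if t.human_correction is not None and t.human_correction != t.model_output:
            examples.append({
                "case_id": t.case_id,
                "decision_point": t.decision_point,
                "rejected": t.model_output,
                "preferred": t.human_correction,
            })
    return examples
```

Grouping the resulting examples by decision_point would show where in the workflow corrections concentrate, which can inform changes to the harness as well as the model.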

The Compounding Advantage

The organizations that build closed learning loops compound a reliability advantage that cannot be replicated by switching models.

Any competitor can swap one general-purpose model for another that is available through an API. A model trained on two years of production traces from a specific health plan's prior authorization workflow, with escalation patterns, human corrections, and compliance violations as training signal, cannot be replicated. That specificity is defensible. The model is no longer a commodity component; it is an organizational asset.

This is the winning approach, and it is distinct from both the "domain-specific model" and the "well-designed agent framework" when treated as isolated strategies. The winning approach is a model trained as an integral part of the agent harness, optimized for the specific tools, workflows, policies, and failure modes of the target healthcare application, and continuously improved by the signals that production generates every day.

BONUS ROUND

What Makes actAVA's Problem Space Compelling

actAVA: What makes you excited about actAVA, their problem space, and the team?

Healthcare is the domain where the gap between "AI that kind of works" and "AI that reliably works" has the highest real-world consequence. The research problems are hard, the deployment constraints are demanding, and the standards for what counts as production-ready are higher than almost any other sector. That is exactly the environment where the architectural ideas I find most important (co-designed models and harnesses, closed learning loops, and harness-specific training) matter most. It is not enough to have a capable model. You need a system you can trust to behave correctly under the conditions it will actually encounter in production.

What actAVA has built with KORA is a governed agent infrastructure that takes the governance and auditability requirements of healthcare seriously at the architectural level, not as a compliance checkbox, but as the design premise. The approval lifecycle, the human-in-the-loop gates, the versioned audit trail, the agent-level ROI attribution: these are the structural conditions that make a closed learning loop trustworthy. You cannot learn reliably from production data in a regulated environment without that governance substrate. actAVA has built the infrastructure that makes the feedback loop possible in the one domain where it matters most.

About Dr. Caiming Xiong

Dr. Xiong's path to becoming one of the chief architects of enterprise AI began at the intersection of deep learning and computer vision. He joined Salesforce in 2016 through the strategic acquisition of MetaMind, a pioneering deep learning startup founded by prominent NLP researcher Richard Socher. Alongside Socher, Dr. Xiong was instrumental in building Salesforce's modern AI research ecosystem from the ground up, growing from a hands-on research scientist into an executive overseeing multidisciplinary teams in computer vision, natural language processing, conversational AI, and reinforcement learning.

At Salesforce, Dr. Xiong was a critical driver of Agentforce and the xLAM (Large Action Model) framework, systems designed for independent reasoning, long-context planning, and workflow execution, moving AI from passive chat-based retrieval to active, context-aware digital operators capable of handling complex enterprise pipelines.

In 2025, Dr. Xiong co-founded Recursive AI with Richard Socher. Recursive AI is a frontier AI research laboratory built on the hypothesis that AI research itself can be automated. Backed by Google Ventures, Greycroft, Nvidia, and AMD with over $650 million in funding, it is developing an open-ended architecture in which AI systems continuously rewrite and improve their own codebase, moving beyond the hand-designed optimization cycles that have defined the field.

Connect with Dr. Xiong on LinkedIn and follow the work at Recursive AI.
