Research · April 2026

Dominic Phillips

Software Developer

OpenAI, OpenEvidence, and the New Clinical AI Stack

OpenAI’s April 22, 2026 clinician launch was not another prompt demo. Put it next to OpenEvidence and MedGemma, and the market looks less like one platform winning and more like a routing table.

The April 22 launch in context

On April 22, 2026, OpenAI opened ChatGPT for Clinicians to U.S. physicians, nurse practitioners, physician assistants, and pharmacists. That is the headline release, but it did not arrive alone. Google shipped MedGemma a few weeks earlier, and OpenEvidence kept quietly stacking publisher deals through the same window.

The three efforts are not trying to be the same product, and that is the interesting part. The demo era is mostly behind us. The questions healthcare operators have been asking for years, about where a citation actually came from and whether the answer fits the work, are finally the ones the big labs are trying to answer.

Three layers are starting to separate

Frontier workspace

Clinical reasoning, documentation, and research in one clinician-facing shell.

Evidence layer

Retrieval, citation, and speed built around the point-of-care question.

Local execution

Privacy, routing, and deployment control for workloads that should not leave the perimeter.

Three distinct product layers are starting to separate inside clinical AI: clinician workspace, evidence retrieval, and private execution.

“Healthcare AI is starting to look less like one product and more like a routing problem.”
Dominic Phillips, Software Developer

OpenAI’s clinician workspace

OpenAI bundled top-tier models with search, citations, deep research over medical journals, reusable workflow skills for referral letters and prior auth, CME support tied to real clinical questions, and optional HIPAA coverage through a BAA for eligible accounts.

The feature list matters. The posture matters more. OpenAI is treating healthcare as a workflow problem, not a better-prompt contest.

The company says its physician group reviewed more than 700,000 model responses, and that clinicians stress-tested 6,924 conversations before launch across care delivery, documentation, and research. OpenAI also says clinicians rated 99.6% of those pre-release responses as safe and accurate, and that on 355 hard citation-heavy cases, the system cited ground-truth sources more often than human physicians did.

Those launch details matter because they point toward productized review loops, citations, and clinical workflow fit rather than a generic chatbot with medical vocabulary.

Where OpenEvidence fits

OpenEvidence is narrower and, in some ways, more legible: its job is to answer point-of-care questions quickly, with citations attached, against a licensed medical corpus, not to play universal clinical copilot.

Its official materials describe a product built to answer point-of-care questions in 5 to 10 seconds, grounded in peer-reviewed literature with references attached. The company has gone deep on corpus quality and licensing, including deals with NEJM Group, JAMA Network, and most recently Wiley. Its About page frames the whole company as a medical information product, not a general chatbot.

The distribution is real too. In its July 15, 2025 announcement, OpenEvidence said it was in use across more than 10,000 hospitals and medical centers and used daily by more than 40% of U.S. physicians. A January 21, 2026 Forbes profile, citing Daniel Nadler, put that at roughly 45% of U.S. physicians and 18 million clinical consultations in the prior month. Halve those numbers and OpenEvidence is still in front of roughly one in five U.S. physicians.

OpenEvidence is a medical information product with a modern inference layer on top, not a foundation-model company wearing a lab coat.

Its own positioning increasingly breaks into two products:

a fast core search product for point-of-care answers
a slower research mode called DeepConsult, which it says uses advanced reasoning models to analyze and cross-reference hundreds of peer-reviewed studies in parallel

That split maps to how clinicians actually work, where some questions need a usable answer before the next patient arrives, and others need a wider synthesis that can run in the background while the clinician steps away.
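As a minimal sketch of that fast-versus-deep split from the caller's side, assuming a team has to decide which mode to invoke: the `Mode` enum, the `pick_mode` function, and the 60-second threshold are all hypothetical names and values for illustration. This is not how OpenEvidence selects between its products.

```python
from enum import Enum


class Mode(Enum):
    FAST = "fast point-of-care answer"
    DEEP = "background research synthesis"


# Hypothetical threshold: with less time than this, only a fast answer is useful.
FAST_BUDGET_SECONDS = 60


def pick_mode(seconds_available: float, needs_wide_synthesis: bool) -> Mode:
    """Choose a mode from the clinician's time budget and the question's scope."""
    # If the answer is needed before the next patient arrives, take the
    # fast path even when a wider synthesis would be nicer to have.
    if seconds_available <= FAST_BUDGET_SECONDS:
        return Mode.FAST
    # With time to spare, run the deep mode only when the question needs it.
    return Mode.DEEP if needs_wide_synthesis else Mode.FAST
```

The point of the sketch is that the time budget wins over the scope of the question. A deep synthesis that arrives after the visit does not help the visit.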

Published research lands in the same place. A 2025 Digit Health study indexed on PubMed looked at five systems on real clinical questions and explicitly described OpenEvidence as a retrieval-augmented generation system. In that study, OpenEvidence produced relevant, evidence-based answers for 24% of questions, versus 2% to 10% for the general-purpose LLMs tested. It also produced actionable results for 48% of questions where published evidence already existed. It was weaker on questions that required novel evidence generation instead of literature synthesis, which is exactly what you would expect from a strong retrieval product.

OpenEvidence’s edge is not that it is smarter than the frontier labs. It is that it sits closer to the evidence-retrieval job clinicians do all day.

Public model disclosure is still thin

OpenEvidence is much clearer about corpus quality and publisher relationships than it is about a fixed model stack. I could not find a primary-source page naming a stable canonical lineup. The safest public read is architectural: licensed retrieval first, then model routing or specialist reasoning layers for deeper synthesis. Forbes reports that Daniel Nadler is building domain-based routing, which is a useful signal but still not the same thing as transparent model disclosure.

MedGemma makes the deployment question harder to ignore

MedGemma is worth revisiting in that frame. Google is saying out loud what closed vendors still tend to leave implicit: some medical workloads need local, private, or tightly controlled deployment.

On its official MedGemma page, Google pitches the model as a starting point for healthcare use cases involving medical text, imaging, and agentic systems, including FHIR-aware workflows and privacy-preserving deployments. Put that next to OpenAI and OpenEvidence, and the market lines up more clearly:

OpenAI is building the frontier clinician workspace: high-end reasoning, documentation scaffolds, research depth, and a serious clinical workflow loop.
OpenEvidence is building the evidence layer: licensed content, retrieval, citations, and physician-speed answers.
Google / MedGemma is building the open infrastructure layer: local or private deployment, adaptation, and developer control.

These are not interchangeable products. They sit next to each other.

Set beside each other, they move the conversation away from whether a model can answer medical questions at all and toward more operational questions:

Where does the evidence come from?
Who controls the review criteria?
What happens on adversarial cases?
Which tasks should stay local?
When do you want retrieval versus deeper reasoning?
How do you keep clinicians in charge without wasting their time?

Those are implementation questions rather than marketing ones. Healthcare forces them because the wrong answer has a patient attached to it.

A Practical Routing Map

Frontier clinician workspace

Use a product in the ChatGPT for Clinicians class for broad synthesis, documentation drafts, differential support, and literature review where you need top-end reasoning.

Evidence retrieval layer

Use an OpenEvidence-style system when the job is finding, grounding, and citing the best available literature inside clinician time constraints.

Local or private execution layer

Use MedGemma-class open models for privacy-heavy ingestion, medical imaging experimentation, structured extraction, or internal routing where cloud dependence is the wrong answer.

Enterprise orchestration

Build the policy, routing, observability, and human-review layer that decides which task goes where. That routing layer matters more than vendor loyalty.
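The routing map above can be sketched as a small lookup with a policy override. This is an illustration under stated assumptions, not a reference implementation: the task kinds, the `Task` fields, and the `cloud_allowed` flag are hypothetical names, and a real orchestration layer would add observability and human review on top.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    WORKSPACE = "frontier clinician workspace"
    EVIDENCE = "evidence retrieval layer"
    LOCAL = "local or private execution layer"


@dataclass(frozen=True)
class Task:
    kind: str            # e.g. "documentation_draft", "guideline_lookup"
    contains_phi: bool   # True if raw patient data is in the payload
    # Hypothetical policy flag: may this payload go to a cloud vendor
    # (for example, because a BAA covers it)?
    cloud_allowed: bool = True


# Illustrative mapping from task kind to the layers in the routing map.
ROUTES = {
    "broad_synthesis": Layer.WORKSPACE,
    "documentation_draft": Layer.WORKSPACE,
    "differential_support": Layer.WORKSPACE,
    "literature_review": Layer.WORKSPACE,
    "guideline_lookup": Layer.EVIDENCE,
    "cited_answer": Layer.EVIDENCE,
    "structured_extraction": Layer.LOCAL,
    "imaging_experiment": Layer.LOCAL,
    "phi_ingestion": Layer.LOCAL,
}


def route(task: Task) -> Layer:
    """Pick a layer for a task; policy overrides the default mapping."""
    # Policy first: patient data that may not leave the perimeter stays local.
    if task.contains_phi and not task.cloud_allowed:
        return Layer.LOCAL
    if task.kind not in ROUTES:
        # Unknown work is surfaced for a human decision rather than guessed at.
        raise ValueError(f"no route defined for task kind {task.kind!r}")
    return ROUTES[task.kind]
```

For example, `route(Task("guideline_lookup", contains_phi=False))` lands on the evidence layer, while a documentation draft carrying patient data with `cloud_allowed=False` is forced onto the local layer regardless of its default route.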

What I would ask before buying

If you run a healthcare business, the job is matching workload to system. One-winner framings do not survive contact with real operations.

A more useful checklist:

Separate retrieval from reasoning. A lot of clinical questions are really corpus, citation, and update-latency questions, not “who has the biggest model?” questions.
Route jobs by workload, not ideology. A privacy-heavy extraction task does not belong on the same inference path as guideline lookup or deeper synthesis by default.
Ask provenance questions directly. Which sources are licensed? What is indexed? How recent are updates? What happens when citations are missing? If a vendor cannot answer those cleanly, treat the silence as your answer.
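The provenance item in that checklist is easy to turn into something a procurement team can track. A minimal sketch, assuming vendor answers are collected as free text: the dictionary keys and the `unanswered` helper are hypothetical names chosen for illustration.

```python
# The four provenance questions from the checklist, keyed by short
# hypothetical identifiers.
PROVENANCE_QUESTIONS = {
    "licensed_sources": "Which sources are licensed?",
    "indexed_content": "What is indexed?",
    "update_recency": "How recent are updates?",
    "missing_citations": "What happens when citations are missing?",
}


def unanswered(vendor_answers: dict[str, str]) -> list[str]:
    """Return the questions a vendor left blank or did not address."""
    return [
        question
        for key, question in PROVENANCE_QUESTIONS.items()
        # A missing key and a whitespace-only answer both count as silence.
        if not vendor_answers.get(key, "").strip()
    ]
```

Anything this returns is a question the vendor has not answered cleanly, which the checklist treats as an answer in itself.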

The better healthcare AI stacks will be routed. Evidence lookup, documentation, extraction, and private-data workflows should not all travel the same path. Teams that route this way early will spend more intelligently and have cleaner provenance. Teams that do not will keep paying for model runs they cannot audit.
