
Dominic Phillips
Software Developer
OpenAI, OpenEvidence, and the New Clinical AI Stack
OpenAI’s April 22, 2026 clinician launch was not another prompt demo. Put it next to OpenEvidence and MedGemma, and the shape looks less like one platform winning and more like a routing table.
The April 22 launch in context
On April 22, 2026, OpenAI opened ChatGPT for Clinicians to U.S. physicians, nurse practitioners, physician assistants, and pharmacists. That is the headline release, but it did not arrive alone. Google shipped MedGemma a few weeks earlier, and OpenEvidence kept quietly stacking publisher deals through the same window.
The three launches are not trying to be the same product, and that is the interesting part. The demo era is mostly behind us. The questions healthcare operators have asked for years, such as where a citation actually came from and whether an answer fits the clinical workflow, are finally the ones the big labs are trying to answer.
Three distinct product layers are starting to separate inside clinical AI: clinician workspace, evidence retrieval, and private execution.
“Healthcare AI is starting to look less like one product and more like a routing problem.”
OpenAI’s clinician workspace
OpenAI bundled top-tier models with search, citations, deep research over medical journals, reusable workflow skills for referral letters and prior auth, CME support tied to real clinical questions, and optional HIPAA coverage through a BAA for eligible accounts.
The feature list matters. The posture matters more. OpenAI is treating healthcare as a workflow problem, not a better-prompt contest.
The company says its physician group reviewed more than 700,000 model responses, and that clinicians stress-tested 6,924 conversations before launch across care delivery, documentation, and research. OpenAI also says clinicians rated 99.6% of those pre-release responses as safe and accurate, and that on 355 hard citation-heavy cases, the system cited ground-truth sources more often than human physicians did.
Those numbers are company-reported, but they point toward productized review loops, citations, and clinical workflow fit rather than a generic chatbot with medical vocabulary.
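To make a claim like "cited ground-truth sources more often" concrete, one way a team could score it internally is per-case citation recall: the fraction of a case's ground-truth sources that show up in an answer's citations. This is a minimal sketch under that assumption. The `Case` type, the helper names, and the source IDs are hypothetical placeholders, and it is not a description of how OpenAI scored its 355 cases.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """One evaluation case and the sources a reviewer marked as ground truth."""
    case_id: str
    ground_truth: set[str]


def citation_recall(case: Case, cited: set[str]) -> float:
    """Fraction of the case's ground-truth sources that an answer cited."""
    if not case.ground_truth:
        return 0.0
    return len(case.ground_truth & cited) / len(case.ground_truth)


def mean_recall(cases: list[Case], answers: dict[str, set[str]]) -> float:
    """Average recall over all cases; a missing answer counts as citing nothing."""
    if not cases:
        return 0.0
    scores = [citation_recall(c, answers.get(c.case_id, set())) for c in cases]
    return sum(scores) / len(scores)


# Placeholder cases and citation sets, not real evaluation data.
cases = [
    Case("c1", {"src-a", "src-b"}),
    Case("c2", {"src-c"}),
]
system_a = {"c1": {"src-a", "src-b"}, "c2": {"src-d"}}
system_b = {"c1": {"src-a"}, "c2": {"src-c"}}

print(mean_recall(cases, system_a))  # 0.5
print(mean_recall(cases, system_b))  # 0.75
```

Scoring each case and then averaging keeps one citation-heavy case from dominating the total; pooling every source into one count would instead weight cases by how many sources they have.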
Where OpenEvidence fits
OpenEvidence is narrower and, in some ways, more legible: its job is to answer point-of-care questions quickly, with citations attached, against a licensed medical corpus, not to play universal clinical copilot.
Its official materials describe a product built to answer point-of-care questions in 5 to 10 seconds, grounded in peer-reviewed literature with references attached. The company has gone deep on corpus quality and licensing, including deals with NEJM Group, JAMA Network, and most recently Wiley. Its About page frames the whole company as a medical information product, not a general chatbot.
The distribution is real too. In its July 15, 2025 announcement, OpenEvidence said it was used across 10,000+ hospitals and medical centers and that more than 40% of U.S. physicians log in daily. A January 21, 2026 Forbes profile, citing Daniel Nadler, put that at roughly 45% of U.S. physicians and 18 million clinical consultations in the prior month. Halve those numbers and OpenEvidence is still on roughly a fifth of U.S. physician desktops.
OpenEvidence is a medical information product with a modern inference layer on top, not a foundation-model company wearing a lab coat.
Its own positioning increasingly breaks into two products:
- a fast core search product for point-of-care answers
- a slower research mode called DeepConsult, which it says uses advanced reasoning models to analyze and cross-reference hundreds of peer-reviewed studies in parallel
That split maps to how clinicians actually work: some questions need a usable answer before the next patient arrives, and others need a wider synthesis that can run in the background as the clinician moves on.
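The fast/slow split can be sketched as two paths through one service: a synchronous call that has to return before the next patient, and a background job whose result the clinician collects later. This is a minimal sketch assuming a thread pool stands in for a real job queue; `quick_answer`, `deep_synthesis`, and `ask` are hypothetical placeholders, not OpenEvidence or DeepConsult APIs.

```python
from __future__ import annotations

from concurrent.futures import Future, ThreadPoolExecutor

# A small thread pool stands in for a real background job queue (assumption).
_background = ThreadPoolExecutor(max_workers=2)


def quick_answer(question: str) -> str:
    """Hypothetical fast path: a single retrieval pass, returned inline."""
    return f"[fast] cited answer to: {question}"


def deep_synthesis(question: str) -> str:
    """Hypothetical slow path: wide cross-referencing across many studies."""
    return f"[deep] synthesis report for: {question}"


def ask(question: str, deep: bool = False) -> str | Future[str]:
    """Answer inline on the fast path, or hand back a Future for deep mode."""
    if deep:
        return _background.submit(deep_synthesis, question)
    return quick_answer(question)


inline = ask("Is this drug interaction clinically significant?")
job = ask("Summarize trials comparing these two therapies", deep=True)
print(inline)                 # available immediately
print(job.result(timeout=5))  # blocks only when the clinician comes back for it
```

Returning a `Future` for deep mode keeps the caller in control of when to wait, which is the property that lets a long synthesis run without holding up the fast path.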
Published research supports that reading. A 2025 study in Digital Health, indexed on PubMed, compared five systems on real clinical questions and explicitly described OpenEvidence as a retrieval-augmented generation system. In that study, OpenEvidence produced relevant, evidence-based answers for 24% of questions, versus 2% to 10% for the general-purpose LLMs tested. It also produced actionable results for 48% of questions where published evidence already existed. It was weaker on questions that required novel evidence generation rather than literature synthesis, which is exactly what you would expect from a strong retrieval product.
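The retrieval-augmented pattern the study describes reduces to three moves: retrieve passages from a fixed corpus, answer only from what was retrieved, and attach the source IDs. This is a toy sketch assuming word-overlap scoring in place of a real embedding index; the corpus entries and IDs are invented placeholders, and nothing here reflects OpenEvidence's actual pipeline. It also shows why such a system struggles with novel questions: when nothing in the corpus matches, the honest output is "no evidence found," not a fresh claim.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    source_id: str  # placeholder ID for a licensed article
    text: str


# Invented mini-corpus standing in for a licensed literature index.
CORPUS = [
    Passage("doc-001", "Statin therapy reduces cardiovascular events in high-risk adults."),
    Passage("doc-002", "Metformin is first-line therapy for type 2 diabetes."),
    Passage("doc-003", "Aspirin for primary prevention has limited net benefit in older adults."),
]


def terms(text: str) -> set[str]:
    """Lowercase word tokens; a crude stand-in for a real tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, k: int = 2, min_overlap: int = 2) -> list[Passage]:
    """Rank passages by word overlap with the question and keep the top k."""
    q = terms(question)
    scored = [(len(q & terms(p.text)), p) for p in CORPUS]
    hits = [(score, p) for score, p in scored if score >= min_overlap]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in hits[:k]]


def answer(question: str) -> dict:
    """Answer only from retrieved passages and attach their source IDs."""
    passages = retrieve(question)
    if not passages:
        # Nothing in the corpus supports an answer, so say so instead of guessing.
        return {"answer": "No supporting evidence found in the indexed corpus.", "citations": []}
    # A real system would have a model summarize these passages; here we quote them.
    return {
        "answer": " ".join(p.text for p in passages),
        "citations": [p.source_id for p in passages],
    }


print(answer("Is metformin first-line therapy for type 2 diabetes?")["citations"])  # ['doc-002']
print(answer("Vancomycin dosing in renal failure")["citations"])  # []
```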
OpenEvidence’s edge is not that it is smarter than the frontier labs; it is that it sits closer to the evidence-retrieval job clinicians do all day.
MedGemma makes the deployment question harder to ignore
MedGemma is worth revisiting in that frame. Google is saying out loud what closed vendors still tend to leave implicit: some medical workloads need local, private, or tightly controlled deployment.
On its official MedGemma page, Google pitches the model as a starting point for healthcare use cases involving medical text, imaging, and agentic systems, including FHIR-aware workflows and privacy-preserving deployments. Put that next to OpenAI and OpenEvidence, and the market lines up more clearly:
- OpenAI is building the frontier clinician workspace: high-end reasoning, documentation scaffolds, research depth, and a serious clinical workflow loop.
- OpenEvidence is building the evidence layer: licensed content, retrieval, citations, and physician-speed answers.
- Google / MedGemma is building the open infrastructure layer: local or private deployment, adaptation, and developer control.
These are not interchangeable products; they are adjacent layers in the same stack.
Taken together, they move the conversation away from whether a model can answer medical questions at all and toward more operational questions:
- Where does the evidence come from?
- Who controls the review criteria?
- What happens on adversarial cases?
- Which tasks should stay local?
- When do you want retrieval versus deeper reasoning?
- How do you keep clinicians in charge without wasting their time?
Those are implementation questions rather than marketing ones. Healthcare forces them because the wrong answer has a patient attached to it.
Figure: A practical routing map. Layers: frontier clinician workspace, evidence retrieval, local or private execution, and enterprise orchestration.
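The routing map can be read as a routing table, with enterprise orchestration as the code that picks a lane for each job. A minimal sketch, assuming a hypothetical `Job` type, hypothetical workload labels, and a `contains_phi` flag set by some upstream detector; the layer names mirror the map, the BAA flag echoes OpenAI's optional BAA coverage, and the rules themselves are illustrative, not a recommendation for any vendor.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    CLINICIAN_WORKSPACE = "frontier clinician workspace"
    EVIDENCE_RETRIEVAL = "evidence retrieval layer"
    LOCAL_EXECUTION = "local or private execution layer"


@dataclass
class Job:
    workload: str       # hypothetical workload label, e.g. "guideline_lookup"
    contains_phi: bool  # assumed to be flagged by an upstream PHI detector


# Default lane per workload type (illustrative rules, not a recommendation).
ROUTES = {
    "guideline_lookup": Layer.EVIDENCE_RETRIEVAL,
    "literature_synthesis": Layer.CLINICIAN_WORKSPACE,
    "referral_letter": Layer.CLINICIAN_WORKSPACE,
    "chart_extraction": Layer.LOCAL_EXECUTION,
}

# Every routing decision is recorded as (workload, layer, reason).
audit_log: list[tuple[str, str, str]] = []


def route(job: Job, baa_in_place: bool = False) -> Layer:
    """Enterprise orchestration: pick a layer per job and log the reason."""
    default = ROUTES.get(job.workload)
    if default is None:
        raise ValueError(f"no route defined for workload {job.workload!r}")
    # PHI stays local unless a BAA covers the workspace this job would use.
    if job.contains_phi and not (baa_in_place and default is Layer.CLINICIAN_WORKSPACE):
        layer, reason = Layer.LOCAL_EXECUTION, "PHI present"
    else:
        layer, reason = default, "default route"
    audit_log.append((job.workload, layer.value, reason))
    return layer


print(route(Job("guideline_lookup", contains_phi=False)).name)  # EVIDENCE_RETRIEVAL
print(route(Job("referral_letter", contains_phi=True), baa_in_place=True).name)  # CLINICIAN_WORKSPACE
print(route(Job("referral_letter", contains_phi=True)).name)  # LOCAL_EXECUTION
```

Routing on workload first and overriding on PHI second keeps the privacy rule in one place, and the audit log means every model run can be traced back to the rule that sent it there.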
What I would ask before buying
If you run a healthcare business, the job is matching workload to system. One-winner framings do not survive contact with real operations.
A more useful checklist:
- Separate retrieval from reasoning. A lot of clinical questions are really corpus, citation, and update-latency questions, not “who has the biggest model?” questions.
- Route jobs by workload, not ideology. A privacy-heavy extraction task does not belong on the same inference path as guideline lookup or deeper synthesis by default.
- Ask provenance questions directly. Which sources are licensed? What is indexed? How recent are updates? What happens when citations are missing? If a vendor cannot answer those cleanly, treat the silence as your answer.
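The provenance questions in the checklist can be turned into an automated gate that runs before an answer reaches a clinician. A minimal sketch, assuming a hypothetical index of source metadata keyed by placeholder IDs, each with a licensed flag and a last-updated date, plus an arbitrary 365-day freshness window and a fixed "today" for reproducibility; a real deployment would pull both from the vendor's actual index metadata.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class SourceRecord:
    licensed: bool
    last_updated: date


# Hypothetical index metadata, keyed by placeholder source ID.
INDEX = {
    "src-001": SourceRecord(licensed=True, last_updated=date(2026, 3, 1)),
    "src-002": SourceRecord(licensed=False, last_updated=date(2026, 1, 10)),
    "src-003": SourceRecord(licensed=True, last_updated=date(2023, 6, 5)),
}


def provenance_issues(citations: list[str], today: date, max_age_days: int = 365) -> list[str]:
    """Return human-readable problems; an empty list means the answer passes."""
    if not citations:
        return ["answer has no citations"]
    issues = []
    for source_id in citations:
        record = INDEX.get(source_id)
        if record is None:
            issues.append(f"{source_id}: not in the indexed corpus")
        elif not record.licensed:
            issues.append(f"{source_id}: not a licensed source")
        elif today - record.last_updated > timedelta(days=max_age_days):
            issues.append(
                f"{source_id}: last updated {record.last_updated}, older than {max_age_days} days"
            )
    return issues


today = date(2026, 4, 22)  # fixed date so the example is reproducible
print(provenance_issues(["src-001"], today))  # []
print(provenance_issues(["src-002", "src-003", "src-999"], today))
print(provenance_issues([], today))  # ['answer has no citations']
```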
The better healthcare AI stacks will be routed. Evidence lookup, documentation, extraction, and private-data workflows should not all travel the same path. Teams that route this way early will spend more intelligently and have cleaner provenance. Teams that do not will keep paying for model runs they cannot audit.



