Communications Director, Connecticut Hospital Association
110 Barnes Road, Wallingford, CT
rall@chime.org, 203-265-7611
STAT News – Monday, April 13, 2026
By Katie Palmer
Every day, more than 40 million people ask ChatGPT about health care, according to OpenAI. They’re asking questions about diet, exercise, insurance — and in some cases, serious symptoms that would typically get discussed on a 911 call or in a doctor’s office.
For some health systems, that’s creating an imperative. A small number of hospitals are trying to recapture some of those clinical conversations from commercial large language models like ChatGPT, Claude, and Gemini. They’re implementing their own patient-facing chatbots, ones that draw directly from their existing medical records and can funnel patients toward care in their own system.
Hartford HealthCare this week will launch PatientGPT, a chatbot engineered by clinical AI company K Health, to its patients in Connecticut. Two health systems — California-based Sutter Health and Reid Health, serving Indiana and Ohio — have announced pilot versions of Emmie, the chatbot built by medical record mammoth Epic. The list is likely to grow rapidly.
“Health systems need to do this, either through a vendor or building it themselves,” said Mount Sinai chief AI officer Girish Nadkarni, the senior author of a recent study that found ChatGPT Health missed high-risk emergencies when used to triage patients.
The big question is how. Unlike commercial LLMs, which are not in the business of providing health care, hospitals more directly carry liability for chatbot failures that could end up harming patients. Can health systems catch up fast enough while testing chatbots adequately for safety and performance?
When a commercial chatbot is used to answer clinical questions on its own, the “stakes are incredibly high,” said Adam Rodman, a clinical reasoning researcher and internist at Beth Israel Deaconess Medical Center in Boston. “It can work 95% of the time, and if it fails at those extreme cases, those are the ones where you want it to work.”
Hospitals are starting to see more of those early triage conversations “happen completely outside of the hospital,” said Peter McCaffrey, chief AI and digital officer at the University of Texas Medical Branch. “This is something we’ve never dealt with before in health care.”
The hope is that chatbots sited within a health system will be safer because they can route patients to a local urgent care, emergency room, or doctor. “That’s just structurally safer than a patient-facing chatbot with no backup or no direct way to get an appointment,” said Nadkarni.
A health system’s own chatbot can also pull information from its patients’ existing medical records, potentially offering more personalized and relevant answers. Commercial products often encourage users to upload lab results and medical charts, but those don’t come with the same level of privacy and security as a clinical tool, and may not be as well-tuned to capture the right information from the right places in a record.
For now, though, whether chatbots integrated with health systems benefit patients is largely unknown.
“It’s a tempting idea,” said Rodman, who believes models will eventually take appropriate context from medical records to make better decisions. “But we’re not there yet.” Health systems are only beginning to build the clinical evidence necessary to support the use of their own chatbots.
The early movers
Given the newness of the technology, some of the early launches are starting small. Sutter Health, the first Emmie user, had a beta group smaller than 100 people, and Reid Health just turned on Epic’s Emmie last week for a small group of employees. “We are in very early stages,” said Muhammad Siddiqui, Reid’s chief digital and innovation officer. “I don’t really have any data.”
For now, Reid has deployed a feature called Chart Aware that will allow a user to interrogate results and trends from their medical record. The next feature will help explain what a radiological result means, and the health system hopes to go live with both tools for all Reid patients by mid-June.
Emmie could have a “structural advantage” over other patient-facing chatbots, said Nadkarni, because it’s available through Epic’s MyChart app, which millions of patients already use. At Sutter, early users can ask about general wellness and their medical history, both of which could prompt the chatbot to suggest booking a visit. “I think of it as an alternate to a consumer product,” said Sutter Chief Digital Officer Laura Wilt.
Those features are mirrored by PatientGPT, the chatbot Hartford HealthCare will start offering to tens of thousands of its established patients this week. Hartford owns a stake in K Health, which developed the tool.
PatientGPT has two modes. If a user asks a generic medical question, it’ll draw from a large knowledge base and answer as best it can, sometimes incorporating information about the patient. A question about gut pain while taking the weight loss medication Mounjaro, for example, references the patient’s prescription in its answer.
But if a patient starts asking questions about their own symptoms — which happened about a third of the time during Hartford’s early testing — the chatbot shifts into a “medical intake” mode. It’s still having a conversation with the patient, but its line of questioning also pulls from deterministic clinical flowcharts. The tone is notably more brusque and to-the-point.
After the bot gets enough info from the patient, it will point them to a next step: urgent care, emergency room, or a visit with Hartford’s 24/7 telehealth service or with their primary care provider. If urgent or emergency care is called for, the chatbot will refuse to answer any further questions.
Liability for LLMs
The key to successful implementation of health system-integrated chatbots like PatientGPT and Emmie will be in their approach to testing and monitoring. Researchers have developed several benchmarks for clinical LLM performance, but best practices are still a moving target for real-world health systems.
“Unlike traditional software where there’s a finite number of combinations of things that can happen, and you can just put together a test script and say, ‘we tested everything,’ we needed to take a different approach here,” said Trevor Berceau, director of research and development at Epic. He said the company built an “automated testing suite” for Emmie, and developers have been pushing Emmie to do things it’s not meant to — a practice called red-teaming — to build guardrails around those uses.
“Will things happen? Probably,” said Nadkarni. “But at the same time, if you do this rigorously, you at least have a roadmap of where things might go wrong and you mitigate and you protect around that.” Safety testing should come first, he said, followed by efficacy testing and then randomized controlled trials.
At Hartford, an initial red-teaming analysis published as a preprint found a failure rate of 8.5% in high-risk scenarios. It saw the tool make errors in triaging patients, including under-triaging urgent symptoms, misdirecting escalation of care, and hallucinating clinical context.
In true red-teaming, said Rodman, that failure rate isn’t abnormal — in fact, if a rate is too low, it means “you’re not red-teaming good enough.” But it’s difficult to evaluate what that rate means about the tool’s performance without knowing exactly what “high-risk” scenarios the researchers used, which Hartford’s study did not specify.
“Jumping to full scale is going to impact that 8.5% risk,” said Laurah Turner, associate dean for artificial intelligence and educational informatics at the University of Cincinnati College of Medicine. “Real patients are going to be messier, less predictable, less likely to escalate appropriately on their own.”
In March, the system launched a pilot of 400 real-world conversations that revealed no apparent safety issues, said K Health founder Ran Shaul. “We think we passed the test,” said Shaul. “This is not an incubation, this is ready and working with live patients.”
AI researchers questioned whether 400 cases were enough to capture risk in the wide variety of patient interactions the chatbot would expect to see in broader implementation. Shaul said he doesn’t expect the system will run into zero issues, but he does expect them to be “very limited, and we will pick them up quite quickly.”
Now comes monitoring and follow-up study. When Hartford was piloting PatientGPT, every conversation was monitored by a provider. Now, it will be more limited: A random set of about 20 conversations will get reviewed by providers each day, said Shaul. Every conversation will also be reviewed around the clock by a separate AI agent designed to judge the chatbot’s interactions, and Hartford plans to conduct a study every batch of approximately 1,000 new conversations and report to its AI governance board for every 500 to 1,000 conversations.
Health systems will be exposed to new forms of liability if they choose to implement their own chatbots. “The risks for a health system are much higher” than for a commercial LLM that answers health questions, said Nadkarni. “If you are putting your brand on something like this and something goes wrong in terms of a missed safety signal or an adverse event, then I’m not sure you exist in the same regulatory haze that a lot of direct-to-consumer AI does.”
“What happens when something goes wrong in one of the unreviewed conversations?” asked Turner. “What is the adverse event detection pathway for a patient who was under-triaged and never flagged by a reviewer agent? Is there a patient reporting mechanism?”
Padmanabhan Premkumar, president of Hartford HealthCare Medical Group, acknowledged that applying AI to clinical tasks comes with risk. But risks already exist for patients in today’s health care system, he said: “I think it’s better for them to connect to an ecosystem and a health care system that they trust rather than a random chatbot.”
Next steps and health system shifts
Health systems considering similar chatbot launches are facing a chicken and egg problem, said Turner. “If you don’t deploy it into use, then you actually don’t know where the failure points are. But you also don’t want to deploy it into use until such time as you’re not going to do harm.”
And minimizing patient harm — and hospital liability — is only the first step. Implementing chatbots within the context of existing health systems will impact how patients transition between care, shift the ownership of risk, and impact payment models for care.
If patient-facing chatbots can be an on-ramp for care within a health system, for example, could they also be used to onboard entirely new patients? That’s what Hartford will be testing next, as it plans to soon launch a version of PatientGPT that would allow new patients to establish primary care within its system.
Some digital health care companies, including General Medicine and Amazon, are also using chatbots as an entry point to their products. In January, Amazon launched an AI assistant for its One Medical patients; last month, it expanded availability beyond existing patients, dangling free downstream telehealth appointments as an incentive to try it out.
“I think there’s going to be a lot of frenetic activity,” said Akshay Chaudhari, a Stanford AI researcher who is working on an ARPA-H project evaluating patient chatbots. It’s unclear who will implement them best. “Is it that nimble startup that can actually iterate on user needs? Is it the Epics of the world that just control all the EHR data? Or is it the health systems who can dictate terms to Epic a little bit more than a startup? All three are tractable.”
Digital health care companies may not have direct access to a patient’s electronic health records, but once they establish a provider relationship, they can pull some charts from health information exchanges to build a more contextualized chatbot experience.
Hospitals can complete those data pulls as well, and patient’s medical records will vary in completeness depending on how many of their providers participate in health information exchanges. General Medicine, co-founder Elliot Cohen said, has had good luck, sometimes even surfacing diagnoses from records that patients weren’t aware of. Its AI chatbot can route patients to a visit with one of its telehealth providers, or help them locate in-person lab testing, imaging, or specialist visits.
“I think this is going to converge in both directions,” said Nadkarni. Health systems can use AI chatbots to get new patients and broaden their reach, while providing more entry points — and perhaps more efficient care — for their existing patients.
That is, if it’s not too late to claw those patients back from the confident, friendly patter of their non-clinical chatbot of choice. Those commercial LLMs may yet play a role in leading patients to care.
“Someone could say, ‘Well, it’s a foregone conclusion: We can never own that piece,’” said UTMB’s McCaffrey. “But then if you don’t, what happens to the traditional health care model? I don’t know, honestly.”
