In health, LLMs are increasingly applied to clinical tasks and to a widening range of data types. Mobile and wearable devices provide a large, continuous, longitudinal stream of data for personal health monitoring, which highlights the promise of LLMs in this field. LLM agents can derive personalized health insights from wearable data, although non-trivial, open-ended analysis remains a challenge. Because these agents can reason and interact with external tools, they make it possible to study personal health at scale. Even so, the potential of LLM agents for analyzing individual health data remains largely unexplored.

A recent trend in healthcare is the integration of AI into medicine, with a particular focus on large language models (LLMs). In two preprints on bioRxiv, Google researchers introduce two complementary approaches to providing accurate, personalized health and wellness information with LLMs:

  1. In the first approach, the researchers introduce the Personal Health Large Language Model (PH-LLM), a version of Gemini fine-tuned for text understanding and reasoning over numerical time-series personal health data, with applications in sleep and fitness.
  2. In the second approach, the researchers present the Personal Health Insights Agent (PHIA), a system that analyzes and interprets behavioral health data from wearables using code generation and information retrieval tools.


Because they track behavior and physiology longitudinally and in situ, wearables and other personal devices can offer valuable insight into sleep patterns and physical activity. These data can reveal personalized health insights and encourage positive behavior change. For example, a device-measured physical activity energy expenditure (PAEE) that is 5 kJ/kg/day higher is associated with a 37% lower risk of premature death. Frequent sleep disruption is associated with a higher risk of cardiovascular disease, diabetes, and hypertension. Using an activity tracker has also been found to increase physical activity and support weight loss.

Wearable technology offers many benefits, such as better sleep and overall health. However, without clinical supervision and expertise, it is difficult to use wearable data to answer health questions. For example, a user might ask how to get better sleep, a question that requires intricate analysis across several time series. The analysis involves deciding which metrics to optimize, computing average sleep metrics, detecting anomalies, placing the findings in the context of the person's health and of population norms, and providing tailored recommendations. Beyond numerical analysis, it also requires judging what healthy sleep means given the person's health profile.
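
The numerical part of that analysis can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in either preprint: the nightly values, the reference range, and the function name `summarize_sleep` are all hypothetical, and the z-score rule is just one simple way to flag unusual nights.

```python
from statistics import mean, stdev

# Hypothetical nightly sleep durations in hours (illustrative values, not real data).
nightly_sleep_hours = [7.2, 6.8, 7.5, 4.1, 7.0, 6.9, 7.3]

# Assumed adult reference range in hours; a placeholder, not clinical guidance.
POPULATION_NORM_HOURS = (7.0, 9.0)


def summarize_sleep(hours, norm=POPULATION_NORM_HOURS, z_threshold=2.0):
    """Average a sleep series, flag unusual nights, and compare to a norm."""
    avg = mean(hours)
    spread = stdev(hours)  # sample standard deviation
    # Flag nights lying more than z_threshold standard deviations from the mean.
    anomalies = [
        (night, h)
        for night, h in enumerate(hours)
        if spread > 0 and abs(h - avg) / spread > z_threshold
    ]
    low, high = norm
    if avg < low:
        status = "below"
    elif avg > high:
        status = "above"
    else:
        status = "within"
    return {"average": avg, "anomalies": anomalies, "status": status}


report = summarize_sleep(nightly_sleep_hours)
print(f"Average: {report['average']:.2f} h ({report['status']} the assumed range)")
print("Unusual nights (index, hours):", report["anomalies"])
```

Even this toy version leaves out the harder steps the paragraph describes: choosing which metric matters for a given person and turning the numbers into advice that fits their health profile.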

LLMs in Medicine

Large language models (LLMs) are flexible tools that excel at generating language in a variety of contexts. They have shown strong performance in medical question answering, nuanced analysis of electronic health records, differential diagnosis from medical images, psychiatric functional assessment, and the delivery of psychological interventions. These tools could affect medical research, education, and clinical practice, especially through natural language interfaces. Their strong performance shows that they can efficiently extract signal from data gathered in a clinical context.

Because clinical visits are infrequent and irregular, important facets of health are often overlooked, including sleep, physical activity, stress, and cardiometabolic health, which can be assessed through behavior and physiological responses. Although continuous, longitudinal measurements have many benefits for health monitoring, neither clinical practice nor standard datasets have fully incorporated them, because they are hard to interpret, computationally demanding, and often lack context. As a result, general foundation LLMs and even medically tuned LLMs may struggle to use these data to recommend interventions based on an individual's health behaviors.

In health, LLMs have shown promise in generating language for complex tasks such as medical question answering, medical education, electronic health record analysis, mental health interventions, and the interpretation of medical images and assessments. These capabilities can be extended with software tools such as code generation and information retrieval. By enabling LLM-based agents to reason about and interact with the world, such tools create a major opportunity to extract insights from personal health data, including wearable data. An agent that can autonomously decompose difficult tasks, reason over both internal and external information, and produce safe, useful insights could benefit both individuals and population health.

PHIA and PH-LLM: What are they?

The Personal Health Insights Agent (PHIA) is presented as the first LLM agent for generating personal health insights. It combines the ReAct agent framework for iterative reasoning with code generation and web search integration. The framework is designed to answer thousands of real-world health queries, and it demonstrates the strong reasoning abilities of LLM agents in interpreting time-series behavioral health data and deriving deeper health insights. The paper compares PHIA with non-agent LLM-based code generation and with text-only numerical reasoning, and it reports a 650-hour human evaluation of more than 6,000 model responses. A dataset of more than 4,000 closed- and open-ended questions from various domains is also made available for human and automatic evaluation.
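
The iterative pattern behind a ReAct-style agent (think, act with a tool, observe, repeat) can be sketched as a toy loop. This is not PHIA's implementation: the step data, the tool functions, and `scripted_model`, a hard-coded stand-in for a real LLM call, are all hypothetical, and the search tool is a stub that performs no network request.

```python
from statistics import mean

# Hypothetical wearable data (illustrative values only).
DATA = {"steps": [4200, 8100, 10500, 3900, 9800]}


def run_code(expression):
    """Tool: evaluate a small expression over DATA (stand-in for running generated code)."""
    # eval is tolerable only in this toy; a real agent would need a proper sandbox.
    allowed = {"data": DATA, "mean": mean, "max": max, "min": min}
    return eval(expression, {"__builtins__": {}}, allowed)


def web_search(query):
    """Tool: stubbed search that returns a canned string instead of querying the web."""
    return f"(stub) no live search performed for: {query}"


TOOLS = {"code": run_code, "search": web_search}


def scripted_model(question, history):
    """Stand-in for an LLM: returns a (thought, action, argument) triple."""
    if not history:
        return ("I need the average step count.", "code", "mean(data['steps'])")
    observation = history[-1][3]
    return ("I have the number I need.", "finish", f"Average daily steps: {observation:.0f}")


def react(question, model=scripted_model, max_steps=5):
    """Alternate reasoning and tool use until the model finishes or steps run out."""
    history = []
    for _ in range(max_steps):
        thought, action, argument = model(question, history)
        if action == "finish":
            return argument, history
        observation = TOOLS[action](argument)
        history.append((thought, action, argument, observation))
    return "No answer within the step limit.", history


answer, trace = react("How active was I this week?")
print(answer)
```

In a real agent the scripted function would be replaced by a model call that sees the question and the accumulated trace, which is what lets the agent choose its next tool based on earlier observations.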

The Personal Health Large Language Model (PH-LLM) is a tool for improving personal health behaviors related to sleep and fitness. It is a fine-tuned version of Gemini designed to give users personalized insights and recommendations. PH-LLM focuses on two areas of great personal health interest: sleep and fitness. For sleep, it analyzes each user's sleep data to derive insights and offer tailored suggestions for improving sleep quality. For fitness, it recommends physical activity intensity based on a combination of training load, sleep, health metrics, and subjective feedback. PH-LLM is evaluated on three tasks: coaching recommendations, multiple-choice tests, and prediction of subjective patient-reported outcomes.


PHIA is a new framework for personal health insights, powered by LLM agents, that helps people make decisions based on their own health data. It uses code generation and search to interact with wearable data and answer questions iteratively. Human evaluation shows that it outperforms baseline LLM-based approaches. Sleep and fitness are critical for population health: reduced sleep duration is associated with 7 of the top 15 causes of premature death in the United States and with 9% of premature mortality globally. PHIA shows how language model agents could be integrated into daily life to help people make informed decisions based on their health data, which could transform healthcare through improved communication.

PH-LLM is a fine-tuned version of Gemini created to improve health and fitness outcomes. It turns objective wearable data into tailored insights, suggesting possible causes of observed behaviors and ways to improve sleep hygiene and fitness. Fine-tuning Gemini Ultra 1.0 substantially improved the model's use of domain knowledge and its personalization of user information for sleep insights, and its aggregate performance in fitness is comparable to that of experts. PH-LLM can also predict subjective sleep outcomes using a multimodal encoder, which adds to its potential to improve health and fitness.

Article Sources: Research Paper 1 | Research Paper 2 | Reference Article

Important Note: bioRxiv hosts preprints that have not yet undergone peer review. These papers should therefore not be treated as conclusive evidence, used to guide clinical practice, or used to influence health-related behavior, and the information they present should not be considered established.
