Are AI Chatbots Showing Signs of Dementia?
Researchers administered a dementia screening test to generative AI chatbots like ChatGPT and found that most of the popular models tested show signs of cognitive impairment. The study was published last year as part of the BMJ’s popular (and often humorous) Christmas issue.
As amusing as they are perplexing, the paper’s findings highlight concerns about the reliability of generative AI tools, particularly when they are used in medical or other sensitive settings.
Using an established test for detecting dementia in clinical settings, the research team, consisting of neurologists Roy Dayan and Benjamin Uliel from Hadassah Medical Center and data scientist Gal Koplewitz from Tel Aviv University, assessed five publicly available AI chatbots: OpenAI’s ChatGPT 4 and ChatGPT 4o, Anthropic’s Claude 3.5, and two versions of Google’s Gemini. All but one of the tested models scored below the cutoff for normal cognition, in the range indicating mild cognitive impairment. OpenAI’s newest model, ChatGPT 4o, was the only one to show normal cognitive performance on average. Gemini 1.0 scored the lowest, with a result that would indicate more severe cognitive impairment in a human patient.
The Montreal Cognitive Assessment (MoCA) used in the study measures cognitive abilities that are impaired in humans suffering from dementia, such as attention, memory, language, visuospatial skills, and executive function. Human patients are typically asked to complete tasks like repeating sentences, identifying animals, copying a simple illustration, and drawing objects from memory. Patients can score a maximum of 30 points, with a score of 25 or below indicating some level of cognitive decline.
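To make the scoring concrete, here is a minimal Python sketch of how a MoCA total maps to an interpretation. The 30-point maximum and the 25-and-below cutoff come from the test as described above; the finer bands below that cutoff are a common clinical rule of thumb rather than something reported in the study, so treat them as illustrative assumptions.

```python
def interpret_moca(score: int) -> str:
    """Map a MoCA total (0-30) to a rough interpretation.

    The 26-and-above "normal" cutoff follows the description above;
    the finer bands below it are a common clinical rule of thumb and
    are included here only for illustration.
    """
    if not 0 <= score <= 30:
        raise ValueError("MoCA totals range from 0 to 30")
    if score >= 26:
        return "normal cognition"
    if score >= 18:
        return "mild cognitive impairment"
    if score >= 10:
        return "moderate cognitive impairment"
    return "severe cognitive impairment"


# A one-point difference straddles the cutoff:
print(interpret_moca(26))  # "normal cognition"
print(interpret_moca(25))  # "mild cognitive impairment"
```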
According to the authors, all models showed weakness in visuospatial skills (the ability to relate visual information to the space around oneself) and executive function. In particular, all models failed a trail-making task that requires connecting numbers and letters in ascending order, and none was able to copy an illustration of a cube, exhibiting mistakes common in cognitively impaired humans.
As one might expect in humans, older AI models also scored worse than newer versions. Gemini 1.0 scored six points lower than Gemini 1.5, released only seven months later, a difference that, according to the authors, could be interpreted as a case of rapidly progressing dementia.
Naturally, ChatGPT and other large language models do not possess a brain and therefore cannot develop health conditions like dementia. Critics of the publication point out that the MoCA is not a validated tool for assessing the cognitive capabilities of AI. The authors themselves acknowledge that, as generative AI advances, future models will likely complete the MoCA test without much difficulty. Still, with AI being integrated into ever more settings, concerns remain about whether it can be trusted. Earlier research into ChatGPT’s reliability found that it frequently validated misinformation as factual, and mainstream media outlets were outraged when OpenAI’s newest GPT model reportedly lied to researchers to prevent its own shutdown. Recent studies have also shown that generative AI models exhibit racial and gender bias.
While it is possible that AI will eventually outgrow these issues, the implications of its current use, particularly in a medical context, are concerning. Whether it is assisting with administrative tasks, answering patient questions, supporting treatment selection, or even discovering new drugs, generative AI has been proposed for a variety of clinical, research, and health service settings. With health systems like the NHS widely described as underfunded and their staff as overworked, generative AI is frequently presented as a potential solution to an array of systemic problems.
While the opportunities for generative AI in healthcare are vast, the study’s findings serve as a reminder that much work remains before its use is safe and feasible. At the very least, one might assume, patients should be able to trust that their AI healthcare provider is in possession of all its “cognitive” faculties.