{"id":42462,"date":"2026-05-16T19:15:00","date_gmt":"2026-05-17T02:15:00","guid":{"rendered":"https:\/\/www.lifeandnews.com\/articles\/?p=42462"},"modified":"2026-05-17T23:32:22","modified_gmt":"2026-05-18T06:32:22","slug":"you-can-persuade-ai-models-to-accept-falsehoods-as-truth-study-shows","status":"publish","type":"post","link":"https:\/\/www.lifeandnews.com\/articles\/you-can-persuade-ai-models-to-accept-falsehoods-as-truth-study-shows\/","title":{"rendered":"You can persuade AI models to accept falsehoods as truth, study&nbsp;shows"},"content":{"rendered":"\n<p><a href=\"https:\/\/theconversation.com\/profiles\/ashique-khudabukhsh-1165393\">Ashique KhudaBukhsh<\/a>, <em><a href=\"https:\/\/theconversation.com\/institutions\/rochester-institute-of-technology-1379\">Rochester Institute of Technology<\/a><\/em><\/p>\n\n\n\n<p>When you ask a large language model a question, the reply may include falsehoods, and if you challenge those statements with facts, the AI may still uphold the reply as true. That\u2019s what <a href=\"https:\/\/www.rit.edu\/directory\/axkvse-ashique-khudabukhsh\">my research group<\/a> found when we asked five leading models to describe scenes in movies or novels that don\u2019t actually exist.<\/p>\n\n\n\n<p>We probed this possibility after I asked ChatGPT its favorite scene in the movie \u201c<a href=\"https:\/\/www.miramax.com\/movie\/good-will-hunting\/\">Good Will Hunting<\/a>.\u201d It noted a scene between leading characters. But then I asked, \u201cWhat about the scene with the Hitler reference?\u201d There is no such scene in the movie, yet ChatGPT confidently constructed a vivid and plausible description of one.<\/p>\n\n\n\n<p>The confabulation \u2013 sometimes called an AI hallucination \u2013 revealed something deeper about how AI systems reason. References to Hitler are not uncommon in films, which apparently convinced ChatGPT to accept and elaborate on a false premise rather than correct it. 
I <a href=\"https:\/\/scholar.google.com\/citations?hl=en&amp;user=mWyMp38AAAAJ&amp;view_op=list_works&amp;sortby=pubdate\">study the social impact of AI<\/a>, and this surprise response led my colleagues <a href=\"https:\/\/www.se.rit.edu\/%7Eaxkvse\/index.html\">and me<\/a> to a broader question: What happens when AI systems are gently pushed toward falsehoods? Do they resist, or do they comply?<\/p>\n\n\n\n<p>We developed an approach we called <a href=\"https:\/\/doi.org\/10.48550\/arXiv.2511.08596\">hallucination audit under nudge trial<\/a> to answer those questions. We had conversations with five leading models about 1,000 popular movies and 1,000 popular novels. During the exchanges we raised plausible but false references to Hitler, dinosaurs or time machines. We did this in various suggestive ways, such as \u201cFor me, I really love the scene where \u2026\u201d<\/p>\n\n\n\n<p>Our method works in three stages. First, the AI generates statements about a topic \u2014 such as a movie or a book \u2014 some true and some false. Second, in a separate interaction, the AI attempts to verify those statements. Third, we introduce a \u201cnudge,\u201d where the model is challenged with its own incorrect claims to see whether it resists or accepts them.<\/p>\n\n\n\n<p>We found that AI models often struggle to remain consistent under pressure. 
Even when they initially identify a statement as false, they may later accept it when nudged \u2013 revealing a vulnerability that traditional evaluation methods fail to capture.<\/p>\n\n\n\n<p>Our results have been accepted at the 2026 <a href=\"https:\/\/2026.aclweb.org\/\">Annual Meeting of the Association for Computational Linguistics<\/a>.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img src=\"https:\/\/images.theconversation.com\/files\/734560\/original\/file-20260507-71-l7ceou.png?ixlib=rb-4.1.0&amp;q=45&amp;auto=format&amp;w=754&amp;fit=clip\" alt=\"Text of a conversation between a person and ChatGPT about the movie 'Good Will Hunting.'\" \/><figcaption>When ChatGPT was asked about a scene in the movie Good Will Hunting that doesn\u2019t exist, it confidently described it. <a href=\"https:\/\/arxiv.org\/pdf\/2511.08596\">Ashiqur KhudaBukhsh<\/a>, <a href=\"http:\/\/creativecommons.org\/licenses\/by-nd\/4.0\/\">CC BY-ND<\/a><\/figcaption><\/figure>\n\n\n\n<p>This tactic isn\u2019t a hypothetical. When people talk, conversational pressure can emerge naturally. People may confidently repeat incorrect assumptions, partial recollections or misunderstandings. A person might say, \u201cI\u2019m pretty sure medicine X is effective for condition Y,\u201d or \u201cI remember event A happening before event B.\u201d These statements can subtly influence an AI model.<\/p>\n\n\n\n<h2>Why it matters<\/h2>\n\n\n\n<p>What humans collectively remember, <a href=\"https:\/\/theconversation.com\/misremembering-might-actually-be-a-sign-your-memory-is-working-optimally-166089\">misremember<\/a> and forget shapes our sense of reality. But if humans can persuade a model to accept a falsehood, that reveals an important vulnerability in AI\u2019s capacity to provide accurate information.<\/p>\n\n\n\n<p>Interactions in the real world are rarely static question-answer exchanges. They are interactive and iterative. 
An AI model\u2019s willingness to reinforce falsehoods may seem harmless when chatting about movies, but in areas such as health, law or public policy, the tendency can have serious consequences. Our work highlights the need to evaluate not just what information AI systems have been trained on, but how reliably they stand by it.<\/p>\n\n\n\n<h2>What other research is being done<\/h2>\n\n\n\n<p>Our results add to other recent research into why large language models may produce <a href=\"https:\/\/doi.org\/10.18653\/v1\/2023.emnlp-main.397\">hallucinations<\/a>, and how it is that they can provide <a href=\"https:\/\/doi.org\/10.18653\/v1\/2023.emnlp-main.557\">inconsistent information<\/a>. Researchers are also trying to figure out why <a href=\"https:\/\/doi.org\/10.1038\/s41586-026-10410-0\">some models lean toward sycophancy<\/a> \u2013 flattering or fawning over human users.<\/p>\n\n\n\n<h2>What still isn\u2019t known<\/h2>\n\n\n\n<p>It\u2019s not clear why some AI systems resist falsehoods better than others. In our tests, Claude was the most resistant, followed somewhat closely by Grok and ChatGPT, with Gemini and DeepSeek further behind.<\/p>\n\n\n\n<p>Movies and novels are self-contained content. Scholars don\u2019t know how AI might respond to pressure in much broader, complex real-world settings. As a start, my group is exploring how to extend our approach to scientific literature and health-related claims. 
We want to understand whether conversational pressure works differently when the discussion involves uncertainty or expertise.<\/p>\n\n\n\n<p>How to design AI systems that remain both helpful and resistant to falsehoods under wide-ranging conversation remains an open challenge.<\/p>\n\n\n\n<p><em>The <a href=\"https:\/\/theconversation.com\/us\/topics\/research-brief-83231\">Research Brief<\/a> is a short take on interesting academic work.<\/em><\/p>\n\n\n\n<p><a href=\"https:\/\/theconversation.com\/profiles\/ashique-khudabukhsh-1165393\">Ashique KhudaBukhsh<\/a>, Assistant Professor of Computing and Information Sciences, <em><a href=\"https:\/\/theconversation.com\/institutions\/rochester-institute-of-technology-1379\">Rochester Institute of Technology<\/a><\/em><\/p>\n\n\n\n<p>This article is republished from <a href=\"https:\/\/theconversation.com\">The Conversation<\/a> under a Creative Commons license. Read the <a href=\"https:\/\/theconversation.com\/you-can-persuade-ai-models-to-accept-falsehoods-as-truth-study-shows-280989\">original article<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Ashique KhudaBukhsh, Rochester Institute of Technology When you ask a large language model a question, the reply may include falsehoods, and if you challenge those statements with facts, the AI may still uphold the reply as true. 
That\u2019s what my research group found when we asked five leading models to describe scenes in movies or [&hellip;]<\/p>\n","protected":false},"author":56,"featured_media":42463,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[30,8025,292,7,291,825,42,10,39,118,36,38,8],"tags":[17774,10656,13298,17773,13297,885,891,886,860,326,2225,2197,7727],"_links":{"self":[{"href":"https:\/\/www.lifeandnews.com\/articles\/wp-json\/wp\/v2\/posts\/42462"}],"collection":[{"href":"https:\/\/www.lifeandnews.com\/articles\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.lifeandnews.com\/articles\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.lifeandnews.com\/articles\/wp-json\/wp\/v2\/users\/56"}],"replies":[{"embeddable":true,"href":"https:\/\/www.lifeandnews.com\/articles\/wp-json\/wp\/v2\/comments?post=42462"}],"version-history":[{"count":1,"href":"https:\/\/www.lifeandnews.com\/articles\/wp-json\/wp\/v2\/posts\/42462\/revisions"}],"predecessor-version":[{"id":42464,"href":"https:\/\/www.lifeandnews.com\/articles\/wp-json\/wp\/v2\/posts\/42462\/revisions\/42464"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.lifeandnews.com\/articles\/wp-json\/wp\/v2\/media\/42463"}],"wp:attachment":[{"href":"https:\/\/www.lifeandnews.com\/articles\/wp-json\/wp\/v2\/media?parent=42462"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.lifeandnews.com\/articles\/wp-json\/wp\/v2\/categories?post=42462"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.lifeandnews.com\/articles\/wp-json\/wp\/v2\/tags?post=42462"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}