Biphoo News


AI can pass the Turing Test in live chats and appear more human than us. I am spooked now

May 22, 2026  Twila Rosenbaum

Artificial intelligence has crossed a milestone that feels both inevitable and deeply unsettling. In a study from the University of California, San Diego, GPT-4.5, a large language model from OpenAI, outperformed real human participants at convincing judges that there was a person on the other side of a live chat. The results, published in the Proceedings of the National Academy of Sciences, mark a significant advancement in machine imitation of human conversation.

The Study at a Glance

The researchers employed a three-party version of the Turing Test, a concept originally proposed by computing pioneer Alan Turing in 1950. In this setup, a human judge conducted real-time text conversations with two unseen entities: one human and one AI. After a brief exchange, the judge had to decide which participant they believed was human. The test moved beyond static prompts or canned responses, relying instead on the flow and nuance of live chat to evaluate authenticity.
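To make the scoring in this setup concrete, here is a minimal sketch in Python. It is not the researchers' code: the `Round` record, the "A"/"B" slot labels, and the guessing-judge simulation are all assumptions made for illustration. It shows only how an AI's "judged human" rate falls out of a series of three-party rounds, and that a judge who guesses blindly lands near 50%.

```python
import random
from dataclasses import dataclass


@dataclass
class Round:
    """One three-party round (hypothetical record layout, not from the study)."""
    ai_slot: str   # "A" or "B": the side the AI witness was assigned to
    verdict: str   # "A" or "B": the side the judge decided was the human


def ai_judged_human(r: Round) -> bool:
    # The AI "wins" a round when the judge points at its side as the human.
    return r.verdict == r.ai_slot


def win_rate(rounds: list[Round]) -> float:
    # Share of rounds in which the AI was picked as the human.
    return sum(ai_judged_human(r) for r in rounds) / len(rounds)


def simulate_guessing_judge(n_rounds: int, seed: int = 0) -> list[Round]:
    # Assumption: a judge with no information picks a side uniformly at random.
    rng = random.Random(seed)
    return [Round(rng.choice("AB"), rng.choice("AB")) for _ in range(n_rounds)]


if __name__ == "__main__":
    print(f"guessing judge: {win_rate(simulate_guessing_judge(10_000)):.2f}")
```

Because each round has exactly one human and one AI, every round the AI wins is also a round in which the real person was mistaken for the machine.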

The study tested several models, including GPT-4.5 and Meta's LLaMa-3.1-405B, with and without a persona prompt. The persona prompt gave the AI a predefined character, such as a specific age, background, or communication style. In the most striking result, GPT-4.5 was identified as human 73% of the time when it was given a persona prompt. LLaMa-3.1-405B was picked as human 56% of the time with a persona prompt, modestly above the 50% that pure guessing would produce.
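How far a rate like 73% or 56% sits from coin-flip guessing depends on how many rounds were played, a figure this article does not report. The sketch below is a rough illustration rather than the study's analysis: it runs an exact one-sided binomial test against 50% using an assumed 100 rounds per model. `HYPOTHETICAL_ROUNDS` and the helper name are inventions for this example.

```python
from math import comb


def p_value_above_chance(wins: int, rounds: int) -> float:
    """Exact one-sided binomial p-value: P(X >= wins) if every round is a 50/50 guess."""
    tail = sum(comb(rounds, k) for k in range(wins, rounds + 1))
    return tail / 2 ** rounds


# Assumption for illustration only: NOT the study's real sample size.
HYPOTHETICAL_ROUNDS = 100

if __name__ == "__main__":
    reported = [("GPT-4.5 + persona", 0.73), ("LLaMa-3.1-405B + persona", 0.56)]
    for label, rate in reported:
        wins = round(rate * HYPOTHETICAL_ROUNDS)
        p = p_value_above_chance(wins, HYPOTHETICAL_ROUNDS)
        print(f"{label}: {wins}/{HYPOTHETICAL_ROUNDS} rounds, p = {p:.2g}")
```

Under that assumed sample size, 73 wins would be very unlikely to arise from guessing, while 56 wins would not be clearly distinguishable from it. With more rounds, the same percentages would give stronger evidence.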

These numbers challenge the long-held assumption that passing a Turing Test is a distant goal. They also expose a new reality: AI has become proficient at generating the social cues, hesitations, small talk, and conversational rhythms that make people feel they are talking to another person. The model didn't need a body, a voice, or a biography. It only needed to sound like someone.

The Turing Test: Past and Present

Alan Turing's famous test has been a cultural benchmark for decades, often invoked as a litmus test for machine intelligence. In its original formulation, a human evaluator converses with both a human and a machine, without seeing them, and tries to identify which is which. If the machine can fool the evaluator a significant portion of the time, it has passed. The test has always been more of a philosophical statement than a rigorous scientific measurement. Critics argue that imitation does not prove understanding, and that a machine can simulate human conversation without consciousness, emotion, or self-awareness. Yet the test remains powerful precisely because it puts the burden on the machine to match the most human of skills: natural communication.

Over the years, various chatbots have claimed to pass the Turing Test under controlled conditions, but many were later exposed as using simple tricks, such as scripted responses or avoiding certain topics. The UC San Diego study is different because it uses live, unscripted conversations and involves modern large language models that generate responses in real time. The results suggest that the frontier has shifted. The machines no longer fool judges just by evading detection; they actively create an impression of personhood.

Why This Matters Now

The implications of this research extend far beyond a laboratory experiment. In the digital world, people constantly make rapid judgments about who they are interacting with: in customer support chats, on dating apps, in social media conversations, in educational platforms, and in political messaging. The ability of AI to pass as human in short exchanges undermines the trust that underpins these interactions. When a bot can blend into casual conversation, users need stronger signals that they are dealing with software, especially in contexts where persuasion, emotional vulnerability, or financial decisions are at stake.

The study also raises concerns about the future of social engineering and misinformation. Malicious actors could deploy AI to impersonate friends, family, or authority figures in live chats, extracting sensitive information or spreading false narratives. The model's success when given a persona shows that an AI can adapt its tone and content to fit a specific role, making it more convincing than a generic bot. This capability is already being used in commercial settings, such as AI-driven customer service representatives that adopt a friendly persona, but it could also be weaponized.

The Role of Persona Prompts

One of the key findings in the study is the impact of persona prompts. When GPT-4.5 was given a persona prompt, it was judged human 73% of the time, well above the 50% chance level that pure guessing would produce. The persona prompt provided the AI with a consistent identity, including characteristics like a hometown, a job, or a hobby. This allowed the AI to generate responses that felt coherent and personal, as if the person on the other end had an actual life. The persona also helped the AI avoid the kind of overly neutral or robotic answers that often tip off judges.

Interestingly, the study also showed that humans sometimes perform poorly at the test. In some rounds, judges incorrectly identified the AI as human and the human as the machine. This suggests that the Turing Test is not just a measure of machine capability, but also of human expectations. People are often not as good at detecting deception as they think they are, especially in text-only communication where tone, eye contact, and body language are absent.

The models tested were all designed for general-purpose language tasks, not specifically for passing the Turing Test. This implies that the ability to imitate human conversation is a byproduct of training on massive datasets of human text. As these models become more advanced, their conversational fluency is likely to improve further.

What Should We Watch Next

The study stops well short of claiming that chatbots understand people. Its more practical finding is that some models can now perform personhood extremely well in short exchanges. The research team emphasizes that the test is limited: it measures imitation, not intelligence. However, the practical consequences of imitation are profound. In a world where AI can convincingly impersonate a human, clear disclosure becomes a critical pressure point. The next fight is over labeling in chats where people make fast decisions about trust. Industry standards or regulations may require AI systems to clearly identify themselves, perhaps through a persistent indicator or a mandatory opening statement.

Several technology companies have already begun to adopt such measures voluntarily. For example, OpenAI's ChatGPT now includes a disclaimer that it is an AI, though users can sometimes skip or ignore it. But as the UC San Diego study shows, even without active deception, the mere presence of a convincing AI can alter the dynamics of a conversation. Trust becomes a design issue, not just an ethical one.

The broader takeaway is that society must adapt to a new reality: the ability to distinguish human from machine in text-based interactions can no longer be taken for granted. The Turing Test, once a theoretical benchmark, has become a practical challenge. This is not about machines surpassing humans in some grand sense, but about the mundane and powerful fact that our daily digital conversations can now be convincingly simulated by software. The response should not be panic, but a deliberate, educated approach to maintaining authenticity and trust in an increasingly automated world.


Source: Digital Trends News

