Large Language Models in Light of the Turing Test and the Chinese Room Argument
AI has become a hot topic in recent times, with remarkable advancements in technologies like ChatGPT, Bard, and other large AI language models that can engage in natural language conversations. Let's explore the history of AI through two of its earliest and most famous tests and thought experiments, the Turing Test and the Chinese Room Argument, and discuss their ideas in the context of modern language models.
This analysis continues from a previous article I wrote that seems to have resonated quite a bit with my readers:
If Oral and Written Communication Made Humans Develop Intelligence… What's Up with Language Models?
Modern large language models, the Turing test, and the Chinese room argument
We are barely past the first two decades of the 21st century, and we already have language models like ChatGPT and Bard that, let's be honest, we didn't even think possible when the century began. These models use advanced machine learning techniques to ingest huge amounts of text and then perform highly complex text-related tasks by applying the patterns "learned" from the training texts, all in the form of a natural conversation between the user and the model.
These models were shocking when they landed, as they seemed really "intelligent". If you think I'm exaggerating, it's because you're surrounded by people who are as deep into science and tech as you and me… just go ask outside this circle of people.
While some claim that modern language models could likely pass the Turing test (see the next sections), it is essential to understand the limitations of such tests. Most importantly, the Turing test relies on the illusion of intelligence, not on intelligence that involves any kind of actual understanding. And given this, is it really that amazing that a program passes the test at all?
Being just a statistical model that reads in tokens and outputs a new set of tokens that happens to have very good grammar and even some meaningful content, a modern large language model's ability to engage in coherent and contextually appropriate conversations is remarkable, but it does not equate at all to true understanding, let alone consciousness. Still, except in its latest editions, which continuously parrot warnings that it is a language model, I think we can all be quite convinced that ChatGPT can perfectly fool a human into thinking that it is another human; that is, it can pass the Turing test as defined. Then again, the "Chinese room argument" that we will discuss below holds that the language model only processes language inputs and generates responses based on instructions, which are just the patterns observed in its vast training datasets, and of course lacks any genuine comprehension of meaning, even when under certain conditions some language models seem able to solve problems through steps akin to logical thinking. Do you agree? Or not?
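To make the "statistical model that reads in tokens and outputs tokens" point concrete, here is a minimal sketch, assuming nothing more than a toy bigram model trained on a one-sentence corpus (real language models use huge neural networks trained on trillions of tokens, but the loop of reading tokens in and sampling tokens out is analogous):

```python
import random
from collections import defaultdict

# Toy "training corpus"; real LLMs train on trillions of tokens.
corpus = "the cat sat on the mat and the dog slept on the rug".split()

# "Training": record which tokens follow which (a bigram model).
follows = defaultdict(list)
for current_tok, next_tok in zip(corpus, corpus[1:]):
    follows[current_tok].append(next_tok)

def generate(start: str, max_tokens: int = 8) -> str:
    """Emit tokens one at a time, each sampled purely from observed
    statistics; no meaning or understanding is involved anywhere."""
    token, output = start, [start]
    for _ in range(max_tokens):
        continuations = follows.get(token)
        if not continuations:
            break  # no observed continuation for this token
        token = random.choice(continuations)
        output.append(token)
    return " ".join(output)

print(generate("the"))  # e.g.: "the dog slept on the mat and the cat"
```

Everything the toy model "knows" lives in the `follows` table; nowhere is there anything we could call understanding, which is exactly the intuition behind the Chinese room argument discussed below.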
The debate keeps going about how to discern between simulated behavior and genuine cognitive abilities in AI systems. Even a basic, consensus definition of intelligence is still being pursued.
The "Turing Test" and Chatbot evolution
The Turing Test, named after the renowned mathematician, logician, and cryptographer Alan Turing, is a test designed to determine whether a machine can demonstrate behavior indistinguishable from that of a human. Turing, often considered the father of computer science, played a crucial role in breaking the Nazi Enigma code during World War II, a story quite well depicted in the movie "The Imitation Game", whose title relates directly to the test itself.
In his seminal paper "Computing Machinery and Intelligence", published in 1950, Turing posed the fundamental question of whether machines can think or exhibit intelligence. Since defining intelligence is in itself a daunting challenge, instead of getting bogged down by what a machine is or what intelligence entails, Turing opted for a much simpler and more pragmatic approach: determining whether a machine can convincingly imitate a human in conversation. This led to the concept of the Turing Test, also known as the "imitation game", from which the movie's title follows.
In the "imitation game", two individuals, one male and one female, engage in separate rooms. A third person, the interrogator, interacts with both the individuals, aiming to determine their gender based solely on written messages. Turing suggested replacing one of the participants of the "imitation game" with a machine and assessing whether the interrogator could distinguish between the human and the machine based on their responses.

However, conducting the Turing Test presents challenges. There are no fixed rules or criteria for passing the test, leading to varying opinions on whether particular machines have successfully demonstrated human-like behavior. Early attempts, like Joseph Weizenbaum's ELIZA in 1966, aimed to simulate conversation by responding to user inputs with generic questions or observations. While ELIZA managed to deceive some judges, it was more of a cleverly programmed chatbot than a truly intelligent entity. In fact, you can chat with it and you'll quickly realize it's far less "intelligent" than ChatGPT, Bard, or any other modern AI-based chatbots.
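To see how simple ELIZA's trick really was, here is a minimal ELIZA-style sketch; the patterns below are made up for illustration and are not Weizenbaum's actual DOCTOR script:

```python
import re

# A few made-up rules in the spirit of ELIZA's DOCTOR script:
# match a keyword pattern, reflect part of the input back as a question.
RULES = [
    (re.compile(r"\bi feel (.+)", re.I), "Why do you feel {}?"),
    (re.compile(r"\bi am (.+)", re.I), "How long have you been {}?"),
]

def eliza_reply(user_input: str) -> str:
    for pattern, template in RULES:
        match = pattern.search(user_input)
        if match:
            return template.format(match.group(1).rstrip(".!?"))
    return "Please go on."  # generic fallback when nothing matches

print(eliza_reply("I feel anxious about AI."))  # Why do you feel anxious about AI?
print(eliza_reply("I am tired."))               # How long have you been tired?
print(eliza_reply("What do you think?"))        # Please go on.
```

A handful of regular expressions and canned reflections are enough to keep a superficial conversation going, which is why ELIZA could deceive some users while understanding nothing.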

Other notable attempts at passing the Turing Test include PARRY, an AI program simulating a paranoid schizophrenic, and Eugene Goostman, a chatbot designed as a Ukrainian teenager. While they achieved some success, they ultimately relied on manipulating language without a genuine understanding of it. Not saying that ChatGPT does understand what it says… but go try these old chatbots and you'll see what I mean!
The "Chinese Room Argument" and the possibility of AI truly understanding what it reads and write
The Chinese Room Argument, proposed by philosopher John Searle in 1980, challenges the notion that passing the Turing Test equates to true intelligence or understanding at all. That seems quite reasonable today, I think, yet it becomes super interesting to apply these ideas to modern large language models, as we already anticipated above.
Searle presents a thought experiment known as the "Chinese Room". Imagine a person inside a room filled with baskets containing Chinese symbols. The person has no knowledge of the Chinese language but possesses a manual with step-by-step instructions for combining the symbols correctly. We don't care how such a manual was created; the point is that the instructions work perfectly well, in that they can produce an ordered series of output symbols that makes perfect sense as a response to a given set of inputs.

From outside the room, people send in combinations of symbols as questions. The person inside, following the manual, responds with appropriate symbol combinations, all without understanding the meaning of the responses; in fact, this person doesn't even care about understanding: he or she just follows the manual's instructions, which explain how the output should be structured depending on the input received. However, to observers outside the room, for all practical purposes the room as a whole behaves as if it understands the language.
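In code, the room boils down to a lookup procedure. Here is a minimal sketch, assuming an invented two-entry "manual" (the translations in the comments are there for us readers; the person in the room never sees them):

```python
# The "manual": purely syntactic rules pairing input symbol strings with
# output symbol strings. The entries are invented placeholders; to the
# person in the room they are just meaningless shapes to be looked up.
MANUAL = {
    "你好吗": "我很好，谢谢",     # "How are you?" -> "I'm fine, thanks"
    "你会中文吗": "会，一点点",   # "Do you know Chinese?" -> "Yes, a little"
}

def room_operator(symbols_in: str) -> str:
    """Mechanically follow the manual: find the matching rule and copy
    out the prescribed symbols. Meaning is never consulted anywhere."""
    return MANUAL.get(symbols_in, "对不起，我不明白")  # "Sorry, I don't understand"

# From outside, the room appears to understand Chinese:
print(room_operator("你好吗"))  # prints: 我很好，谢谢
```

Swap the hand-written manual for billions of statistical parameters and the parallel with modern language models, as conjectured above, becomes hard to miss.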
The conclusion here is that despite convincing those outside the room that he or she understands Chinese, the person inside the room does not genuinely understand the language, by definition. Drawing a parallel, Searle's argument questions whether AI systems that pass the Turing test genuinely possess intelligence or only simulate it by mechanically manipulating symbols or language without true understanding, exactly what we were conjecturing earlier. This perspective challenges Turing's belief in "strong AI", which asserts that a properly programmed machine can genuinely think and possess a mind.
Critics have engaged in extensive debate about this kind of thought experiment, with some proposing the "systems reply", which suggests that the room's occupant is analogous to a computer's CPU: it would be the system as a whole, not the person, that understands. Searle counters by stating that understanding cannot emerge from the parts of the system alone. Another objection, the "robot reply", posits that robots with sensors and the ability to interact with their environment could learn language like human children do, an idea similar to what I discussed earlier [here](https://medium.com/predict/humans-as-super-advanced-stochastic-parrots-36d3e66e1353) and here. Searle argues that the sensory input would also consist of symbols that a machine could manipulate without comprehension. But then… wouldn't this apply to us humans as well? After all, we build a reality inside our minds from symbolic inputs from our senses, distorted by beliefs, priors, and experiences. You can't even be sure this reality is the same for everybody, yet we can exchange information in what looks like an "intelligent" manner.
Discussions aside, the Turing test remains an important milestone in the development of AI, and language models like OpenAI's ChatGPT, Google's Bard, or Meta's Llama showcase significant progress in mimicking human-like conversations, to the point that they could probably pass the test as phrased. But then the Chinese room argument still stands, cautioning us against prematurely equating such behavior with genuine intelligence. That point seems reasonable, yet it needs to be stressed and communicated, especially when you discuss or hear discussions about these language models among people who are far from technology, many of whom already take the "intelligence" part of "Artificial Intelligence" for real today.
As research and technology advance, policies to mitigate the negative consequences of AI language models must catch up, and the lay public needs to be informed about what all of this means: "artificial", "intelligence", "tech", "life"
AI's future may see advancements that blur the line between simulated intelligence and true understanding, but as of now, we must recognize the distinction. It is essential to keep exploring AI's potential and understanding its limitations to foster responsible and ethical applications in various fields, and, why not, to push the frontier between what seems science and what seems science fiction, even touching on the very nature of life itself.
Selected related literature and further reading
Turing's original article proposing the Turing test:
An article by John Searle discussing the Chinese room argument:
Minds, brains, and programs | Behavioral and Brain Sciences | Cambridge Core
The Chinese room argument as explained by Encyclopedia Britannica:
Chinese room argument | Definition, Machine Intelligence, John Searle, Turing Test, Objections, &…
Talk to the popular, very early chatbots like ELIZA and PARRY (expect nothing even close to ChatGPT or Bard!):
A couple of other articles by me that you may find interesting:
Gato, the latest from Deepmind. Towards true AI?
After Beating Physics at Modeling Atoms and Molecules, Machine Learning Is Now Collaborating with…
www.lucianoabriata.com I write and take photographs about everything that lies in my broad sphere of interests: nature, science, technology, programming, etc. Become a Medium member to access all of its stories (affiliate links to the platform, from which I get small revenues at no cost to you) and subscribe to get my new stories by email. To consult about small jobs, check my services page here. You can contact me here.