Scientists Get Serious About Large Language Models Mirroring Human Thinking
Research combining human brain imaging and psychology with hardcore computer science studies of the LLMs at work
Here I present a set of new papers, preprints and reviews suggesting that, at least for text processing and procedural reasoning, LLMs work pretty much like the human brain – yet with some substantial differences that scientists are now starting to clarify.

Introduction
The emergence of large language models (LLMs) has spurred considerable interest in their potential to mirror the cognitive processes of the human brain. These complex computational systems demonstrate increasingly sophisticated capabilities in language processing, reasoning, and problem-solving, raising the intriguing question of whether they might operate using principles similar to those governing the human mind. I have indeed covered this idea before a couple of times, particularly in the context of the "Chinese room argument" and also in drawing parallels between how LLMs process text and how we humans learn to speak at the same time as we interact with the world and develop reasoning abilities from our daily experiences:
Revolving on the Turing test, the Chinese room argument, and modern Large Language Models
If Oral and Written Communication Made Humans Develop Intelligence… What's Up with Language Models?
Humans as super-advanced "Stochastic Parrots"?
Provocatively, Microsoft Researchers Say They Found "Sparks of Artificial Intelligence" in GPT-4
What if Intelligence and Perhaps Even Consciousness Surface Unhurriedly in AI, Just So Slow That…
I also used this venue to discuss specifically how LLMs might be "reasoning" – and I'm not sure anymore that I should use quotation marks there, given how well they perform at several tasks – and the impact that proper prompt crafting has on LLMs' ability to solve problems correctly and arrive at the right conclusions:
New DeepMind Work Unveils Supreme Prompt Seeds for Language Models
Scientists Now Rigorously Seeking Parallels Between How LLMs and Human Brains Work
In this new article I present and discuss some very recent papers that explore the potential parallels and distinctions between LLMs and the human brain, examining their performance on cognitive tasks, evaluating methodologies for assessing their abilities, and discussing whether LLMs are truly developing intelligence.
To write this article I drew largely on five scientific research articles, some already peer-reviewed and others in preprint form, some presenting totally new results and others serving as reviews of a field that is very new but moving very quickly. Let me first present my five main sources together with a brief summary of each, before I delve into the meat of my discussion and some provocative thoughts in the rest of the article:
Large Language Models and Cognitive Science: A Comprehensive Review of Similarities, Differences…
This very interesting review, not yet peer-reviewed, explores the intersection of LLMs and cognitive science as studied in a handful of recent works. The review details various methods used to evaluate LLMs in comparison to how humans process information, including adaptations of cognitive psychology experiments and the use of neuroimaging data – so you see how it really intersects the hardcore computer science behind LLMs with the analogous lines of research in hardcore biology.
The review contrasts the cognitive abilities of LLMs with those of humans, examining similarities in language processing and sensory judgments, but also highlighting crucial differences in reasoning, particularly with novel problems and functional linguistic competence. Furthermore, the discussion part delves into the potential of LLMs as cognitive models, their applications in diverse cognitive fields, and strategies for mitigating their limitations and biases.
Contextual feature extraction hierarchies converge in large language models and the brain – Nature…
This paper investigates specifically the parallels between LLMs and the human brain's language-processing mechanisms. The authors analyze twelve LLMs of similar size but varying performance, assessing their ability to predict the neural responses recorded via intracranial electroencephalography (iEEG) during speech comprehension. The main finding is that higher-performing LLMs exhibit greater brain similarity, showing a stronger alignment between their hierarchical feature-extraction pathways and the brain's, and achieving this alignment with fewer layers. Furthermore, the study highlights the critical role of contextual information, demonstrating that its availability significantly improves both model performance and brain-like processing, particularly in higher-level language areas. The authors even dare to suggest that optimizing LLMs for brain-like hierarchical processing and efficient contextual encoding could be crucial for achieving human-level artificial general intelligence.
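To make the comparison method more concrete, below is a minimal sketch of the kind of "encoding model" analysis common across these studies: take the hidden states from one layer of an LLM for each token of a passage, then fit a regularized linear regression that predicts a recorded neural signal from those features, scoring the fit by cross-validation. This is not the authors' actual pipeline – the model, the layer choice and especially the "neural" data (random placeholders here) are illustrative assumptions only.

```python
# Minimal sketch of an encoding-model analysis: predict a neural signal from
# LLM hidden states with ridge regression. The "neural" data below is random
# placeholder noise; real studies use intracranial recordings aligned to words.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

text = "The quick brown fox jumps over the lazy dog and then keeps on running."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    hidden_states = model(**inputs).hidden_states   # (n_layers + 1) tensors of shape [1, T, 768]

layer = 6                                           # one mid-depth layer, chosen arbitrarily
X = hidden_states[layer][0].numpy()                 # [T, 768] token-level features

# Placeholder "neural" responses: one value per token for each of 4 electrodes.
rng = np.random.default_rng(0)
Y = rng.normal(size=(X.shape[0], 4))

# Fit a linear map from LLM features to each electrode; the cross-validated
# score is the (here meaningless) measure of how "brain-like" this layer is.
for e in range(Y.shape[1]):
    scores = cross_val_score(Ridge(alpha=1.0), X, Y[:, e], cv=3, scoring="r2")
    print(f"electrode {e}: mean cross-validated R^2 = {scores.mean():.3f}")
```

In the real studies, this kind of fit is repeated for every layer and every electrode, which is how the maps of "which layer best matches which brain region" are obtained.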
Scale matters: Large language models with billions (rather than millions) of parameters better…
This research paper, currently a "reviewed preprint" in eLife, investigates the correlation between the size of LLMs and their ability to predict human brain activity during natural language processing. At the core it is quite similar in spirit to the work presented above, but more focused on the biological systems.
Using electrocorticography – a type of intracranial electroencephalography, as in the paper above – recorded from epilepsy patients listening to a podcast, the researchers found that larger LLMs, with more parameters and lower perplexity, could more accurately predict neural activity – you see, a finding very similar to the one in the above paper. Furthermore, this work found that the optimal layer for prediction shifted to earlier layers in larger models, and that it varied across brain regions, reflecting a language-processing hierarchy. The study concludes that scaling up LLMs improves their alignment with human brain activity – just wow! – up to a plateau of performance with the largest models.
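Since perplexity is the model-quality metric at the heart of this comparison, here is a quick sketch of how it is typically computed for a causal language model – GPT-2 and the example sentence are just stand-ins, not the models or the podcast stimuli used in the study.

```python
# Rough sketch of perplexity for a causal LM: exponentiate the average
# cross-entropy of the model's next-token predictions over a passage.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

text = "The participants listened to a half-hour story while their neural activity was recorded."
input_ids = tokenizer(text, return_tensors="pt").input_ids

with torch.no_grad():
    # Passing labels makes the model return the mean next-token cross-entropy.
    loss = model(input_ids, labels=input_ids).loss

print(f"perplexity: {torch.exp(loss).item():.1f}")   # lower = better next-word prediction
```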
Shared computational principles for language processing in humans and deep language models – PubMed
Just read that title carefully: Shared computational principles for language processing in humans and deep language models. Although this article is from 2022, hence from when this was all starting, it is extremely revealing. Already then, working with GPT-2 – that is, with models not as "smart" as those we have now – the group (which overlaps with the group behind the eLife paper above) found empirical evidence that humans and LLMs share three core "computational principles": continuous next-word prediction before word onset, the use of pre-onset predictions to compute post-onset surprise (prediction error), and the representation of words through contextual embeddings. These findings provided some of the first clues hinting that LLMs work, at least for text-processing tasks, quite similarly to their biological analogs. Moreover, at the time this parallel suggested that LLMs could be plausible computational frameworks for understanding the neural basis of human language processing, challenging traditional psycholinguistic models – that is, shaking up psychology and pedagogy.
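The "surprise" principle in particular is easy to make concrete: a word's surprisal is just the negative log-probability the model assigned to it given everything that came before. Here is a toy illustration with GPT-2; the example sentence is mine, not one of the study's stimuli.

```python
# Toy illustration of prediction error ("surprise"): for each token, compute
# -log p(token | preceding context) under a causal LM. Unexpected words
# (like the last one here) should get much higher surprisal.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

text = "The chef seasoned the soup with a pinch of cement."
ids = tokenizer(text, return_tensors="pt").input_ids[0]

with torch.no_grad():
    logits = model(ids.unsqueeze(0)).logits[0]       # [T, vocab_size]
log_probs = torch.log_softmax(logits, dim=-1)

# Position t predicts token t+1, so pair logits[t] with the actual token ids[t+1].
for t in range(len(ids) - 1):
    surprisal = -log_probs[t, ids[t + 1]].item()
    print(f"{tokenizer.decode(ids[t + 1])!r:>12}  surprisal = {surprisal:.2f}")
```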
Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models
This preprint investigates how LLMs learn to reason, and how their reasoning strategies compare with another task they are good at, factual knowledge retrieval – that is, asking the LLM to think through a problem vs. simply asking it about facts it memorized during training.
The work analyzes the influence of pretraining data on LLM outputs for factual and mathematical reasoning tasks, using a novel technique to rank the influence of millions of pretraining documents.
Their key finding is that reasoning is driven by procedural knowledge, with LLMs synthesizing solutions from documents demonstrating similar reasoning processes rather than directly retrieving answers. Importantly, the answers to reasoning questions rarely appear in the most influential documents, unlike for factual questions. This suggests that focusing pretraining on high-quality data showcasing diverse reasoning procedures could improve the reasoning capabilities of LLMs even further. The study also highlights the significant role of code as pretraining data for enhancing mathematical and problem-solving reasoning, which makes sense: source code is just implemented algorithms, and algorithms are precise, explicit recipes for solving problems.
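For readers wondering what "ranking the influence of pretraining documents" even looks like in practice, below is a deliberately over-simplified toy proxy: score each candidate document by how similar its loss gradient is to the gradient of the query, restricted to the final block of a small model to keep it cheap. The preprint uses a far more sophisticated influence-function method at a vastly larger scale; this sketch is only meant to convey the flavor, and the query and documents are my own made-up examples.

```python
# Toy gradient-similarity proxy for "document influence": documents whose
# training gradient points in a similar direction to the query's gradient
# are the ones whose (hypothetical) inclusion would most reduce the query loss.
# NOT the preprint's method -- just a simplified illustration of the idea.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

# Only look at gradients of the last transformer block, to keep things light.
tracked = [p for name, p in model.named_parameters() if "transformer.h.5." in name]

def grad_vector(text):
    ids = tokenizer(text, return_tensors="pt").input_ids
    model.zero_grad()
    model(ids, labels=ids).loss.backward()
    return torch.cat([p.grad.flatten().clone() for p in tracked])

query = "What is 12 times 9? 12 times 9 equals 108."
documents = [
    "To multiply two numbers, add one of them to itself as many times as the other indicates.",
    "Paris is the capital of France and its largest city.",
]

q = grad_vector(query)
for doc in documents:
    d = grad_vector(doc)
    score = torch.nn.functional.cosine_similarity(q, d, dim=0).item()
    print(f"influence proxy = {score:+.3f} | {doc}")
```

Under this crude criterion one would hope the procedural "how to multiply" document scores higher for the arithmetic query than the unrelated factual one – essentially the pattern the preprint reports with its much more rigorous methodology.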
Two Strong Parallels, and Also Two Big Differences, Between Human and LLM Thinking
Similarity 1: Pathways for Language Processing
My sources all agree that LLMs exhibit remarkable parallels with human cognitive processes, particularly in how they process language. A key similarity lies in the hierarchical nature of language processing observed in both the biological and the artificial systems.
Let's go a bit deeper.
LLMs, particularly those based on the transformer architecture, process language through a series of layers, with each layer building upon the representations extracted by the previous ones. Similarly, the human brain exhibits a hierarchical organization in its auditory and language-related cortex, progressively extracting increasingly complex linguistic features. In the artificial systems, researchers track this by checking which artificial neurons activate and how; in the biological systems, they track it by monitoring brain activity with electroencephalography and related techniques. I am personally amazed at how close the investigation techniques and their outcomes are.
Notably, I think this shared hierarchical structure could actually be highlighting the inherently hierarchical nature of language itself, which builds up from basic phonemes to complex semantic concepts. Further bolstering this notion, two of the studies I consulted found a strong correlation between an LLM's performance on language tasks and its ability to predict neural responses in the human brain during language processing. Higher-performing LLMs – those exhibiting superior proficiency in reading comprehension and commonsense reasoning – display a greater capacity to predict neural activity in human brains, suggesting that they extract features from language in a manner more akin to the human brain's, which in turn possibly emerges from language's inherent structure.
Similarity 2: The Predictive Power of Context
Various studies found that, both in the human brain and in LLMs, contextual information plays a pivotal role in shaping the representations they learn. For example, LLMs with larger context windows can consider a more extensive sequence of preceding text, and it has been clearly shown that larger context windows significantly enhance an LLM's ability to predict human neural responses. Moreover, this effect was more marked in the brain's language-processing areas, such as the inferior frontal gyrus.
This finding mirrors quite explicitly, I think, the human brain's reliance on context for comprehending language. Apparently, then, the brain continuously integrates information from prior words and sentences to predict upcoming words and interpret meaning. That is crazy similar to how LLMs work!
This alignment between LLMs and the brain in leveraging contextual information underscores the crucial role that background information (seen during training or provided in prompts for LLMs; learned from experience and education in humans) plays in facilitating the understanding of language, and even in thinking through it.
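To make this role of context a bit more tangible, here is a tiny illustration of how much the preceding text reshapes a word's internal representation in an LLM: we compare the contextual embedding of the same final word given a short versus a longer, more informative context. The model, the sentences and the similarity measure are my own illustrative choices, not taken from the studies.

```python
# Tiny illustration of contextual embeddings: the representation of the final
# word "bank" shifts depending on how much disambiguating context precedes it.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")
model.eval()

def final_token_embedding(text):
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        return model(**inputs).last_hidden_state[0, -1]   # embedding of the last token

short_context = "She walked to the bank"
long_context = ("After rowing for an hour she pulled the canoe out of the water, "
                "tied it to a tree, and walked to the bank")

sim = torch.nn.functional.cosine_similarity(
    final_token_embedding(short_context),
    final_token_embedding(long_context),
    dim=0,
).item()
print(f"cosine similarity between the two 'bank' embeddings: {sim:.3f}")
```

A static word-embedding model would assign exactly the same vector to "bank" in both cases; the fact that an LLM does not is precisely what the "contextual embedding" principle refers to.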
Difference 1: Mastering Language Doesn't Necessarily Mean Mastering Thought
And honestly, I think this actually applies to both humans and Artificial Intelligence systems, so somehow it could be treated as a similarity too!
But my main point is that, despite the two striking similarities I discussed above, there exist some fundamental differences between LLMs and the human brain, particularly in the details of language comprehension and in the very nature of thought.
See, while LLMs demonstrate proficiency in formal linguistic competence, accurately processing grammatical structures, they often exhibit limitations in functional linguistic competence, struggling with the pragmatic and context-dependent aspects of language. I guess, however, that you have seen this in some humans too: excellent speakers and listeners, possibly avid readers, who nevertheless aren't that skilled at problem solving.
For a concrete example, think of how LLMs may excel at generating grammatically correct sentences but struggle to grasp humor, sarcasm, or irony, which rely heavily on contextual cues and social understanding. Humor, sarcasm and irony – like surprise and other such elusive reactions – are very difficult even to define, and they certainly involve strong elements of thought, because they hinge on the unexpected and/or the absurd, which can only be experienced against some thought-based frame of reference.
I think this discrepancy clearly highlights the challenges LLMs face in fully emulating the human brain's capacity to process language beyond its "surface form".
Difference 2: Memory
Furthermore, while LLMs can capture certain aspects of human memory, such as primacy and recency effects, their memory mechanisms probably diverge significantly from the biological memory system of the human brain. Human memory is dynamic, constantly adapting and evolving based on experiences, emotions, and associations. For example, you might remember exactly what you were doing when you received the news of a close relative passing away 15 years ago, yet not remember what you had for breakfast yesterday morning in that all-inclusive hotel where you've already been staying for 7 days.
LLMs, in contrast, typically rely on fixed representations and lack the flexibility and contextual sensitivity of human memory. In other words, they either know about some previously seen piece of information, or they don't.
Evaluating LLMs as "Cognitive" Models
Evaluating the "cognitive" abilities of LLMs as the papers discussed here have done presents, of course, some unique challenges: this is all so new that there are no established methodologies for effectively comparing the two systems. That's why the papers had to innovate, adopting, as we saw, a range of approaches inspired by cognitive science and psychology.
In this same spirit, benchmarks have been developed with behavioral metrics derived from seven cognitive psychology experiments, adapted into forms that can be applied to LLMs. One such example is CogBench which, beyond being a kind of "psychology laboratory for LLMs", arrived at some applied conclusions about how to better prompt LLMs:
CogBench: a large language model walks into a psychology lab
Probably the most surprising approach among the papers discussed earlier, to me at least, is the one that uses neuroimaging data from human brains to compare the representations learned by LLMs with human brain activity. Such methods offer a direct window into the potential alignment (or differences) between the computational processes of LLMs and the neural mechanisms underlying human cognition. However, interpreting these findings demands caution, given the fundamentally different structure and function of LLMs and the human brain. And of course, the impressive performance of LLMs on certain cognitive tasks does not necessarily equate to a true understanding of human cognitive processes, as these systems may arrive at similar outcomes through vastly different computational pathways.
Are LLMs and Human Intelligence Converging?
The question of whether LLMs are developing true intelligence remains a topic of ongoing debate, and although the papers discussed here shed some light on the broader problem, they are very far from providing an answer.
We can, however, survey how researchers working in this field think about it. Proponents of the idea that LLMs are developing genuine intelligence point to their impressive performance on a growing range of cognitive tasks, arguing that their ability to learn from vast amounts of data and generalize to new situations hints at an emerging form of intelligence. Others remain skeptical, emphasizing the fundamental differences between LLMs and the human brain, particularly in the capacity for reasoning, understanding causal relationships, and interacting with the world in a meaningful way; they argue that while LLMs may excel at mimicking human language and behavior, they lack the underlying cognitive foundations for true intelligence.
The convergence of LLMs towards brain-like processing, as evidenced by their increasing ability to predict neural activity in the papers presented here and by their adoption of hierarchical processing mechanisms, raises intriguing possibilities for the future of AI. Perhaps, as LLMs continue to evolve, they will inch closer to a form of intelligence that more closely resembles human cognition, yet do so "asymptotically", that is, never quite getting there. However, if they get close enough, the lines between artificial and biological intelligence might blur, as I think is already pretty evident when LLMs are challenged across many domains.
A Short Conclusion?
The topic is as interesting as it is debated, and still unripe. The field needs deep research, and even some definitions – starting with "what is intelligence, exactly?"
What's most interesting to me is that with investigations like those presented here we not only interrogate this fantastic question, but also delve into the inner workings of LLMs themselves, and into brain function itself. This makes all such research extremely important: even if it doesn't end up settling the artificial-vs-human-intelligence debate, it will certainly bear fruit – from better LLMs that can solve more complex problems, perhaps with fewer hallucinations and fewer improper generations, to a better understanding of how the brain works and hence how we can learn better, and explain, diagnose, and treat diseases – and beyond.
Further Reads
A New Method to Detect "Confabulations" Hallucinated by Large Language Models
Direct Uses of Large Language Models in Modern Science and Technology
Powerful Data Analysis and Plotting via Natural Language Requests by Giving LLMs Access to…
Unleashing the power of ChatGPT in the classroom -discussion after months of tests