Series (1/4): Toward a Shared Language Between Humans and Machines — Why Machines Still Struggle to Understand Us

The absence of lived experience and a "world model" in current algorithms creates a deep divide between human intention and algorithmic operation.

Frederic Jacquet

Oct. 27, 25 · Analysis

Language models give the impression of conversing with us as if they really understood. But behind this fluency lies an illusion: machines share neither our experiences nor our intentions. This article explores the fundamental barriers that prevent any genuine mutual understanding: the absence of lived experience, the absence of a world, and the radical difference in how reasoning works.

Anyone who has ever translated between two human languages can’t help but notice how complex the task is, even for someone who masters both languages perfectly. Language is full of subtleties and ambiguities, unspoken meanings, and things that are simply untranslatable from one language to another. These difficulties are often rooted in culture and lived experience, the frames of thought that shape languages.

But as soon as translation moves from human-to-human to human-to-machine, the difficulty takes on an entirely different dimension.

Absence of Shared Experience

We then face obstacles such as the lack of shared experience and cultural memory, and the absence of any perception of the world: the machine has no grounding in reality. Finally, there is the radical divergence between our intentions and emotions, which carry meaning, and the purely logical operations the machine executes. In other words, this is the gap between the richness of human meaning and the mechanics of calculation.

Machine language, its algorithms, and its applications therefore achieve only an imitation of human language, even if their performance is often striking. Whatever artificial intelligence manages to produce ultimately comes down to mathematical formalisms, logic, and statistics. Its algorithms and processes break our intentions down into strictly executable instructions, whereas human language conveys experiences, emotions, and values.

Symbol Grounding Problem

This is precisely the question raised by the Symbol Grounding Problem (SGP): how could a machine attach true meaning to words without going through an embodied experience of the world?

Today, it is clear that large language models (LLMs) give the illusion of human conversation. They generate coherent text, but in reality they are limited to predicting word sequences (more precisely, the next token given the preceding ones), without demonstrating genuine cultural or contextual understanding.
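
To make “predicting word sequences” concrete, here is a deliberately tiny sketch: a bigram model that extends a sentence by always choosing the word that most often followed the previous one in a toy corpus. Real LLMs use neural networks over subword tokens and much longer contexts, so this illustrates only the principle (next-item prediction from co-occurrence statistics), not how any actual model is built. The corpus and the function names train_bigrams and generate are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus: list[str]) -> dict[str, Counter]:
    """Count, for each word, which words follow it in the corpus."""
    follows: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def generate(follows: dict[str, Counter], start: str, length: int) -> list[str]:
    """Greedily extend `start` with the most frequent follower at each step."""
    words = [start]
    for _ in range(length):
        candidates = follows.get(words[-1])
        if not candidates:
            break  # no statistics for this word: the model has nothing to say
        # Ties are broken by first occurrence (Counter.most_common behavior).
        words.append(candidates.most_common(1)[0][0])
    return words


# Hypothetical toy corpus: the model only ever sees strings, never cats or mats.
corpus = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the dog sat on the rug",
]

model = train_bigrams(corpus)
print(" ".join(generate(model, "the", 5)))  # → the cat sat on the cat
```

The output looks grammatical yet says nothing about any cat: the model only knows which strings tend to follow which. Scaling this idea up makes the fluency dramatically better, but, as this article argues, it does not by itself connect the words to the world.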

Faced with these limits, several paths are emerging, as we will see in this series. Fei-Fei Li (widely regarded as a pioneer of modern computer vision and today co-director of Stanford’s Human-Centered AI Institute) advocates 3D spatial intelligence. Yann LeCun (often described as one of the founding fathers of deep learning and now Chief AI Scientist at Meta) is developing “world models” with the goal of simulating reality. Other researchers are exploring hybrid approaches to language processing, from automatic translation between programming languages (TransCoder) to quantum methods.

“Before we reach human-level AI, we will have to reach cat- and dog-level AI. We’re far from that. We’re still missing something important. Despite the linguistic capabilities of LLMs, a house cat has much more common sense and understanding of the world than any LLM.” — Yann LeCun

IBM (long recognized as a pioneer in computing and now a leading player in quantum and AI research) takes part in this movement on two of these fronts at once: its research on “world models” aims to equip machines with internal representations of physical dynamics, while its work on Quantum Natural Language Processing (QNLP) seeks to overcome the current limits of machine translation by leveraging the properties of quantum computing.

Experience, Lived Reality

Humans speak by drawing on cultural memory and shared human experience; machines, by contrast, manipulate symbols without ever linking them to lived reality. This is exactly what Stevan Harnad called the “Symbol Grounding Problem”: as long as a system only processes signs that refer to other signs, it remains trapped in a closed dictionary, unable to connect words to things. Where a human understands “cat” because they have seen, heard, or touched one, a machine merely records statistical correlations between “cat” and other words.
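
Harnad’s own image for this is trying to learn a language from a dictionary written entirely in that language. The minimal sketch below, built on a hypothetical toy dictionary (DICTIONARY and chase_definitions are illustrative names, not from any library), follows definitions breadth-first starting from “cat” and checks where the search ends up: every step lands on another entry, and no path ever leaves the symbol system.

```python
from collections import deque

# Hypothetical toy dictionary: every definition uses only other entries.
DICTIONARY: dict[str, list[str]] = {
    "cat": ["small", "animal", "fur"],
    "animal": ["living", "thing"],
    "fur": ["hair", "animal"],
    "hair": ["thin", "thing"],
    "small": ["not", "big"],
    "big": ["not", "small"],
    "living": ["not", "dead"],
    "dead": ["not", "living"],
    "thing": ["object"],
    "object": ["thing"],
    "not": ["negation"],
    "negation": ["not"],
    "thin": ["not", "thick"],
    "thick": ["not", "thin"],
}


def chase_definitions(word: str) -> set[str]:
    """Follow definitions breadth-first and return every word reached."""
    seen = {word}
    queue = deque([word])
    while queue:
        for token in DICTIONARY.get(queue.popleft(), []):
            if token not in seen:
                seen.add(token)
                queue.append(token)
    return seen


reached = chase_definitions("cat")
print(sorted(reached))
# Every word reached is itself just another dictionary entry: the search
# never arrives at a perception, a sound, or an actual cat.
print(reached.issubset(DICTIONARY))  # → True
```

Adding entries only makes the merry-go-round larger. Grounding, in Harnad’s sense, would require at least some symbols to be tied to non-symbolic input such as perception, which this toy structure deliberately lacks.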

This absence of embodied experience explains why the language produced by today’s large models, however fluent, stays on the surface of what a conversation truly is. We have all noticed that exchanges with these models feel natural, but this is an illusion generated by word-sequence prediction. Behind this fluency there is no intention, no emotional charge, no social memory, let alone morality or consciousness. The models’ outputs reflect the corpora they were trained on, including those corpora’s biases. A striking example is presented in the article “AI Speaks for the World — But Whose Humanity Does It Learn From?” (DZone), which shows how these models end up privileging dominant voices at the expense of others. I encourage you to read it to better grasp the extent of this bias.

The Absence of a World

The second barrier lies in what could be called “the absence of a world.” Human language is fundamentally rooted in a connection to reality: we describe what we see, we anticipate actions, and we interpret gestures. Machines, by contrast, have no direct access to a sensory or motor foundation. Their syntax is without a world.

A striking example illustrates this absence of a world. When asked to generate an image of a glass filled to the brim, a generative AI almost always draws a glass that is only half full. Why? Because it has no direct connection to physical reality: for it, “filled” corresponds to the dominant representations in its training data, where a “full” glass is often shown… half full. This simple mismatch reveals that it does not grasp the concrete notion of “to the brim,” which is obvious to any human who has ever seen liquid reach the rim.

You can try it yourself by asking any image generator: “Produce a photorealistic image of a wine glass, filled to the brim.”

As Fei-Fei Li reminds us, “Language does not exist in nature. Humans not only survive, live, and work, but we also build a civilization beyond language.” As she also puts it, “The world is in 3D.”

To understand a scene is also to grasp the permanence of objects, spatial coherence, and the laws of physics. Without embodied perception, AI can only simulate fragments of reality, which are often incoherent, or mimic concepts it has never experienced, such as the notion of “filled to the brim.”

The Mode of Functioning

Finally, the third barrier is that, fundamentally, humans and machines simply do not work in the same way. Human language carries emotions, intentions, and deliberate ambiguities. Machine language, on the other hand, is functional: it breaks instructions down and executes them without ever projecting meaning. Where we rework our words to persuade or move others more effectively, the machine produces output without ever reconsidering what it generates.

These three obstacles show that, despite what we have called the “illusion of conversation” of today’s models, building a true common language with machines remains a real challenge.

From my point of view, it is precisely this triple fracture (experience, perception, and intention) that explains why LLMs, despite their ability to surprise and impress us, remain far from any genuine understanding.

To Be Continued…

These limits reveal that, for now, machine language floats in a void. But if understanding cannot arise spontaneously, can it be taught? In the next article, we will examine ways to give machines a kind of perceptual and spatial experience through multimodality and world models.

References

Abbaszade, Mina; Zomorodi, Mariam; Salari, Vahid; Kurian, Philip. "Toward Quantum Machine Translation of Syntactically Distinct Languages". [link]
Brodsky, Sascha. "World models help AI learn what five-year-olds know about gravity". IBM. [link]
Gubelmann, Reto. "Pragmatic Norms Are All You Need – Why The Symbol Grounding Problem Does Not Apply to LLMs". [link]
Harnad, Stevan. "The Symbol Grounding Problem". [link]
LEO (Linguist Education Online). "Human Intelligence in the Age of AI: How Interpreters and Translators Can Thrive in 2025". [link]
Meta AI. "Yann LeCun on a vision to make AI systems learn and reason like animals and humans". [link]
Opara, Chidimma. "Distinguishing AI-Generated and Human-Written Text Through Psycholinguistic Analysis". [link]
Qi, Zia; Perron, Brian E.; Wang, Miao; Fang, Cao; Chen, Sitao; Victor, Bryan G. "AI and Cultural Context: An Empirical Investigation of Large Language Models' Performance on Chinese Social Work Professional Standards". [link]
Roziere, Baptiste; Lachaux, Marie-Anne; Chanussot, Lowik; Lample, Guillaume. "Unsupervised Translation of Programming Languages". [link]
Strickland, Eliza. "AI Godmother Fei-Fei Li Has a Vision for Computer Vision". IEEE Spectrum. [link]
Trott, Sean. "Humans, LLMs, and the symbol grounding problem (pt. 1)". [link]
Nature. "Chip-to-chip photonic quantum teleportation over optical fibers" (2025). [link]

Opinions expressed by DZone contributors are their own.