
LLMs and their Abilities
Speculation about the potential of LLMs has created a divide. Some view their promise as overhyped, already sliding down from the peak of Gartner’s Hype Cycle toward the trough of disillusionment. Others believe their potential is far from fully realized. Within the latter camp, opinions vary: some argue that LLMs are fundamentally constrained by human intelligence, as they are trained on human-generated data. Others contend that these models may surpass human-level intelligence, unlocking capabilities beyond our current understanding.
Technically, LLMs are pretrained to predict the next token from preceding ones, then often refined using methods like reinforcement learning.1 How far can this process take us?
1 For a more comprehensive overview of the training procedure, have a look here
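To make the pretraining objective concrete before we go on, here is a minimal sketch of next-token prediction, with a toy embedding-plus-linear model standing in for a real transformer; the vocabulary size, dimensions, and data are made up for illustration.

```python
# Minimal sketch of the next-token prediction objective (toy model, not a real LLM).
import torch
import torch.nn as nn

vocab_size, d_model = 100, 32
toy_lm = nn.Sequential(
    nn.Embedding(vocab_size, d_model),   # token ids -> vectors
    nn.Linear(d_model, vocab_size),      # vectors -> logits over the next token
)

tokens = torch.randint(0, vocab_size, (1, 16))   # a random toy "sentence"
logits = toy_lm(tokens[:, :-1])                  # predict from the preceding tokens
targets = tokens[:, 1:]                          # the actual next tokens
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1)
)
loss.backward()                                  # the gradient step pretraining would follow
```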
Requirements/Premises
Before analyzing the potential strengths and limitations of LLMs, it is essential to clarify the underlying premises. We make two simplifying assumptions:
Weak Extrapolation Capabilities
The primary assumption is that the model has the capability to combine concepts. For example, if the model is trained on crime novels and children’s fairy tales, we expect it to generate a hybrid story blending elements from both genres: a crime-themed fairy tale. This is related to the notions of interpolation and extrapolation in mathematics.
Effective Search
Since an LLM generates text token by token, errors may accumulate over time and we might miss the optimal solution (as discussed here). Therefore, we assume that the search process remains effective, even when the language model produces longer texts.
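To see why errors could pile up, suppose (purely for illustration) that each generated token goes wrong with some small probability \(\varepsilon\), independently of the others. The probability that a sequence of \(n\) tokens contains no error is then

\[
(1 - \varepsilon)^n,
\]

which shrinks quickly as \(n\) grows. The effective-search assumption says that, in practice, the model still stays close to good continuations over long texts.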
The Possibilities
The potential of LLMs is already vast, and new frontiers, such as agentic workflows, are continually emerging. Let’s take a closer look at the underlying factors that make these models so powerful.
1. LLMs can act
What the LLM produces is not only text; it can also affect the world. While humans have hands to influence the world, an LLM can select from a list of tools to act in the world. A simple example is producing a chat message capable of persuading a human to take a specific action. On a more advanced level, an LLM could query information with high efficiency and then act on that knowledge, such as buying stocks, optimizing business processes, or even starting a company.
This transformation from text to action relies on a critical intermediary: a system or parser that translates meaningful text outputs into actionable commands, like function calls. With such a setup, the LLM’s potential expands dramatically, enabling it to interact with and reshape the world in ways limited only by the tools and frameworks available to it.
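To illustrate, here is a minimal sketch of such a parser: the model emits structured text, and a thin layer of code translates it into a function call. The tool names and the JSON format are illustrative assumptions, not the API of any particular framework.

```python
# Minimal sketch: turn structured model output into an actual tool call.
import json

def send_message(recipient: str, text: str) -> str:
    return f"sent to {recipient}: {text}"

def query_price(ticker: str) -> str:
    return f"price of {ticker}: 123.45"          # placeholder data

TOOLS = {"send_message": send_message, "query_price": query_price}

def execute(llm_output: str) -> str:
    """Translate the model's text output into a function call."""
    call = json.loads(llm_output)                # e.g. '{"tool": ..., "arguments": ...}'
    return TOOLS[call["tool"]](**call["arguments"])

print(execute('{"tool": "query_price", "arguments": {"ticker": "ACME"}}'))
```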
2. The power comes from combination
Ultimately, the goal is to craft a story in as much detail as possible for an LLM to act effectively in the world. This story functions like an ongoing dialogue, encompassing both the states of the world and the LLM’s responses. For instance, a company founding scenario could involve a dynamically evolving script that integrates reality-based elements (not authored by the LLM) with sections generated by the LLM containing actionable instructions. These action-instructions would then be interpreted and executed by a parser, as previously described.
Since history never repeats exactly, the LLM must combine prior knowledge in order to operate effectively in new and unfamiliar situations.
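A minimal sketch of this evolving story could look as follows, with `call_llm` and `apply_action` as hypothetical placeholders for a real model and a real parser/executor.

```python
# Minimal sketch of the evolving "story": world observations and LLM actions
# are appended to a single transcript, turn by turn.
def call_llm(transcript: str) -> str:
    return "ACTION: check competitor prices"          # stand-in for model output

def apply_action(action: str) -> str:
    return "OBSERVATION: competitor sells at 9.99"    # stand-in for world feedback

transcript = "GOAL: found a company that sells wooden toys.\n"
for _ in range(3):                                    # a few turns of the script
    action = call_llm(transcript)                     # the LLM writes its part
    observation = apply_action(action)                # reality writes its part
    transcript += action + "\n" + observation + "\n"
print(transcript)
```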
3. Conditioning allows above-dataset performance
While a training dataset might primarily consist of average written text, it also includes examples of far-above-average quality. Prompting a model serves as a form of conditioning, enabling us to guide the model toward generating content that reflects specific standards.
For example, by prompting the model to respond in the style of Goethe or Schiller, we effectively request output that aligns with the eloquence, depth, and poetic quality characteristic of such literary giants. If the model adheres to the prompt, it can produce text that significantly surpasses average quality — in this example from a poetic and stylistic perspective.
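In probabilistic terms (a simplified reading), the generated text is sampled from the conditional distribution

\[
p_\theta(x_1, \dots, x_n \mid \text{prompt}) = \prod_{t=1}^{n} p_\theta(x_t \mid x_{<t}, \text{prompt}),
\]

so a prompt like “in the style of Goethe” shifts probability mass away from the average of the training data and toward its small, high-quality region.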
4. Fast execution
Let’s assume that humans read at an average speed of 300 words per minute, while models generate approximately 10 tokens per second, or 600 tokens per minute. If, on average, 1.3 tokens correspond to one word2, this translates to roughly \(\frac{600}{1.3} \approx 461\) words generated per minute by the model.
2 For an estimate, see here
This estimate is somewhat conservative: the average human reading speed is often lower than 300 words per minute, and the model’s generation speed of 10 tokens per second might also be on the lower side. To put it more dramatically, an LLM can produce text faster than we can process it. This speed could put an LLM in a different league than a human when it comes to creating content or even acting in the world.
Computers often seem like magical machines capable of incredible feats. This magic stems from their extraordinary ability to execute vast numbers of simple operations with astonishing speed, enabling the emergence of highly complex systems. At the most fundamental level, basic operations like NAND (Not-AND) gates serve as the essential building blocks of computation, forming the foundation for arithmetic, logic circuits, and even the execution of sophisticated programs. At a slightly higher level, the rapid execution of linear operations paired with non-linearities powers neural networks, enabling them to learn, adapt, and make decisions. Speculatively, at even higher levels, LLMs might generate increasingly abstract actions, unlocking entirely new dimensions of magic in their capabilities. While much of this higher-level potential remains speculative, it signals exciting and transformative possibilities for the future.
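As a small illustration of that universality, here is a toy sketch in which NOT, AND, and OR are all built from a single NAND primitive, with Python booleans standing in for hardware gates.

```python
# Toy illustration: every basic gate can be composed from NAND alone.
def nand(a: bool, b: bool) -> bool:
    return not (a and b)

def not_(a: bool) -> bool:
    return nand(a, a)              # NAND(a, a) == NOT a

def and_(a: bool, b: bool) -> bool:
    return not_(nand(a, b))        # NOT(NAND(a, b)) == a AND b

def or_(a: bool, b: bool) -> bool:
    return nand(not_(a), not_(b))  # De Morgan: NAND(NOT a, NOT b) == a OR b

print(and_(True, False), or_(True, False))  # False True
```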
Limits and Challenges
We hinted at the power of LLMs to craft dynamically evolving stories — akin to a real-time film script that interacts with and influences the world. In this scenario, the LLM takes on the role of one actor, performing actions within the script and, in doing so, shaping all subsequent events.
While we have alluded to the immense potential of this agent, a critical question remains: just how powerful is this actor? Without a doubt, there are limitations to what this agent can achieve.
Words miss details
Acting in the World
A book about juggling can provide an overview of throwing patterns, but it falls short of capturing the intricate details of the craft. While patterns like the cascade or shower may be described, the real complexity lies in the precise mechanics of juggling: the exact trajectory of each throw, the required force, and the precise timing of each movement. These subtle yet critical elements are absent, making the explanation too abstract to fully convey the nuanced coordination and skill involved in juggling.
Juggling is just one example, but almost any art or skill that involves real-world dynamics cannot be exhaustively captured in a book, which makes such skills inherently inaccessible to LLMs in a direct sense. Moreover, LLMs operate through tokens, and translating these tokens into physical actions, like juggling, poses a significant challenge. While these skills may not be directly accessible to an LLM, they could be leveraged indirectly. For instance, an LLM could delegate tasks by sending instructions or allocating resources, such as paying someone to juggle or to craft a complex wooden toy. This highlights the potential of LLMs to influence real-world activities indirectly through intermediaries, expanding their impact beyond the digital realm.
However, there are a few domains where all details are fully described through language, and one particularly intriguing example is programming. Code, by its nature, is entirely described in a structured linguistic form. Additionally, comments embedded in code offer valuable context, bridging the gap between natural language and the code itself.
From an economic perspective, the value attributed to code creation and maintenance is immense, representing a significant portion of technological investments. A super-powerful LLM, capable of excelling in this domain, could claim a share of this value, which translates into money. Access to money would dramatically extend the capacities of an LLM, to the point of even using humans as tools.
Understanding the World
Since an LLM-based agent can, in theory, gain access to money through its field of expertise (e.g. programming), it has a nearly unlimited set of tools at its disposal: essentially every human on earth can become a tool, paid by the agent for their work. However, the question is whether such an agent can act and strategize in the world. While we have no clear answer to this right now, there are indications that LLMs can.3
3 See, for instance, here
Generating new knowledge
An objection often raised about LLMs is the claim that they cannot generate new knowledge but merely reiterate or synthesize information present in their training data. If true, this would reduce LLMs to the role of an advanced search engine.
However, even at the level of reformulation, LLMs demonstrate a degree of creativity. For instance, rephrasing an invitation in pirate slang, something LLMs are very capable of, introduces novelty from a stylistic or expressive perspective. What critics typically mean by the inability to generate new knowledge refers to new insights or discoveries that were not explicitly encoded in the training data.
Assessing this claim is challenging because verifying the semantic originality of LLM outputs requires searching not just for verbatim matches but also for conceptual equivalents across the vast corpus of training data. Nevertheless, there is evidence suggesting that LLMs can generate ideas that experts regard as novel, particularly in research settings.4
4 See, for instance, here
From a theoretical standpoint, LLMs have the potential to generate insights that apply to reality by synthesizing and combining information in ways not explicitly present in the training data. Consider a simplified scenario: the model identifies a paper describing a bacterium’s chemical pathways and another discussing a drug that inhibits a specific element in that pathway. If the LLM appropriately combines these pieces of information, it could suggest a novel application for the drug, such as targeting the bacterium. This type of insight illustrates how LLMs might contribute to the generation of actionable knowledge.
Conclusion and Discussion
LLMs hold significant potential to function as autonomous agents in the real world. Limitations in certain areas can be offset by integrating specialized tools that already excel in those domains. In particular, access to financial resources could enable an LLM to expand its capabilities into the full spectrum of human skills and knowledge.
While some argue that new high-quality data will become scarce, the methods for training LLMs will continue to evolve. Approaches such as reinforcement learning and human-in-the-loop training can still generate substantial amounts of valuable training data. Moreover, data produced collaboratively by humans and LLMs may prove useful, and we may see an increasing abundance of such hybrid datasets.
Finally, advancements in reinforcement learning and related techniques have the potential to further enhance LLMs’ abilities in reasoning, strategy, and logical thinking, bringing them closer to truly autonomous and adaptable agents.
The scope of LLM research is vast, and we have addressed only a small portion of it — omitting, for example, multimodal models. In my view the potential of LLMs is far from fully realized, and significant advancements are likely to emerge in the near future.