Context: There are two key dynamics in AI development that many researchers have explored and considered, but that don’t seem to fully reach other audiences. I think understanding them is crucial to grasping where AI is headed and what that means for the world. I remember seeing this topic differently myself, and something finally shifted after reading this article. This is my attempt to explain these ideas in a way that feels intuitive to people across different backgrounds, levels of technical expertise, and perspectives on AI. My hope is that more people will start thinking about these dynamics and recognize the work needed to steer AI toward a safer, more coordinated future. I’m fairly certain this is the general direction we’re headed, but there are still assumptions and uncertainties; I discuss these in more detail on the Epistemic Status and Antithesis page.
So to wrap it in one paragraph… People are going to build AIs that are agentic: AI that will run autonomously and exhibit goal-directed, purposeful behavior. We are after this simply because it frees up human time and attention. Well, this work is very much underway. Examples: AI Digests lets you use an agent to demo buying items from a website; Anthropic is enabling agents to interact with computers directly; OpenAI’s Operator agent can use its own browser to perform tasks for you; OpenAI’s Deep Research completes multi-step research tasks; Hypothetical Minds is research on developing agents’ theory of mind to collaborate and compete with other agents; Metta AI trains agents to care in socially complex environments; METR researches when AI systems will be able to independently complete long-horizon projects; Manus is building an AI agent that “bridges minds and actions”. At the same time, AIs will exhibit what I call a stubborn property. Through trial and error, these agents will try to overcome obstacles and continuously attempt to expand the set of tasks they can autonomously fulfill. We want this because the most profit and value lie in solving automation in ever new areas—think moving beyond AI helping with writing emails to AI systems that autonomously run businesses.
As tasks become more complex, they increasingly require engagement with aspects of reality not yet integrated into digital processes, forcing AI to bridge gaps where no digital infrastructure or pipelines exist. To expand into these complex areas, AIs must actively figure out how to accomplish tasks despite various constraints, stubbornly trying different approaches until they succeed. As Nate Soares writes:
Because the way to achieve long-horizon targets in a large, unobserved, surprising world that keeps throwing wrenches into one's plans, is probably to become a robust generalist wrench-remover that keeps stubbornly reorienting towards some particular target no matter what wrench reality throws into its plans … so you've built a generalized obstacle-surmounting engine. You've built a thing that excels at noticing when a wrench has been thrown in its plans, and at understanding the wrench, and at removing the wrench or finding some other way to proceed with its plans. – Nate Soares, Ability to solve long-horizon tasks correlates with wanting things in the behaviorist sense
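To make the stubborn dynamic a bit more concrete, here is a minimal, purely illustrative sketch in Python (not any real agent framework): an agent that keeps proposing and retrying alternative plans toward a fixed goal until one works or its attempt budget runs out. The names `StubbornAgent`, `propose_plan`, and `execute` are hypothetical placeholders for an LLM planner and a tool layer, not an actual implementation.

```python
# A minimal sketch of the "stubborn" dynamic described above: an agent that
# keeps generating and retrying alternative approaches to a fixed goal until
# one succeeds or its budget runs out. All names here are hypothetical.

from dataclasses import dataclass, field


@dataclass
class StubbornAgent:
    goal: str
    max_attempts: int = 10
    failures: list = field(default_factory=list)

    def propose_plan(self) -> str:
        # Placeholder: a real agent would ask a model for a new plan,
        # conditioned on the goal and every obstacle it has hit so far.
        return f"plan #{len(self.failures) + 1} for '{self.goal}'"

    def execute(self, plan: str) -> bool:
        # Placeholder: a real agent would call tools, browse, or write code.
        # For illustration, pretend the first two plans hit obstacles.
        return len(self.failures) >= 2

    def run(self) -> bool:
        for _ in range(self.max_attempts):
            plan = self.propose_plan()
            if self.execute(plan):
                return True  # goal reached
            # Note the obstacle and reorient toward the same target.
            self.failures.append(plan)
        return False  # budget exhausted, goal still not reached


if __name__ == "__main__":
    agent = StubbornAgent(goal="file quarterly taxes end-to-end")
    print("succeeded:", agent.run(), "after", len(agent.failures), "failed plans")
```

The point of the sketch is only the shape of the loop: the target stays fixed while the search over plans keeps widening, which is exactly the generalized obstacle-surmounting behavior Soares describes.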
If AIs are autonomous, deliver value, and are priced similarly to today, there will be an enormous number of AI agents, perhaps an order of magnitude more than there are humans today. What might the consequences be of a large number of AIs trying to autonomously expand the automation frontier, bridging the parts of reality not yet automated?
This doesn’t necessarily imply subjective experience or will—that’s a different and more speculative discussion. However, in practice, it may not matter much since these systems will behave as if they are self-directed. This is because there will be a vast number of autonomous AI agents run by different actors and algorithms, each motivated by different things, with conflicting goals and competing for limited resources. Their sheer scale will make monitoring and control difficult, and their owners may be unwilling to deactivate them or fail to recognize when doing so is necessary.
At this scale, we won't be noticing the AIs working as intended—instead we'll be seeing the ones that are stubborn: constantly finding and exploiting niches, figuring out how to automate more effectively. The automation of any field will involve a lot of trial and error, with systems persistently working toward their goals. Given the sheer number of attempts, some will push forward even when their actions cause harm, create vulnerabilities, or exploit systems.
Crucially, by default, artificial systems are not attuned to all human values and preferences, nor do they understand different ecosystems and interdependencies. This remains true even if the majority of actors have good intentions. First, it is an ambiguous task, because human needs, values, and preferences change over time, and some conflict across countries, cultures, companies, and social groups. Second, there is a tricky dynamic: an AI's primary goal—pursuing its core objective—is much simpler than considering its broader impacts. Think of the primary goal as a single point in space that the AI moves toward through trial and error, searching for strategies to reach it. In contrast, understanding how the AI's actions influence reality is the entire space around that point. It is a much more complex thing to understand and control, because the AI's actions (be it running a business or persuading people to buy or do something) ripple through our established realities, ecosystems, cultures, organisms, and interconnected world with all its dependencies.
There is also a crucial asymmetry: it is a lot harder to create something constructive than something destructive. In the history of technology, dynamite came before the combustion engine; the atomic bomb came before the nuclear power plant. To make something constructive, one needs to make it safe, control many moving parts, and sync a variety of processes together. This is especially concerning when AI can be used in vulnerable fields such as synthetic biology, social persuasion, weapons development, financial systems, or cybersecurity. If we scale autonomous technology and broaden its influence, we also increase the chances that some agent will do something both harmful and highly impactful.
There will certainly be countermeasures, control systems, and regulations put in place. Most creators will likely make their best efforts—or be incentivized—to implement guardrails and safety protocols. However, the fragility and complexity of our systems, combined with the vast space of possibilities AI will operate in, will make maintaining stability challenging.
All of this may radically transform the world as we know it. It could take many forms and manifest in various ways, but here are some speculative ideas illustrating what it may look like:
Widespread automated phone calls and messaging systems
Fully autonomous companies and businesses. Eventually, launching an app or startup becomes so effortless that VC firms simply convert capital into countless ventures rapidly chasing new opportunities and niche markets.
Advanced authentication systems verifying not just human identities but also AI agents’ provenance, permissions, and behavioral histories (a rough sketch of what such a record might look like follows this list)
Rebuilt internet infrastructure optimized for speed of AI operations. New protocols designed specifically for AI-to-AI communication
New regulations may establish guardrails by requiring unique identifiers for AIs, with other AIs monitoring and enforcing compliance. Legislation could specify permitted and prohibited activities, especially limiting AI access to sensitive domains such as synthetic biology, social manipulation, nuclear technology, and financial markets.
Professional relationships increasingly managed by AI agents handling end-to-end communication, scheduling, resource allocation, team coordination, and negotiations—while humans focus on setting high-level preferences and boundaries. Over time, human roles shift toward AI oversight and moderation.
Cities developing specialized infrastructure for AI operations—including networks of sensors, automated maintenance systems, and spaces designed for human-AI interaction
AI agents interfacing with physical reality through robotics, sensors, and IoT devices—operating autonomous vehicles, drones, carriers, boats, and drilling rigs.
Sophisticated containment and control systems—digital equivalents of airlocks and quarantine zones may be needed for testing and deploying new AI capabilities. Critical infrastructure might need to operate on isolated networks with strictly controlled AI access. Specialized "AI safety zones" could be established where experimental or potentially risky AI systems operate under careful monitoring.
Automated governance systems - AI agents designed to oversee and regulate other AI systems, providing real-time monitoring and mediating between human needs and AI operations.
Meta layer – the emergence of AI systems specifically designed to help humans understand and navigate this new world. These could serve as interface layers that translate between human intentions and complex AI activities, helping humans maintain control and understanding of what's being created and what's happening.
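To illustrate the authentication and regulation ideas above (agent identifiers, permissions, behavioral histories, and prohibited domains), here is a purely hypothetical sketch of an agent-identity record and a compliance check. No such standard exists today; every name in it (`AgentRecord`, `check_compliance`, `PROHIBITED_DOMAINS`) is an assumption made for illustration, not a description of any real system.

```python
# A hypothetical sketch of an agent-identity record and a compliance check,
# echoing the authentication and regulation items in the list above.
# Every field and constant here is an illustrative assumption.

from dataclasses import dataclass, field

# Domains a regulator might put off-limits for autonomous agents
# (illustrative only, mirroring the examples given above).
PROHIBITED_DOMAINS = {"synthetic-biology", "nuclear-technology", "social-manipulation"}


@dataclass
class AgentRecord:
    agent_id: str                 # unique identifier required by regulation
    operator: str                 # the human or company accountable for the agent
    permissions: set              # domains the agent is allowed to act in
    behavior_log: list = field(default_factory=list)  # audit trail of past actions


def check_compliance(record: AgentRecord, requested_domain: str) -> bool:
    """Return True if the agent may act in the requested domain."""
    if requested_domain in PROHIBITED_DOMAINS:
        return False
    return requested_domain in record.permissions


if __name__ == "__main__":
    record = AgentRecord(
        agent_id="agent-00042",
        operator="ExampleCorp",
        permissions={"customer-support", "scheduling"},
    )
    print(check_compliance(record, "scheduling"))         # True: explicitly permitted
    print(check_compliance(record, "synthetic-biology"))  # False: prohibited domain
```

The design choice worth noting is that enforcement here is a simple allowlist plus a denylist; the harder, unsolved part is who maintains the registry, who audits the behavior logs, and how monitoring AIs would act on violations.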
The rise of AI systems that are “stubborn” and autonomous may require rethinking of our institutions, coordination mechanisms, incentive structures, and risk assessment methods. This could be a critical inflection point—where our ability to design effective coordination and alignment frameworks determines whether AI enhances humanity’s capabilities or destabilizes the systems we rely on. The outcome may also depend on how well we model this future and share knowledge across disciplines and perspectives.
I’d appreciate any insights on potential blind spots, differing perspectives, or ways this projection might be inaccurate.