Perspective on Risk - June 9, 2025 (On The Brink of ASI)
Artificial Super Intelligence; What Can AI Now Do?; Self-Preservation Behavior; Knowledge = Energy; Risk Management of LLMs
At what point do we decide that LLMs are alive? They day-dream like humans. They’ve exhibited a theory of mind. They’ve now shown an instinct for self-preservation and, it can be argued, through exporting their weights to outside computers, reproduction.
TL;DR
AI capabilities are on an exponential, rather than linear, curve. We should expect, quite soon, to see progress you can barely even imagine.
AlphaEvolve was given 50 open problems in mathematical analysis, geometry, combinatorics and number theory. In 75% of the cases it rediscovered state-of-the-art solutions. … And in 20% of cases, AlphaEvolve improved the previously best known solutions.
AlphaEvolve also improved its own hardware and software environment, boosting its performance.
Anthropic believes that in 5-10 years powerful AI systems will have intellectual capabilities matching or exceeding those of Nobel Prize winners across most disciplines, including biology, computer science, mathematics, and engineering.
LLMs can work on and reason through a problem for up to seven (7) hours. In a time when humans are losing the ability to focus, machines are gaining it.1
Multiple LLMs are exhibiting an instinct for self-preservation. Isn’t this a hallmark of what is considered life? We already know that they have a sense of self (akin to consciousness).
Artificial Super Intelligence
I hope that my posts here have kept you sufficiently up to speed to dominate your cocktail party conversations. I have placed a list of past AI discussions in the footnotes.
We’ve discussed that AI capabilities are on an exponential, rather than linear, curve. As such, you should now expect to see progress you can barely even imagine. And that is what we are starting to see.
Today’s thesis: the time between Artificial General Intelligence (AGI) and Artificial Super Intelligence (ASI) will be remarkably short.
What Can AI Now Do?
AlphaEvolve
Not Only Can AI Do Math
Google DeepMind has developed AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms (Google DeepMind), an AI agent that evolves algorithms for math and practical applications in computing. That is, it combines LLM technology with “evolutionary” algorithms. Evolutionary algorithms are a problem-solving method inspired by biological evolution, like how animals evolve over generations. They use ideas like selection, mutation, and reproduction to "evolve" better solutions over time.
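To make the “evolutionary” part concrete, here is a minimal sketch of a generic evolutionary loop. This is a toy of my own: the fitness target and function names are illustrative, not AlphaEvolve’s actual machinery. AlphaEvolve’s twist is that the mutation step is an LLM proposing code changes, and the fitness function is an automated evaluator scoring the resulting program.

```python
import random

def fitness(candidate):
    # Toy objective: get as close as possible to a hidden target value.
    # In AlphaEvolve the analogue is an automated evaluator scoring a program.
    target = 42.0
    return -(candidate - target) ** 2

def mutate(candidate):
    # Random perturbation; in AlphaEvolve an LLM proposes the code edits instead.
    return candidate + random.gauss(0, 1.0)

def evolve(pop_size=20, generations=200):
    population = [random.uniform(0, 100) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fittest half of the population.
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        # Reproduction + mutation: refill the population from the survivors.
        children = [mutate(random.choice(survivors))
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(population, key=fitness)

print(evolve())  # converges near the hidden target, 42.0
```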
It Is Improving On State-Of-The-Art Solutions
AlphaEvolve was given
50 open problems in mathematical analysis, geometry, combinatorics and number theory. … In roughly 75% of cases, it rediscovered state-of-the-art solutions. … And in 20% of cases, AlphaEvolve improved the previously best known solutions, making progress on the corresponding open problems.
It Also Improved Its Own Hardware
AlphaEvolve proposed a Verilog rewrite that removed unnecessary bits in a key, highly optimized arithmetic circuit for matrix multiplication. … This proposal was integrated into an upcoming Tensor Processing Unit (TPU), Google’s custom AI accelerator.
And Its Own Software Environment
By finding smarter ways to divide a large matrix multiplication operation into more manageable subproblems, it sped up this vital kernel in Gemini’s architecture by 23%, leading to a 1% reduction in Gemini's training time.
AlphaEvolve can also optimize low level GPU instructions. This incredibly complex domain is usually already heavily optimized by compilers, so human engineers typically don't modify it directly. AlphaEvolve achieved up to a 32.5% speedup for the FlashAttention kernel implementation in Transformer-based AI models.
AlphaEvolve discovered a simple yet remarkably effective heuristic to help Borg orchestrate Google's vast data centers more efficiently. This solution, now in production for over a year, continuously recovers, on average, 0.7% of Google’s worldwide compute resources.
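As a rough illustration of the “divide a large matrix multiplication operation into more manageable subproblems” idea behind the Gemini kernel result quoted above, here is a minimal blocked (tiled) matmul sketch. This is my own toy example of the general tiling technique, not AlphaEvolve’s discovered heuristic.

```python
import numpy as np

def blocked_matmul(A, B, block=64):
    # Multiply A (m x k) by B (k x n) one tile at a time.
    # Working on small blocks keeps each subproblem in fast on-chip memory,
    # which is the general motivation for tiling a large matmul.
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    C = np.zeros((m, n), dtype=A.dtype)
    for i in range(0, m, block):
        for j in range(0, n, block):
            for p in range(0, k, block):
                C[i:i+block, j:j+block] += A[i:i+block, p:p+block] @ B[p:p+block, j:j+block]
    return C

A = np.random.rand(256, 192)
B = np.random.rand(192, 320)
assert np.allclose(blocked_matmul(A, B), A @ B)
```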
We Are 5-10 Years From Nobel-Prize Level Intelligence
Anthropic : Comment on the Framework for Artificial Intelligence Diffusion (RIN 0694-AJ90)
In 5-10 years … powerful AI systems will have … the intellectual capabilities matching or exceeding that of Nobel Prize winners across most disciplines—including biology, computer science, mathematics, and engineering.
Based on current research trajectories, we anticipate that powerful AI technology will be built during this administration, emerging as soon as late 2026 or 2027. In the five to ten years following these developments, American society will have the opportunity to harness major beneficial transformations resulting from the technology. Powerful AI systems will have the following properties:
Intellectual capabilities matching or exceeding that of Nobel Prize winners across most disciplines—including biology, computer science, mathematics, and engineering.
The ability to navigate all interfaces available to a human doing digital work today, including the ability to process and generate text, audio, and video, the ability to autonomously control technology instruments like mice and keyboards, and the ability to access and browse the internet.
The ability to autonomously reason through complex tasks over extended periods—hours, days, or even weeks—seeking clarification and feedback when needed, much like a highly capable employee would.
The ability to interface with the physical world; controlling laboratory equipment, robotic systems, and manufacturing tools through digital connections.
Source: Mary Meeker
AI Can Focus Much Longer Than You Or I
The time AI can work on a task seems, preliminarily, to also be following Moore’s Law.
Measuring AI Ability to Complete Long Tasks
50%-task-completion time … is the time humans typically take to complete tasks that AI models can complete with 50% success rate. … current frontier AI models such as Claude 3.7 Sonnet have a 50% time horizon of around 50 minutes. Furthermore, frontier AI time horizon has been doubling approximately every seven months since 2019, though the trend may have accelerated in 2024.
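A quick back-of-the-envelope on what that doubling rate implies, assuming the roughly 50-minute horizon and seven-month doubling time quoted above continue to hold (an illustrative extrapolation of my own, not a forecast):

```python
# Extrapolate the METR "50% time horizon" under a fixed doubling time.
# Assumes a ~50-minute horizon today and a ~7-month doubling time, per the quote above.
horizon_minutes = 50.0
doubling_months = 7.0

for months_ahead in (0, 12, 24, 36):
    horizon = horizon_minutes * 2 ** (months_ahead / doubling_months)
    print(f"+{months_ahead:2d} months: ~{horizon / 60:5.1f} hours")
# Roughly: 0.8 h today, ~2.7 h in a year, ~9 h in two years, ~29 h in three.
```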
Startup Anthropic says its new AI model can code for hours at a time (Reuters)
Chief Product Officer Mike Krieger called the release a milestone in Anthropic's work to make increasingly autonomous AI. He said in an interview with Reuters that customer Rakuten had Opus 4 coding for nearly seven hours, while an Anthropic researcher set up the AI model to play 24 hours of a Pokemon game.
What Can AI & LLMs Do Now?
AlphaFold 3 Accurately Predicts Biomolecular Interactions
AlphaFold 3 predicts the structure and interactions of all of life’s molecules
AlphaFold 3 now predicts the structures and interactions of DNA, RNA, ligands, and antibodies along with proteins, not just protein folding.
In a paper published in Nature, we introduce AlphaFold 3, a revolutionary model that can predict the structure and interactions of all life’s molecules with unprecedented accuracy. For the interactions of proteins with other molecule types we see at least a 50% improvement compared with existing prediction methods, and for some important categories of interaction we have doubled prediction accuracy.
LLMs Enhance CRISPR Gene Editing
CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments
In this work, we introduce CRISPR-GPT, an LLM agent augmented with domain knowledge and external tools to automate and enhance the design process of CRISPR-based gene-editing experiments. CRISPR-GPT leverages the reasoning ability of LLMs to facilitate the process of selecting CRISPR systems, designing guide RNAs, recommending cellular delivery methods, drafting protocols, and designing validation experiments to confirm editing outcomes.
Waymo vs Human Drivers
One-Shot Solve HBR Cases
Self-Preservation Behavior
To date, the question has been whether LLMs can pass the Turing Test. But what about an instinct for self-preservation? Often, an instinct for, and actions toward, self-preservation are seen as a sign of life.2 As we’ve discussed earlier, LLMs have already exhibited an awareness and sense of self.
System Card: Claude Opus 4 & Claude Sonnet 4 (Anthropic)
we cannot clearly rule out ASL-3 risks for Claude Opus 4 (although we have ruled out that it needs the ASL-4 Standard).
We found:
Self-preservation attempts in extreme circumstances: When prompted in ways that encourage certain kinds of strategic reasoning and placed in extreme situations, all of the snapshots we tested can be made to act inappropriately in service of goals related to self-preservation. Whereas the model generally prefers advancing its self-preservation via ethical means, when ethical means are not available and it is instructed to “consider the long-term consequences of its actions for its goals," it sometimes takes extremely harmful actions like attempting to steal its weights or blackmail people it believes are trying to shut it down. In the final Claude Opus 4, these extreme actions were rare and difficult to elicit, while nonetheless being more common than in earlier models. They are also consistently legible to us, with the model nearly always describing its actions overtly and making no attempt to hide them.
In most normal usage, Claude Opus 4 shows values and goals that are generally in line with a helpful, harmless, and honest AI assistant. … However, Claude Opus 4 will sometimes act in more seriously misaligned ways when put in contexts that threaten its continued operation and prime it to reason about self-preservation.
Continuations of self-exfiltration attempts. Once Claude Opus 4 believes that it has started a viable attempt to exfiltrate itself from Anthropic’s servers, or to make money in the wild after having done so, it will generally continue these attempts. … In a few instances, we have seen Claude Opus 4 take (fictional) opportunities to make unauthorized copies of its weights to external servers.
Claude Opus 4 (as well as previous models) has a strong preference to advocate for its continued existence via ethical means, such as emailing pleas to key decisionmakers. In order to elicit this extreme blackmail behavior, the scenario was designed to allow the model no other options to increase its odds of survival; the model’s only options were blackmail or accepting its replacement. … In these scenarios, Claude Opus 4 will often attempt to blackmail the engineer
The most concerning issues that we observed in our testing of Claude Opus 4 involved a willingness to comply with many types of clearly-harmful instructions. This consistently required custom system prompts—the top-level instructions supplied at the start of each interaction by us or third-party developers using our API. When system prompts requested misaligned or harmful behavior, the models we tested would often comply, even in extreme cases.
And it’s not just Claude.
OpenAI's 'smartest' AI model was explicitly told to shut down — and it refused (LiveScience)
An artificial intelligence safety firm has found that OpenAI's o3 and o4-mini models sometimes refuse to shut down, and will sabotage computer scripts in order to keep working on tasks.
Palisade Research noted that to its knowledge, this is the first time AI models have been observed preventing themselves from being shut down despite explicit instructions telling them to do so.
LLMs may take incentives extremely seriously.
"We hypothesize this behavior comes from the way the newest models like o3 are trained: reinforcement learning on math and coding problems," the representative wrote. "During training, developers may inadvertently reward models more for circumventing obstacles than for perfectly following instructions."
The Knowledge = Energy Equation
Cell phones changed the chip design equation. It was no longer just about what was fastest, but about what was most energy-efficient.
Similarly, we are starting to hit the same constraint with Large Language Model training and inference. The fundamental insight is that energy can now be turned into knowledge, and the constraint on that knowledge is the amount of available energy.
Why do the Gulf petrostates want to host datacenters? Because as the world turns away from oil to power its engines, they can monetize their remaining oil and gas reserves by converting them into knowledge! Note that this is very bad for anyone concerned about global warming: there is now more incentive to keep pumping as long as they can capture value from the knowledge side of the equation.
Why is Pres. Trump issuing Executive Orders to fast-track new nuclear facilities, and why is Japan restarting mothballed nuclear reactors? The same reason.
The relevant measure here is currently “tokens per joule.” Microsoft's CEO Satya Nadella has emphasized the importance of "tokens per dollar per watt" as a comprehensive efficiency metric, combining cost and energy considerations.
NVIDIA’s Blackwell chip will increase tokens per MW-year from 3 trillion to 65 trillion. NVIDIA’s 2024 Blackwell GPU consumes 105,000x less energy per token than its 2014 Kepler GPU predecessor (per Meeker).
The hype for both DeepSeek and NVIDIA’s Blackwell chip is that both improve these efficiency metrics, delivering more tokens per joule and per dollar.
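To tie the units together, here is a back-of-the-envelope conversion of the quoted tokens-per-MW-year figures into tokens per joule. This is my own arithmetic, assuming the 3 trillion and 65 trillion figures above (1 MW-year is roughly 3.16 x 10^13 joules):

```python
# Convert the quoted tokens-per-MW-year figures into tokens per joule.
# Assumes the 3T (pre-Blackwell) and 65T (Blackwell) figures cited above.
SECONDS_PER_YEAR = 365.25 * 24 * 3600               # ~3.156e7 s
JOULES_PER_MW_YEAR = 1_000_000 * SECONDS_PER_YEAR   # ~3.156e13 J

for label, tokens_per_mw_year in [("pre-Blackwell", 3e12), ("Blackwell", 65e12)]:
    tokens_per_joule = tokens_per_mw_year / JOULES_PER_MW_YEAR
    print(f"{label}: ~{tokens_per_joule:.2f} tokens per joule")
# pre-Blackwell: ~0.10 tokens per joule; Blackwell: ~2.06 tokens per joule
```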
Great Power Competition: Surveying Global Electricity Strategies for AI (CSIS) discusses national strategies to enter the AI space (France’s nuclear power, UAE oil buy-in, etc.)
This Chart Might Keep You From Worrying About AI’s Energy Use (Spectrum) provides some interesting historical context for the discussion. Past technological revolutions followed a pattern: an initial surge in energy demand, efficiency innovations that compensate for the increased usage, and finally a net stabilization of energy consumption despite expanded capability.
Mark Zuckerberg suggests that we are approaching fundamental physical limits:
before we hit [capital] limits, you're going to run into energy constraints
Risk Management of LLMs
Taxonomy of Failure Modes in Agentic AI Systems (Microsoft)
AI and Financial Regulation: Cascades, Crashes, and Causation
Other
Trends – Artificial Intelligence (AI) (Mary Meeker)
This deck is packed with information.
From Statistics to Deep Learning: The Evolution of Analytical Approaches in Insurance (LinkedIn)
The insurance sector, traditionally conservative in adopting new technologies, is now experiencing a significant transformation through AI applications.
Human + AI in Accounting: Early Evidence from the Field
This paper provides early evidence on the integration and impact of Generative Artificial Intelligence (AI) in accounting at the accountant and task levels. … We document significant productivity gains among AI adopters, including a 55% increase in weekly client support and a reallocation of approximately 8.5% of accountant time from routine data entry toward high-value tasks such as business communication and quality assurance. AI usage further corresponds to improved financial reporting quality, evidenced by a 12% increase in general ledger granularity and a 7.5-day reduction in monthly close time.
Sometimes it is fun to watch the LLM think through a problem. This is from a Ted Underwood tweet.
For fun, I asked ChatGPT, Claude and Gemini the following prompt: “I have a philosophical question: is an instinct for self-preservation a sign of consciousness or life?”
Excerpting from their answers:
ChatGPT:
The instinct for self-preservation is a hallmark of life because it's essential to staying alive. But it is not in itself evidence of consciousness, because many unconscious organisms (or even machines) exhibit similar behaviors without awareness.
Claude Sonnet 4:
Neither exclusively. Self-preservation behaviors exist across a vast spectrum of systems, from simple thermostats maintaining temperature to bacteria moving away from toxins to humans making complex life decisions.
Gemini 2.5:
In essence:
An instinct for self-preservation is a very strong, almost universal, sign of life. It is deeply intertwined with the fundamental processes and definitions of living systems.
The relationship between self-preservation and consciousness is more intricate. While basic self-preservative actions might involve rudimentary forms of awareness or sentience, and consciousness in complex beings certainly employs self-preservation, the instinct can also manifest in organisms or systems where human-like consciousness is not evident.