Predictions on AI (2026–2060)
Being concrete about the questions that matter.
“AI is going to change everything.” It’s a phrase you might hear in boardrooms, schools, on Wall Street, and at dinner tables across the country. And of course it’s all you hear in Silicon Valley. The refrain is unfortunately not very useful, since the conversation about AI’s future is operating on two entirely different scales of imagination.
When most people hear that “AI is going to change everything,” they envision a 2030s world of automated customer service, AI-assisted coding, and perhaps the need for some workers to retrain for more relevant roles. Their mental picture of “AGI” feels like ChatGPT++. Perhaps AI assistants will schedule all our meetings. Maybe NVIDIA’s stock price will rise.
When we say “AI is going to change everything,” we imagine something more exotic. In a handful of years, we believe AI will obviously, unceremoniously surpass almost all human intellectual labor, even on messy, long-horizon work. From there, the real question turns to when superhuman science and engineering will give civilization the ability to radically reshape our physical environment, though humans may not be around long to benefit.
We believe tremendously powerful AI systems may arrive soon.[1] The point of this post is partly to quantify what we mean by “soon,” in the form of specific probabilistic predictions that can form a clear basis for disagreement. But almost more importantly, we’ve selected a sample of questions that, in their plausibility, may help some readers begin to picture what it might look like for AI to really change everything.[2]
Enormous credit to the Manifold community for many of the questions. We have linked the relevant questions, though note that some have been modified for relevance or clarity.[3]
Predictions
Near future: Little slowdown, no “wall”, and the continuation of rapid AI progress.
By the end of August 2026, will a frontier AI system achieve over 90% on SWE-Bench Verified?
We are willing to place 85% probability.
By the end of November 2026, will AIs meaningfully outperform most trained humans (working without AI assistance) at forecasting geopolitical events?
Outperforming human expert forecasters (i.e. superforecasters) working without AI assistance would count as YES. Achieving a lower Brier score than well-regarded forecasting platforms (e.g. Metaculus, Polymarket) on a representative set of questions would also count as YES.
We are willing to place 60% probability.
In other words, we believe AI Snake Oil is on track to be wrong in their prediction that this will not occur for at least a decade.
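The Brier score comparison in the resolution criteria above can be made concrete with a small sketch. The Brier score for binary questions is the mean squared difference between the forecast probability and the outcome (1 for YES, 0 for NO), so lower is better. The function name and all of the forecast numbers below are hypothetical, chosen only for illustration; they are not data from any AI system or forecasting platform.

```python
from typing import Sequence


def brier_score(probabilities: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and binary outcomes."""
    if len(probabilities) != len(outcomes) or len(probabilities) == 0:
        raise ValueError("need non-empty inputs of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical example: three questions that resolved YES, NO, YES.
outcomes = [1, 0, 1]
ai_forecasts = [0.9, 0.2, 0.7]        # illustrative numbers only
platform_forecasts = [0.6, 0.4, 0.5]  # illustrative numbers only

ai_score = brier_score(ai_forecasts, outcomes)
platform_score = brier_score(platform_forecasts, outcomes)

# The AI "wins" on this toy set if its Brier score is lower.
ai_wins = ai_score < platform_score
print(f"AI: {ai_score:.3f}, platform: {platform_score:.3f}, AI better: {ai_wins}")
```

A real comparison would need a representative question set that both forecasters answered before resolution, which is the hard part of the operationalization.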
By the end of 2026, will an AI be able to highly reliably determine whether a touch-move violation has happened in a chess game (given video input)?[4]
To meet the bar for “high reliability,” red teamers (and/or the public) must have been given ample opportunity to search for videos of real chess games which the system misevaluates. Though the system does not need to be perfect, a YES resolution requires evaluators (or broad public consensus) to agree that the system decides whether a touch-move violation has occurred with comparable reliability to a careful human who is knowledgeable about the rules. The system should very rarely miss blatant violations (<1%). In cases that humans consider ambiguous, the system should generally (in >90% of cases) provide reasonable justifications for its decision.
We are willing to place 40% probability.
By the end of March 2027, per METR’s evaluation and/or other publicly available evaluations, will AI agents be able to do software engineering tasks that would take humans 8+ hours, at a 50% success rate or higher?[5]
We are willing to place 90% probability.
By the end of 2027, will AI systems be able to create AA-quality video games autonomously?[6]
Example: An AI receiving the prompt “Create a complete, high-quality, rogue-like deckbuilding video game set in a dystopian future where humans and robots are at war” should be able to handle all aspects of game creation from this prompt alone, resulting in a fully functional, high-quality video game. This refers to a complete and functional video game created by an AI without human intervention in the creative process, including game design, mechanics, graphics, sound, and coding.
The game should be comparable in quality and complexity to a mid to high-budget video game produced by humans in the same period. It should have a coherent plot, well-designed game mechanics, professional-level graphics and sound, and be largely free of bugs and glitches. The game should be playable from start to finish, offering a complete gaming experience.
We are willing to place 75% probability.
By the end of 2027, will AI systems be able to write cogent, persuasive legal briefs without hallucinating any cases?
We are willing to place 85% probability.
By the end of 2027, will we develop Leopold Aschenbrenner’s vision of “drop-in remote AI workers”? The system need not be available to the public; it counts even if it is being used only internally.
Here are two paragraphs from the essay which include descriptions of what Leopold imagines: “An agent that joins your company, is onboarded like a new human hire, messages you and colleagues on Slack and uses your softwares, makes pull requests, and that, given big projects, can do the model-equivalent of a human going away for weeks to independently complete the project. Again, critically, don’t just imagine an incredibly smart ChatGPT: unhobbling gains should mean that this looks more like a drop-in remote worker, an incredibly smart agent that can reason and plan and error-correct and knows everything about you and your company and can work on a problem independently for weeks.”
We are willing to place 65% probability.
In early 2028, will an AI be able to generate a full high-quality movie script from a prompt? For example, “make me a 120 minute Star Trek / Star Wars crossover”. It should be more or less comparable to a big-budget studio film, although it doesn’t have to pass a full Turing Test as long as it’s pretty good. The AI doesn’t have to be available to the public, as long as it’s confirmed to exist.
We are willing to place 70% probability.
(Note: Think, for instance, about what this means for the value of Disney’s IP. There are certainly implications, but we can’t tell the direction.)
Will AI or automation be among the top 5 most important issues for voters in the lead-up to the 2028 election? (Currently the most important issues are the economy, immigration, crime, etc.; AI ranks around 30th.)
We are willing to place 45% probability.
By the end of 2028, will AI be at least as big a political issue as abortion?
We are willing to place 85% probability.
By the end of 2028, will a general frontier language model beat a super grandmaster playing chess?
We are willing to place 80% probability.
By the end of 2028, will an AI system be able to perform a security-focused C-to-Rust rewrite of a major open-source codebase?
To resolve YES, an AI system must take a large, well-established open-source C project with >500,000 lines of code and completely rewrite it in Rust. The open-source community must mostly agree after several months that the functionality of the library is substantially maintained. The rewritten codebase must pass the project’s complete, pre-existing automated test suite.
Such a rewrite would eliminate most memory safety vulnerabilities, which are a major root cause of security flaws in C software.
We are willing to place 80% probability.
Will AI hallucinations be a ~fixed problem by the end of 2028?[7]
In other words, it should be widely acknowledged that AI systems connected to the internet almost never hallucinate on factual matters.
A reliable use of citations over Internet-scale corpora will count as a YES resolution.
Training models to look for and cite sources sounds like a kosher way to fix the “hallucination problem” of model unreliability in factual domains, and this is how they will be used in practice. Resolving based on hallucination rates of vanilla LLMs with no access to outside information is not appropriate if it is not the dominant regime for deploying models.
Another YES resolution would be if a frontier lab demonstrates that their latest model hallucinates roughly 0.01x as often as GPT-4 on internal benchmarks, or using good public benchmarks such as SimpleQA.
If most everyday users cannot elicit hallucinations, but adversarially crafted strings created by experts still can (analogous to jailbreaks), it can still resolve YES.
If something is widely acknowledged in the public discourse to be credible, and then the AI cites it, and it is later proven incorrect (e.g. NYT retraction), this does not count as a hallucination.
We are willing to place 85% probability.
By the end of 2028, will there be a successful business involving at least 10 human employees, with revenues above $10M/y, in an industry that involves maintaining long-term relations with business partners, as opposed to distributed short-term transactions with consumers (which excludes, e.g., social media platforms and app development), where AI systems are primarily responsible for the company’s operations and people/project management?
To resolve YES, we could have an owner credibly attest that the AI systems are essentially running the entire company, including most day-to-day business decisions and directions. AI systems should at least play a role substantially equivalent to a CEO.
We are willing to place 65% probability.
Will AI systems outperform most human mathematics researchers in all mathematical research areas by the end of 2028?
To resolve YES, publicly available AI tools must have absolute advantage over most human mathematicians. In other words, they should be able to create high-quality original research with no (or very little) human input that is on par with or better than that of most professional mathematicians, in all areas of mathematics (judged by Daniel Litt, a mathematics professor at the University of Toronto).
If humans are still required for some part of the math process (e.g. creating definitions, making conjectures), this question will resolve NO.
If there exist any niche math research areas where research cannot be elicited from a publicly available AI — e.g. niche domains of math that require continual learning or good multimodal perception, where AIs cannot perform well — this question will resolve NO.
We are willing to place 40% probability.
By the end of 2028, will there be any major breakthrough in AI continual learning?[8]
The learning should take place over weeks, and the model should be widely agreed to learn effectively from mistakes and written feedback during agentic tasks and, like humans, generally not repeat those mistakes.
We’ll operationalize this as an AI system which receives at least 8/10 points on the Long-Term Memory Storage (MS) component of Hendrycks’ AGI Definition.
We are willing to place 80% probability.
Will there be a unicorn ($1b+ valuation) founded and operated by just one person by the end of 2029?[9]
We do not include companies whose primary activity is trading; they must provide goods and services.
We are willing to place 70% probability.
Will AI be capable of producing an Annals-quality Number Theory paper by the end of 2029?[10]
An AI must produce a Number Theory paper at the quality level of papers published in Annals today (i.e. 2025), as judged by mathematician Daniel Litt. The AI would receive no detailed guidance relevant to the mathematics research, and is required to accomplish this task autonomously.
The AIs are granted a total budget of $100k in inference compute per paper.
We are willing to place 85% probability.
By the end of 2029, will an AI be able to generate a full high-quality movie? For example, “make me a 120 minute Star Trek / Star Wars crossover.” It should be more or less comparable to a big-budget studio film, although it doesn’t have to pass a full Turing Test as long as it’s pretty good. The AI doesn’t have to be available to the public, as long as it’s confirmed to exist.[11]
Year modified from original Manifold question.
We are willing to place 70% probability.
By the end of 2029, will AI systems be able to create AAA-quality video games ~autonomously?
Such a game represents the equivalent of hundreds to thousands of 2020-era person-years of software development.
“~Autonomous” development can mean extremely limited, mostly non-technical human involvement, such as feedback or broad suggestions.
We are willing to place 70% probability.
By the end of 2030, will an AI be capable of doing almost all (>99%) of the CAD labor done in the year 2025?
Such an evaluation could involve giving models a detailed specification and evaluating the quality of their output against what human professionals might have produced in the year 2025.
This would be judged by the broad sentiments of people in the mechanical engineering and manufacturing industry, as well as by consulting relevant CAD benchmarks.
We are willing to place 85% probability.
By the end of 2033, will a team of autonomous robotic systems build a habitable house from foundation to finish?
A crew composed entirely of robotic hardware must construct a single-family dwelling of at least 1,200 square feet. Their assembly on-site must be fully automated.
The AIs must manage the entire workflow, from interpreting the architectural plans to coordinating robotic tasks like site preparation, foundation pouring, framing, roofing, and utility installation, without any intervention from humans.
The project is considered complete only when the structure technically satisfies professional standards akin to the 2025 health and safety requirements of the state of California for habitation (e.g. standards from this document). This does not necessitate that the structure be built in California in particular. This is also separate from the question of whether such a structure would receive municipal certification in practice, for instance if human involvement is required.
We are willing to place 75% probability.
The coming decades. We think that over the next few decades AI systems will be capable of things that science fiction authors believed would take centuries or millennia of standard scientific progress. We expect to see qualitatively new possibilities once science and engineering are automated and no longer dependent on human labor.
By the end of 2035, will more than one billion general-purpose robots have been manufactured?
Qualifying hardware must be capable of:
Adaptive manipulation, e.g. using one or more “limbs” or manipulators to interact with and handle a variety of objects, including those not seen during initial training.
Moving through and adapting to novel or unstructured real-world environments (not just pre-mapped, static ones or simple tracked paths). Accordingly, the hardware must also be capable of sensing and perceiving its environment using sensors beyond simple triggers (e.g., computer vision, LiDAR, tactile feedback), and the software must be capable of making advanced decisions and adapting its actions in real time without human control.
Humanoid robots and robo-dogs that can be deployed to new environments and be given new physical tasks would count. Autonomous cars, standard drones and quadcopters, 2010s-era consumer robot vacuums, and simple pre-programmed industrial arms on assembly lines would not count.
Engineered nanomaterial that can achieve physical ends comparable to 1 billion humanoid robots can also resolve YES.
Though manufactured hardware may have poor accompanying software upon release, future software updates may cause hardware to qualify under this definition. A piece of hardware can contribute to the final count only once accompanying software can be demonstrated to allow it to adapt to novel and unstructured real-world environments, and interact with a variety of objects.
We are willing to place 60% probability.
By the end of 2035, will there be a $100 billion company fully operated by AIs?
All economically valuable labor, including management and compliance, is de facto done by AIs. The $100 billion valuation must be in 2025 dollars (inflation-adjusted).
Insofar as people are employed by the firm, it is only because the firm requires them for legal/regulatory/figurehead reasons. Without these reasons, the company should be able to fire all humans with very little (or even positive) impact on its valuation and long-term business prospects.
We are willing to place 75% probability.
Will there be a four year period (ending before the end of 2040) when world real GDP doubles?
We are willing to place 75% probability.
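To get a feel for what a four-year doubling implies, here is a minimal sketch of the compound-growth arithmetic. The function names are our own, and the 3% baseline growth rate used for comparison is a rough illustrative assumption, not a figure from this post.

```python
import math


def implied_annual_growth(multiple: float, years: float) -> float:
    """Constant annual growth rate that compounds to `multiple` over `years`."""
    if multiple <= 0 or years <= 0:
        raise ValueError("multiple and years must be positive")
    return multiple ** (1.0 / years) - 1.0


def years_to_double(annual_growth: float) -> float:
    """Years needed to double at a constant annual growth rate."""
    if annual_growth <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2.0) / math.log(1.0 + annual_growth)


# Doubling world real GDP within four years requires sustained growth of:
doubling_rate = implied_annual_growth(2.0, 4.0)
print(f"{doubling_rate:.1%}")  # prints "18.9%"

# For comparison, at an assumed baseline of 3% per year (illustrative only),
# doubling takes a little over two decades.
baseline_doubling_years = years_to_double(0.03)
print(f"{baseline_doubling_years:.1f} years")
```

In other words, the question asks whether the world economy will sustain roughly 19% real growth per year for four consecutive years.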
By the end of 2040, will Earth-based technological civilization be using over two orders of magnitude more energy than it currently uses?
The continued existence, death, or subjugation of the human species is irrelevant to the resolution. If humanity is extinct but an Earth-derived AI civilization is operating machinery that consumes this amount of energy, the question resolves YES.
We are willing to place 40% probability.
Will there be meaningful two-way conversation between a dolphin/whale and a human before 2040?
Some examples that would resolve YES:
We build a system that can communicate the steps of a complicated trick or maneuver to a dolphin using an audio system alone. The dolphin attempts the trick, and we provide solely audio feedback about what it might have done wrong. In fewer than three rounds of feedback, the dolphin does the trick correctly.
We negotiate a trade between ourselves and a dolphin, asking what we must offer in order for it to complete an unfamiliar and nontrivial task. The dolphin responds vocally with a request that we can confidently interpret, then performs the task after the request is met.
On an I-know-it-when-I-see-it basis, a conversation more conceptually broad and substantive than one can have with a typical two-year-old human.
We are willing to place 75% probability, conditioned on the continued existence of both humans and dolphins/whales.
Will a therapy that substantially slows down or reverses the aging process in humans be developed before 2040?
This question will resolve to “YES” if, before January 1st, 2040, a therapy is publicly and credibly documented to have:
Demonstrated a significant reduction in biological age, defined as a minimum of a 10-year reduction in the average biological age of a group of at least 100 human participants. Biological age must be assessed using a composite measure of validated biomarkers that are known to be strongly associated with aging, such as telomere length, DNA methylation, and cellular senescence markers.
Provided substantial evidence of slowing down or reversing the aging process, as measured by a minimum of a 25% improvement in at least two age-related clinical endpoints or physiological functions, compared to baseline measurements. Examples of such endpoints may include cognitive function, cardiovascular health, and immune function.
Demonstrated a favorable safety profile, with no severe adverse events or unacceptable toxicity observed during the course of the study.
We are willing to place 75% probability, conditioned on the continued existence of humanity.
Will at least 4 of the 7 Millennium Problems be solved by 2040?
We are willing to place 70% probability.
Will there be a child polymath of unprecedented talent before 2050? For example, an exceptional young human who significantly surpasses the best modern-day experts in world-class academic breadth (e.g. due to AI education, embryo selection, or gene editing).[12]
To operationalize: one publicly profiled child under the age of 18 must be able to accomplish 100+ on the Putnam, gold medals at the IMO/IOI/IPhO and at least one of the IChO or IBO, a 2500 FIDE chess rating (Grandmaster level), simultaneous high fluency in at least 5 languages (Superior in Speaking, Writing, Listening, and Reading in the ACTFL Proficiency Guidelines), quarterfinals or better in Lincoln-Douglas at the TOC, AND the equivalent of acceptance to Juilliard’s Music Division (any major) on the strength of their musical abilities, either by demonstration or with at least 90% probability in the judgment of highly reliable epistemic systems, in test settings adjusted to be as close as possible to their late-2010s equivalents. The tests do not have to be all in a row (see Manifold operationalization).
We are willing to place 80% probability, conditioned on the continued existence of humanity.
Will Earth-based technological civilization have the technical capability to change the Earth’s atmosphere to have pre-industrial levels of carbon dioxide, for less than 1% of the world’s GDP for a single year, by 2050?
In other words, climate change will essentially be considered a de facto solved technological problem by 2050.
We are willing to place 80% probability.
Will AI systems be able to build 30-day deterministic weather models by 2050?[13]
We are willing to place 70% probability.
Will Earth-derived technological civilization have the capability to build a Dyson swarm by 2055, if it started working toward it in 2045?
In particular, consider a hypothetical scenario where a broad political mandate is reached to build a Dyson swarm beginning in the year 2045. This question resolves YES if credible epistemic systems place above a 95% chance that a swarm capturing at least 1% of the Sun’s emitted energy could be put in place before 2055.
We might reach this degree of confidence if, for example, there is a well-known, clear, de-risked technical pathway to completing such a project given broad civilizational coordination. In the same way that we had good reason to believe nuclear weapons or the Large Hadron Collider could be built before they actually were, we may have strong evidence that we have the technical capacity to build a Dyson swarm (even if we choose not to).
We are willing to place 75% probability.
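For a sense of scale, the 1% threshold in the resolution criteria can be turned into a rough power figure. The sketch below assumes the IAU nominal solar luminosity of 3.828e26 W and a ballpark of 2e13 W (about 20 terawatts) for present-day human primary energy use. Neither number comes from this post, and the second in particular is only an order-of-magnitude assumption.

```python
import math

# Assumption: IAU nominal solar luminosity, in watts.
SOLAR_LUMINOSITY_W = 3.828e26
# Assumption: rough present-day human primary power use (~20 TW), in watts.
CURRENT_WORLD_POWER_W = 2e13

# Threshold from the question: a swarm capturing at least 1% of solar output.
swarm_fraction = 0.01
swarm_power_w = swarm_fraction * SOLAR_LUMINOSITY_W

# How many times larger than today's energy use that would be.
ratio = swarm_power_w / CURRENT_WORLD_POWER_W
orders_of_magnitude = math.log10(ratio)

print(f"Swarm power: {swarm_power_w:.3e} W")
print(f"Ratio to current use: {ratio:.2e} (~{orders_of_magnitude:.0f} orders of magnitude)")
```

Under these assumptions, even a 1% swarm would capture on the order of a hundred billion times the power civilization uses today.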
By 2060, will it be the consensus among historians that humanity faced a greater than 33% probability of catastrophe (outcomes where >95% of humans who were alive in 2025 died) from 2025 to 2055?
To resolve YES, the prevailing consensus as expressed in leading epistemic institutions (equivalents in 2025 would include peer-reviewed historical journals, major academic surveys, publications by prominent historical associations) should be that, conditioned on the state of the world in 2025, the probability of a catastrophe occurring during the 30-year period from 2025 to 2055 exceeded 33%.[14]
If AI historians hold absolute advantage over human historians and become the accepted standard for historical analysis, then a consensus among AI historians will suffice.
We believe most of this probability comes from AI-derived risks (e.g. malicious use, lethal autonomous weapons, rogue AI risks, intelligence explosion). Examples of such risks include a misaligned superintelligence pursuing goals contrary to human survival or the release of an AI-engineered pathogen designed for maximum lethality and transmissibility.
We are willing to place 80% probability.
If you think this is radically incorrect, make your dissenting predictions legibly so they can be compared later.
Thank you to Jason Hausenloy, Connor Smith, Ian Pedroza, Tomek Korbak, and Jesse Richardson for feedback.
[1] There are of course forecasters with even more aggressive projections than ours, but our views fall on the shorter side among the forecasting community. It is worth mentioning that even well-informed AI forecasters with “long” timelines tend to expect AI to cause change at a scale that dwarfs many mainstream intuitions.
[2] The organization we work for, CAIS, is largely an AI benchmarking research organization. But in actuality, we are an operationalization research institution. Benchmarks operationalize desirable properties of models, whether they are capabilities (HLE, MATH, MMLU) or propensities (MASK, MACHIAVELLI). It’s no wonder that we (the authors) are so interested in operationalizing various future forecasts of AI as well.
[3] Anti-AI headwinds or political regulation may impede the adoption of some types of capabilities, and perhaps these forces or global conflict could block AI development more broadly. Our probabilities are generally “all-things-considered,” meaning we have attempted to correct for the possibility that unusual circumstances might upend the default trajectory of AI capabilities progress. It goes without saying that the questions we consider are quite correlated.
Some of the questions are explicitly about the future of humanity, in which case we conditioned on the continued existence of humanity. The rest of the questions are implicitly about the state of Earth-originating intelligence, in which case our probabilities are “all-things-considered”.
[4] Year modified from original Manifold question.
[5] Year modified from original Manifold question.
[6] Year modified from original Manifold question(s).
[7] We added more clarification compared to the original Manifold question.
[8] We added more clarification compared to the original Manifold question.
[9] Year modified from original Manifold question.
[10] Year modified from original Manifold question.
[11] Year modified from original Manifold question.
[12] Year modified from original Manifold question.
[13] This high probability is largely based on the expectation that by 2050, we will possess the technical capability to actively steer and control Earth’s weather patterns.
We put >1% if we assume we do not possess the capability to effectively control Earth’s weather.
Actually, now that you mention it, Grok and I are working on unifying gravity and quantum physics. As soon as I’m done with that, I bet they can help me predict the weather deterministically. (Credit to Connor Smith for this joke.)
[14] A note on the art of creating forecasting questions: Conditioning on future knowledge is a great way to “resolve” questions that do not have a definite resolution date.




Hats off on trying to operationalize your beliefs. I find many of the questions to still be too vague or open to interpretation/disagreement, but I was asked if I would bet on one of them, and therefore I will choose the Aschenbrenner-style drop-in remote worker by the end of 2027. I will bet NO at 35% and you will bet YES at 65%. I am happy to bet up to $15,000 on this. Feel free to message me.