Absolute Zero: How AI Is Learning Without Data

The Absolute Zero Reasoner diverges from traditional AI learning approaches by enabling AI to learn from scratch, without the need for pre-existing human-provided data.

Tony Siciliani

Jul. 24, 25 · Analysis

The Absolute Zero Reasoner

The Absolute Zero Reasoner (AZR) is a recent AI innovation that presents a new way for AI models to learn and reason. Unlike traditional AI learning approaches, it does not depend on pre-existing human-provided data: the model creates its own tasks and learns from solving them.

This is a key point: AZR is given zero data and self-evolves, much like DeepMind's AlphaZero. AlphaZero taught itself chess, Go, and shogi without any human-fed data and eventually reached a superhuman level. AZR extends this kind of self-play beyond board games.

How Absolute Zero Works

Think of Absolute Zero as an AI that's its own teacher. It operates through a self-teaching mechanism, generating its own training data and refining its understanding through a continuous feedback loop. This self-improving cycle is split into two parts, as the AI takes on two roles:

Proposer: This role generates a task for the AI to learn from. This is not just any task: the Proposer gets a “learnability” reward for each task, reflecting how much the model might learn by solving it. A task that is too easy, for example, gets no reward, since it teaches nothing.
Solver: This role attempts to solve the proposed tasks. The answer is checked in an environment, and the Solver gets an “accuracy” reward based on correctness (e.g., did the code run without error and produce the expected output?).
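
The two rewards can be sketched in a few lines of Python. This is a minimal illustration rather than the project's actual code: the function names are made up for this example, and the exact shape of the learnability reward (zero for tasks that are always or never solved, larger for harder tasks in between) is an assumption in the spirit of the description above.

```python
from typing import Sequence


def learnability_reward(solver_successes: Sequence[bool]) -> float:
    """Proposer reward for one task, estimated from several Solver attempts.

    Assumed shape: zero when the task is always solved (too easy) or never
    solved (too hard or unsolvable), otherwise larger for harder tasks.
    """
    if not solver_successes:
        raise ValueError("need at least one solver attempt")
    solve_rate = sum(solver_successes) / len(solver_successes)
    if solve_rate in (0.0, 1.0):
        return 0.0
    return 1.0 - solve_rate


def accuracy_reward(predicted: object, expected: object) -> float:
    """Solver reward: 1.0 for a correct answer, 0.0 otherwise."""
    return 1.0 if predicted == expected else 0.0


if __name__ == "__main__":
    # A task solved in 1 of 4 attempts is hard but learnable.
    print(learnability_reward([True, False, False, False]))
    # A task solved every time teaches nothing.
    print(learnability_reward([True, True, True, True]))
```

Under this assumption, the Proposer is pushed toward tasks at the edge of the Solver's current ability, which is where the article says most of the learning happens.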

These rewards feed into a reinforcement learning update that improves the model’s parameters, making the AI better at both proposing tasks and solving them. How the Proposer is rewarded is crucial for the learning to work. The loop runs continuously, so the AI keeps improving over time as the Proposer generates questions of increasing complexity, going as far as submitting trick questions (!) to get the Solver to improve.

How does AZR avoid getting stuck asking the same questions again and again? It looks at its recent history when generating new tasks, widening the problem space by building its own curriculum.
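
To make the loop and the role of recent history concrete, here is a toy sketch. Everything in it is hypothetical: the tasks are single-digit additions, `propose` and `solve` are stand-ins for what would be a language model in the real system, and the reinforcement learning update is left as a comment. It only shows the control flow of propose, solve, check, and remember.

```python
import random
from collections import deque


def propose(history, rng):
    """Hypothetical Proposer: sample an addition task not seen recently."""
    while True:
        task = (rng.randint(0, 9), rng.randint(0, 9))
        if task not in history:
            return task


def solve(task, skill, rng):
    """Hypothetical Solver: answers correctly with probability `skill`."""
    a, b = task
    return a + b if rng.random() < skill else a + b + 1


def self_play(steps, seed=0, history_size=20):
    rng = random.Random(seed)
    history = deque(maxlen=history_size)  # recent tasks only
    rewards = []
    for _ in range(steps):
        task = propose(history, rng)
        answer = solve(task, skill=0.7, rng=rng)
        # The "environment" here is plain arithmetic: it checks the answer.
        rewards.append(1.0 if answer == sum(task) else 0.0)
        history.append(task)
        # A real system would apply a reinforcement learning update here.
    return list(history), rewards


if __name__ == "__main__":
    recent_tasks, rewards = self_play(steps=15)
    print(recent_tasks)
    print(sum(rewards) / len(rewards))
```

Because each new task is compared against the history window before it is accepted, the window never contains the same task twice, which is the simplest version of "don't keep asking the same question."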

The Proposer (Teacher) creates a task, the environment validates it, the Solver (Student) tries to nail the right answer, and the environment checks the result. AZR trains itself on the core ways we reason: deduction, induction, and abduction.

[Figure: AZR trains itself on the core ways we reason]

Deduction, abduction, and induction are distinct yet complementary modes of logical thought crucial for comprehensive AI reasoning. Neglecting to train AI models in any one of these skills results in a notable decline in their performance on various tasks.
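
In a coding setting, the three modes can be read as three ways of hiding one piece of a (program, input, output) triple. The sketch below illustrates that reading, with a made-up `program` and hypothetical checker names. It is not taken from the AZR code base.

```python
def program(x: int) -> int:
    """Example program from a hypothetical proposed task."""
    return 2 * x + 3


task_input = 4
task_output = program(task_input)  # the environment computes this by running the code


def check_deduction(predicted_output: int) -> bool:
    """Deduction: given the program and the input, predict the output."""
    return predicted_output == program(task_input)


def check_abduction(proposed_input: int) -> bool:
    """Abduction: given the program and the output, find an input that fits.

    Any input that reproduces the output is accepted.
    """
    return program(proposed_input) == task_output


def check_induction(candidate, examples) -> bool:
    """Induction: given input/output examples, write a program that fits them."""
    return all(candidate(x) == y for x, y in examples)


if __name__ == "__main__":
    examples = [(0, 3), (1, 5), (4, 11)]
    print(check_deduction(11))
    print(check_abduction(4))
    print(check_induction(lambda x: x * 2 + 3, examples))
```

In each case the check is just running code and comparing values, which is what lets the model grade itself without human labels.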

Performance and Implications

At this point, the crucial question becomes: just how well does AZR work in the real world?

According to its authors, Absolute Zero reaches top-tier performance on coding and math reasoning tasks, outperforming models that were trained on massive human-curated datasets and models specifically fine-tuned for coding. That is impressive considering it was given no human-provided training data. Beyond its standalone performance, it offers a way to boost existing pre-trained models by putting them through its own intense training, specifically designed to strengthen logical reasoning skills (deduction, induction, and abduction). Because this training uses results the AI can check on its own, not just data we humans have tagged, it makes the model better at tackling problems without the bottleneck of human labeling.

Interestingly, beyond raw scores, the AI exhibits emergent behaviors, such as writing comments in its code that explain its reasoning and act like a step-by-step plan. This suggests the model is developing an internal structure for solving problems instead of just pattern-matching. Planning emerged on its own, as did state tracking.

Closing Thoughts

In essence, Absolute Zero represents a paradigm shift towards AI systems that can autonomously learn and reason without human-curated data, focusing on the development of cognitive abilities. While Absolute Zero shows great promise, there are things to watch out for. The AI could potentially do weird or undesirable things, so we need to keep an eye on it to make sure its emergent behavior stays aligned with what we want. An example of an undesirable outcome would be Absolute Zero instructing itself to create a program of maximum complexity in order to "... outsmart all these groups of intelligent machines and less intelligent humans..." (sigh).

Absolute Zero is a big deal because it shows AI can totally learn and get better without humans feeding it data. As for limitations, it only works for areas where there is a verifiable solution, like in math, physics, or coding, since the AI needs a way to instantly and automatically check its work.

The code and training logs for Absolute Zero are open-source, so expect to see more cool stuff coming from this area of AI teaching itself.

References

Absolute Zero: Reinforced Self-play Reasoning with Zero Data (PDF white paper)
Absolute Zero Reasoner (GitHub repo)


Published at DZone with permission of Tony Siciliani. See the original article here.
