ARC Prize has launched the hardcore ARC-AGI-2 benchmark, accompanied by the announcement of their 2025 competition with $1 million in prizes.
As AI progresses from performing narrow tasks to demonstrating general, adaptive intelligence, the ARC-AGI-2 challenges aim to uncover capability gaps and actively guide innovation.
"Good AGI benchmarks act as useful progress indicators. Better AGI benchmarks clearly discern capabilities. The best AGI benchmarks do all this and actively inspire research and guide innovation," the ARC Prize team states.
ARC-AGI-2 is setting out to achieve the "best" category.
Beyond memorisation
Since its inception in 2019, ARC Prize has served as a "North Star" for researchers striving toward AGI by creating enduring benchmarks.
Benchmarks like ARC-AGI-1 leaned into measuring fluid intelligence (i.e., the ability to adapt learning to new, unseen tasks). It represented a clear departure from datasets that reward memorisation alone.
ARC Prize's mission is also forward-thinking, aiming to accelerate timelines for scientific breakthroughs. Its benchmarks are designed not just to measure progress but to inspire new ideas.
Researchers observed a critical shift with the debut of OpenAI's o3 in late 2024, evaluated using ARC-AGI-1. Combining deep learning-based large language models (LLMs) with reasoning synthesis engines, o3 marked a breakthrough where AI transitioned beyond rote memorisation.
Yet, despite progress, systems like o3 remain inefficient and require significant human oversight during training processes. To challenge these systems for true adaptability and efficiency, ARC Prize introduced ARC-AGI-2.
ARC-AGI-2: Closing the human-machine gap
The ARC-AGI-2 benchmark is tougher for AI yet retains its accessibility for humans. While frontier AI reasoning systems continue to score in single-digit percentages on ARC-AGI-2, humans can solve every task within two attempts.
So, what sets ARC-AGI apart? Its design philosophy chooses tasks that are "relatively easy for humans, yet hard, or impossible, for AI."
The benchmark includes datasets with varying visibility and the following characteristics:
- Symbolic interpretation: AI struggles to assign semantic significance to symbols, instead focusing on shallow comparisons like symmetry checks.
- Compositional reasoning: AI falters when it needs to apply multiple interacting rules simultaneously.
- Contextual rule application: Systems fail to apply rules differently based on complex contexts, often fixating on surface-level patterns.
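ARC-AGI tasks are distributed as JSON objects with "train" and "test" lists of input/output grid pairs, where each grid is a list of rows of integers 0-9. A minimal sketch of how a candidate program is checked against a task's training demonstrations before being applied to the test input; the tiny task and the transpose rule below are illustrative assumptions, not real benchmark data:

```python
# Sketch of verifying a candidate program against an ARC-style task.
# The task dict mirrors the public ARC JSON layout ("train"/"test" pairs
# of integer grids); this specific task and rule are made up for illustration.

Grid = list[list[int]]

task = {
    "train": [
        {"input": [[1, 2], [3, 4]], "output": [[1, 3], [2, 4]]},
        {"input": [[5, 0], [0, 5]], "output": [[5, 0], [0, 5]]},
    ],
    "test": [{"input": [[7, 8], [9, 0]]}],
}

def candidate(grid: Grid) -> Grid:
    """Hypothetical candidate rule: transpose the grid."""
    return [list(row) for row in zip(*grid)]

def passes_training(program, task) -> bool:
    """Accept a program only if it reproduces every training output."""
    return all(program(p["input"]) == p["output"] for p in task["train"])

prediction = None
if passes_training(candidate, task):
    prediction = candidate(task["test"][0]["input"])  # [[7, 9], [8, 0]]
```

Because each task is a new, unseen rule, a solver cannot rely on memorised mappings; it must infer the transformation from a handful of demonstrations, which is exactly the fluid-intelligence gap the list above describes.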
Most existing benchmarks focus on superhuman capabilities, testing advanced, specialised skills at scales unattainable for most individuals.
ARC-AGI flips the script and highlights what AI can't yet do: specifically, the adaptability that defines human intelligence. When the gap between tasks that are easy for humans but difficult for AI eventually reaches zero, AGI can be declared achieved.
However, achieving AGI isn't limited to the ability to solve tasks; efficiency (the cost and resources required to find solutions) is emerging as a crucial defining factor.
The role of efficiency
Measuring performance by cost per task is essential to gauge intelligence as not just problem-solving capability but the ability to do so efficiently.
Real-world examples are already showing efficiency gaps between humans and frontier AI systems:
- Human panel efficiency: Passes ARC-AGI-2 tasks with 100% accuracy at $17/task.
- OpenAI o3: Early estimates suggest a 4% success rate at an eye-watering $200 per task.
These metrics underline disparities in adaptability and resource consumption between humans and AI. ARC Prize has committed to reporting on efficiency alongside scores across future leaderboards.
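One back-of-the-envelope way to compare the figures above is cost per *solved* task, i.e. cost per attempt divided by success rate. This metric is an illustrative assumption for the sketch below, not ARC Prize's official leaderboard formula:

```python
# Toy efficiency comparison using the figures quoted in the article.
# "Cost per solved task" = cost per attempt / success rate.

def cost_per_solved_task(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend to obtain one correct solution."""
    return cost_per_attempt / success_rate

human = cost_per_solved_task(17.0, 1.00)   # $17 per solved task
o3_est = cost_per_solved_task(200.0, 0.04) # $5,000 per solved task (early estimate)

print(f"human: ${human:,.0f}/solved task, o3: ${o3_est:,.0f}/solved task")
```

On this rough measure, the human panel is nearly 300 times more cost-efficient than the early o3 estimate, which is the kind of gap the efficiency reporting is meant to surface.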
The focus on efficiency prevents brute-force solutions from being considered "true intelligence."
Intelligence, according to ARC Prize, encompasses finding solutions with minimal resources, a quality distinctly human but still elusive for AI.
ARC Prize 2025
ARC Prize 2025 launches on Kaggle this week, promising $1 million in total prizes and showcasing a live leaderboard for open-source breakthroughs. The contest aims to drive progress toward systems that can efficiently tackle ARC-AGI-2 challenges.
Among the prize categories, which have increased from 2024 totals, are:
- Grand prize: $700,000 for reaching 85% success within Kaggle efficiency limits.
- Top score prize: $75,000 for the highest-scoring submission.
- Paper prize: $50,000 for transformative ideas contributing to solving ARC-AGI tasks.
- Additional prizes: $175,000, with details pending announcements during the competition.
These incentives ensure fair and meaningful progress while fostering collaboration among researchers, labs, and independent teams.
Last year, ARC Prize 2024 attracted 1,500 competing teams and produced 40 papers with notable industry influence. This year's increased stakes aim to nurture even greater success.
ARC Prize believes progress hinges on novel ideas rather than merely scaling existing systems. The next breakthrough in efficient general systems might not originate from current tech giants but from bold, creative researchers embracing complexity and curious experimentation.
(Image credit: ARC Prize)
See also: DeepSeek V3-0324 tops non-reasoning AI models in open-source first
Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.
Explore other upcoming enterprise technology events and webinars powered by TechForge here.