Abstract Visual Reasoning Enabled by Language. (arXiv:2303.04091v3 [cs.AI] UPDATED)
By: <a href="http://arxiv.org/find/cs/1/au:+Camposampiero_G/0/1/0/all/0/1">Giacomo Camposampiero</a>, <a href="http://arxiv.org/find/cs/1/au:+Houmard_L/0/1/0/all/0/1">Loic Houmard</a>, <a href="http://arxiv.org/find/cs/1/au:+Estermann_B/0/1/0/all/0/1">Benjamin Estermann</a>, <a href="http://arxiv.org/find/cs/1/au:+Mathys_J/0/1/0/all/0/1">Joël Mathys</a>, <a href="http://arxiv.org/find/cs/1/au:+Wattenhofer_R/0/1/0/all/0/1">Roger Wattenhofer</a> Posted: June 23, 2023
While artificial intelligence (AI) models have achieved human or even
superhuman performance in many well-defined applications, they still struggle
to show signs of broad and flexible intelligence. The Abstraction and Reasoning
Corpus (ARC), a visual intelligence benchmark introduced by François
Chollet, aims to assess how close AI systems are to human-like cognitive
abilities. Most current approaches rely on carefully handcrafted,
domain-specific program search to brute-force solutions for the tasks present
in ARC. In this work, we propose a general learning-based framework for solving
ARC. It is centered on transforming tasks from the vision to the language
domain. This composition of language and vision allows pre-trained models
to be leveraged at each stage, enabling a shift from handcrafted priors towards
the learned priors of the models. While not yet beating state-of-the-art models
on ARC, we demonstrate the potential of our approach, for instance, by solving
some ARC tasks that have not been solved previously.
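The core idea of moving a task from the vision domain to the language domain can be illustrated with a minimal sketch: an ARC grid of colour indices is serialised into plain text, which a pre-trained language model could then reason over. The function name `grid_to_text` and the row/colour encoding below are illustrative assumptions, not the paper's actual representation.

```python
# Minimal sketch (not the authors' code) of serialising an ARC grid
# into a language-domain description. ARC grids are lists of rows,
# where each cell is an integer colour index in 0-9.

COLOURS = ["black", "blue", "red", "green", "yellow",
           "grey", "magenta", "orange", "cyan", "brown"]

def grid_to_text(grid):
    """Turn an ARC grid (rows of ints 0-9) into a textual description."""
    lines = []
    for r, row in enumerate(grid):
        cells = " ".join(COLOURS[c] for c in row)
        lines.append(f"row {r}: {cells}")
    return "\n".join(lines)

# A 2x2 grid becomes text that could be embedded in a prompt
# for a pre-trained language model.
demo = grid_to_text([[0, 1], [2, 0]])
```

In a full pipeline, such a description would be paired with the task's input/output examples and handed to a pre-trained model, replacing handcrafted visual priors with the priors the model learned during pre-training.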