For four years, the computer scientist Trieu Trinh has been consumed with a meta-math problem of sorts: how to build an A.I. model that solves geometry problems from the International Mathematical Olympiad, the annual competition for the world's most mathematically attuned high school students.
Last week, Dr. Trinh successfully defended his doctoral thesis on this topic at New York University; this week, he described the result of his work in the journal Nature. Called AlphaGeometry, the system solves Olympiad geometry problems at nearly the level of a human gold medalist.
While developing the project, Dr. Trinh introduced it to two Google research scientists, and they brought him on as a resident from 2021 to 2023. AlphaGeometry joins Google DeepMind's fleet of A.I. systems, which have become well known for tackling grand challenges. Perhaps most famously, AlphaZero, a deep-learning algorithm, conquered chess in 2017. Mathematics is a harder problem, since the number of possible paths toward a solution is sometimes infinite; chess is always finite.
“I kept running into dead ends, going down the wrong path,” said Dr. Trinh, lead author and driving force behind the project.
The paper's co-authors are Dr. Trinh's doctoral advisor, He He, at New York University; Yuhuai Wu, known as Tony, co-founder of xAI (previously at Google) who in 2019 had independently begun exploring a similar idea; Thang Luong, the lead researcher, and Quoc Le, both from Google DeepMind.
Dr. Trinh's perseverance paid off. “We're not making incremental improvements,” he said. “We are taking a big leap, a big step forward in terms of results.”
“Just don't overdo it,” he said.
The big jump
Dr. Trinh tested the AlphaGeometry system on a set of 30 Olympiad geometry problems drawn from 2000 to 2022. The system solved 25; historically, over that same period, the average human gold medalist solved 25.9. Dr. Trinh also gave the problems to a system developed in the 1970s that was known as the strongest geometry theorem prover; it solved 10.
In recent years, Google DeepMind has pursued a number of projects investigating the application of A.I. to mathematics. And, more broadly across the research field, Olympiad math problems have been adopted as a benchmark; OpenAI and Meta AI have achieved some results. For further motivation, there is the IMO Grand Challenge and a new challenge announced in November, the Artificial Intelligence Mathematical Olympiad Prize, with a $5 million pot destined for the first A.I. to win gold at the Olympiad.
The AlphaGeometry paper begins with the contention that proving theorems at the Olympiad level “represents a notable milestone in human-level automated reasoning.” Michael Barany, a historian of mathematics and science at the University of Edinburgh, said he wondered whether that was a meaningful mathematical milestone. “What IMO is testing is very different from what creative mathematics looks like to the vast majority of mathematicians,” he said.
Terence Tao, a mathematician at the University of California, Los Angeles (and the youngest gold medalist in Olympiad history, at age 12), said he thought AlphaGeometry was “good work” and had achieved “surprisingly solid results.” Perfecting an A.I. system to solve Olympiad problems might not improve its deep-research skills, he said, but in this case the journey may prove more valuable than the destination.
As Dr. Trinh sees it, mathematical reasoning is just one type of reasoning, but it holds the advantage of being easily verifiable. “Mathematics is the language of truth,” he said. “If you want to build an A.I., it's important to build a truth-seeking, reliable A.I. that you can trust,” especially for “safety-critical applications.”
Proof of concept
AlphaGeometry is a “neuro-symbolic” system. It combines a neural network language model (good at artificial intuition, like ChatGPT but smaller) with a symbolic engine (good at artificial reasoning, like a kind of logic calculator).
And it's tailor-made for geometry. “Euclidean geometry is a good testbed for automatic reasoning because it is a self-contained domain with fixed rules,” said Heather Macbeth, a geometer at Fordham University and an expert in computer-verified reasoning. (When she was a teenager, Dr. Macbeth won two IMO medals.) AlphaGeometry “seems to be good progress,” she said.
The system has two especially novel features. First, the neural network is trained solely on algorithmically generated data (a whopping 100 million geometric proofs), with no human-written examples. Using synthetic data built from scratch overcame an obstacle in automated theorem proving: the scarcity of training data consisting of human proofs translated into machine-readable language. “To be honest, at first I had some doubts about how this would be successful,” Dr. He said.
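The idea can be sketched, in toy form, roughly as follows: sample a few premises at random, run forward deduction with a fixed rule set, and record each derived fact, together with the premises that produced it, as a machine-readable training example. The rules, fact strings and function names below are illustrative assumptions, not AlphaGeometry's actual pipeline, which works over formal geometric statements at a vastly larger scale.

    import random

    # Toy rule set: each entry maps a set of premise facts to one derivable fact.
    TOY_RULES = {
        frozenset({"AB = AC"}): "angle ABC = angle ACB",     # isosceles base angles
        frozenset({"angle ABC = angle ACB"}): "AB = AC",     # and its converse
        frozenset({"AB || CD", "BC || AD"}): "ABCD is a parallelogram",
    }

    FACT_POOL = ["AB = AC", "AB || CD", "BC || AD", "angle ABC = angle ACB"]

    def forward_deduce(premises):
        """Apply the toy rules to closure; return (derived_fact, premises_used) steps."""
        facts, steps = set(premises), []
        changed = True
        while changed:
            changed = False
            for needed, derived in TOY_RULES.items():
                if needed <= facts and derived not in facts:
                    facts.add(derived)
                    steps.append((derived, needed))
                    changed = True
        return steps

    def generate_examples(n):
        """Build n synthetic (premises, conclusion, proof step) training examples."""
        examples = []
        while len(examples) < n:
            premises = set(random.sample(FACT_POOL, k=random.randint(1, 3)))
            for conclusion, used in forward_deduce(premises):
                examples.append({
                    "premises": sorted(premises),
                    "conclusion": conclusion,
                    "proof": sorted(used),
                })
        return examples[:n]

    if __name__ == "__main__":
        for example in generate_examples(3):
            print(example)

The sketch shows only the shape of the loop (generate, deduce, record); the point is that no human-written proof ever enters the training data.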
Second, once AlphaGeometry was given a problem, the symbolic engine set about solving it; if it got stuck, the neural network suggested ways to augment the proof argument. The cycle continued until a solution materialized or until time ran out (four and a half hours). In mathematical jargon, this augmentation process is called “auxiliary construction.” Add a line, bisect an angle, draw a circle: this is how mathematicians, whether students or elite researchers, tinker and try to get a handle on a problem. In this system, the neural network learned to make auxiliary constructions, and in a humanlike way. Dr. Trinh compared it to wrapping a rubber band around the lid of a stubborn jar to help your hand get a better grip.
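A minimal sketch of that loop, assuming the system simply alternates between exhaustive symbolic deduction and model-proposed constructions under a time budget, might look like this; SymbolicEngine and ConstructionModel are hypothetical stand-ins, not the real components.

    import random
    import time

    class SymbolicEngine:
        """Stand-in for a deduction engine that applies fixed geometric rules."""

        def deduce(self, facts):
            # A real engine would expand the facts to their deductive closure here.
            return set(facts)

        def proves(self, facts, goal):
            return goal in facts

    class ConstructionModel:
        """Stand-in for the language model that proposes auxiliary constructions."""

        def propose(self, facts, goal):
            # A real model, trained on synthetic proofs, might emit something like
            # "midpoint M of AB" or "circle through A, B and C".
            return f"aux_construction_{random.randint(0, 10**6)}"

    def solve(premises, goal, time_limit_s):
        """Alternate symbolic deduction and auxiliary construction until proof or timeout."""
        engine, model = SymbolicEngine(), ConstructionModel()
        facts = set(premises)
        deadline = time.monotonic() + time_limit_s
        while time.monotonic() < deadline:
            facts = engine.deduce(facts)            # exhaust pure deduction first
            if engine.proves(facts, goal):
                return True                         # a proof was found
            facts.add(model.propose(facts, goal))   # stuck: add a construction and retry
        return False                                # out of time

    if __name__ == "__main__":
        # The toy stand-ins never derive the goal, so use a tiny time limit.
        print(solve({"AB = AC"}, "angle ABC = angle ACB", time_limit_s=0.01))

With these placeholder components the goal is never actually reached; the example only illustrates the alternation of deduce, check and construct, with the clock playing the role of the four-and-a-half-hour limit.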
“It's a very interesting proof of concept,” said Christian Szegedy, a co-founder of xAI who previously worked at Google. But “it leaves a lot of questions open,” he said, and is not “easily generalizable to other domains and other areas of mathematics.”
Dr. Trinh said he would try to generalize the system across mathematical fields and beyond. He said he wanted to take a step back and consider “the common underlying principle” of all kinds of reasoning.
Stanislas Dehaene, a cognitive neuroscientist at the Collège de France who has a research interest in foundational geometric knowledge, said he was impressed by AlphaGeometry's performance. But he observed that the system “doesn't ‘see’ anything about the problems it solves”; rather, it takes in only logical and numerical encodings of pictures. (The drawings in the paper are for the benefit of the human reader.) “There is absolutely no spatial awareness of the circles, lines and triangles that the system learns to manipulate,” Dr. Dehaene said. The researchers agreed that a visual component could be valuable; Dr. Luong said it could be added, perhaps within a year, using Google's Gemini, a “multimodal” system that ingests both text and images.
Moving solutions
In early December, Dr. Luong visited his former high school in Ho Chi Minh City, Vietnam, and showed AlphaGeometry to his former teacher and IMO coach, Le Ba Khanh Trinh. Dr. Lê was the top gold medalist at the 1979 Olympiad and won a special prize for his elegant geometric solution. Dr. Lê analyzed one of AlphaGeometry's proofs and found it notable but unsatisfying, Dr. Luong recalled: “He found it mechanical and said it lacks the soul, the beauty of the solution he seeks.”
Dr. Trinh had previously asked Evan Chen, a mathematics doctoral student at MIT (and an IMO coach and Olympiad gold medalist), to verify some of AlphaGeometry's work. It checked out, Mr. Chen said, adding that he was intrigued by how the system had found the solutions.
“I'd like to know how the machine works with this,” he said. “But, I mean, I'd also like to know how humans find solutions.”