No AI model, including ChatGPT, has yet passed the FrontierMath test
FrontierMath has proven a serious challenge for ChatGPT and Gemini.
It seems we are still a long way from the technological singularity. Researchers at the Epoch AI organization have presented a new mathematics benchmark, FrontierMath, that even the most advanced artificial intelligence models cannot yet cope with.
FrontierMath consists of exceptionally difficult mathematics problems. Claude 3.5 Sonnet, GPT-4o, o1-preview and Gemini 1.5 Pro each solve less than two percent of them, even though during testing the models have full access to a Python environment for calculations and debugging. By comparison, on older benchmarks such as GSM8K and MATH, models answer more than 90% of the problems correctly.
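Epoch AI's actual evaluation harness is not described here; as a rough illustration of what "full access to a Python environment" can mean in practice, the sketch below shows a minimal, hypothetical grading loop that executes model-written Python and counts a problem as solved only if the printed output matches the reference answer. All function names and details are assumptions, not the benchmark's real implementation.

```python
import subprocess
import sys
import tempfile

def run_model_code(code: str, timeout: int = 30) -> str:
    """Execute model-generated Python in a subprocess and capture stdout.

    A real harness would sandbox this far more strictly; the bare
    subprocess call here only illustrates the evaluation loop.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run(
        [sys.executable, path],
        capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout.strip()

def grade(model_code: str, expected_answer: str) -> bool:
    """Score one problem: solved only if the script's final printed
    output exactly matches the reference answer."""
    try:
        return run_model_code(model_code) == expected_answer
    except (subprocess.TimeoutExpired, OSError):
        return False

# Toy example: a "problem" whose reference answer is Fibonacci(30) = 832040.
candidate = """
a, b = 0, 1
for _ in range(30):
    a, b = b, a + b
print(a)
"""
print(grade(candidate, "832040"))  # True
```

Grading against a single definite answer like this is what lets such a benchmark be scored automatically, without a human checking each proof.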
The main feature of FrontierMath is that its problems have never been published anywhere before, so neural networks could not have encountered them in their training data and memorized the solutions in advance.