Here’s how OpenAI will determine how powerful its AI systems are
OpenAI has created an internal scale to track the progress its large language models are making toward artificial general intelligence, or AI with human-like intelligence, a spokesperson told Bloomberg.
Today’s chatbots, like ChatGPT, are at Level 1. OpenAI claims it is nearing Level 2, defined as a system that can solve basic problems at the level of a person with a PhD. Level 3 refers to AI agents capable of taking actions on a user’s behalf. Level 4 involves AI that can create new innovations. Level 5, the final step to achieving AGI, is AI that can perform the work of entire organizations of people. OpenAI has previously defined AGI as “a highly autonomous system surpassing humans in most economically valuable tasks.”
OpenAI’s unique structure is centered around its mission of achieving AGI, and how OpenAI defines AGI is important. The company has said that “if a value-aligned, safety-conscious project comes close to building AGI” before OpenAI does, it commits to not competing with the project and dropping everything to assist. The phrasing of this in OpenAI’s charter is vague, leaving room for the judgment of the for-profit entity (governed by the nonprofit), but a scale that OpenAI can test itself and competitors on could help define in clearer terms when AGI has been reached.
Still, AGI is quite a ways away: reaching it, if it can be reached at all, will take billions upon billions of dollars’ worth of computing power. Timelines from experts, and even within OpenAI, vary wildly. In October 2023, OpenAI CEO Sam Altman said we are “five years, give or take,” from reaching AGI.
This new grading scale, though still under development, was introduced a day after OpenAI announced its collaboration with Los Alamos National Laboratory, which aims to explore how advanced AI models like GPT-4o can safely assist in bioscientific research. A program manager at Los Alamos, responsible for the national security biology portfolio and instrumental in securing the OpenAI partnership, told The Verge that the goal is to test GPT-4o’s capabilities and establish a set of safety and other factors for the US government. Eventually, public or private models can be tested against these factors.
In May, OpenAI dissolved its safety team after the group’s leader, OpenAI cofounder Ilya Sutskever, left the company. Jan Leike, a key OpenAI researcher, resigned shortly after, claiming in a post that “safety culture and processes have taken a backseat to shiny products” at the company. While OpenAI denied that was the case, some are concerned about what this means if the company does in fact reach AGI.
OpenAI hasn’t provided details on how it assigns models to these internal levels (and declined The Verge’s request for comment). However, company leaders demonstrated a research project using the GPT-4 AI model during an all-hands meeting on Thursday and believe this project showcases some new skills that exhibit human-like reasoning, according to Bloomberg.
This scale could help provide a strict definition of progress, rather than leaving it up to interpretation. For instance, OpenAI CTO Mira Murati said in an interview in June that the models in its labs are not much better than what the public already has. Meanwhile, CEO Sam Altman said late last year that the company recently “pushed the veil of ignorance back,” meaning the models are remarkably more intelligent.