California releases long-awaited AI safety report


Last September, all eyes were on Senate Bill 1047 as it made its way to California Governor Gavin Newsom’s desk — and died there as he vetoed the buzzy piece of legislation.

SB 1047 would have required makers of all large AI models, particularly those that cost $100 million or more to train, to test them for specific dangers. AI industry whistleblowers weren’t happy about the veto, but most large tech companies were. But the story didn’t end there. Newsom, who had felt the legislation was too stringent and one-size-fits-all, tasked a group of leading AI researchers with helping propose an alternative plan: one that would support the development and the governance of generative AI in California, along with guardrails for its risks.

On Tuesday, that report was published.

The authors of the 52-page “California Report on Frontier AI Policy” said that AI capabilities, including models’ chain-of-thought “reasoning” abilities, have “rapidly improved” since Newsom’s decision to veto SB 1047. Using historical case studies, empirical research, modeling, and simulations, they suggested a new framework that would require more transparency and independent scrutiny of AI models. Their report arrives against the backdrop of a possible 10-year moratorium on states regulating AI, backed by a Republican Congress and companies like OpenAI.

The report — co-led by Fei-Fei Li, Co-Director of the Stanford Institute for Human-Centered Artificial Intelligence; Mariano-Florentino Cuéllar, President of the Carnegie Endowment for International Peace; and Jennifer Tour Chayes, Dean of the UC Berkeley College of Computing, Data Science, and Society — concluded that frontier AI breakthroughs in California could heavily impact agriculture, biotechnology, clean tech, education, finance, medicine and transportation. Its authors agreed it’s important to not stifle innovation and “ensure regulatory burdens are such that organizations have the resources to comply.”

But reducing risks is still paramount, they wrote: “Without proper safeguards… powerful AI could induce severe and, in some cases, potentially irreversible harms.”

The group published a draft version of their report in March for public comment. But even since then, they wrote in the final version, evidence that these models contribute to “chemical, biological, radiological, and nuclear (CBRN) weapons risks… has grown.” Leading companies, they added, have self-reported concerning spikes in their models’ capabilities in those areas.

The authors have made several changes to the draft report. They now note that California’s new AI policy will need to navigate quickly changing “geopolitical realities.” They added more context about the risks that large AI models pose, and they took a harder line on categorizing companies for regulation, saying a focus purely on how much compute their training required was not the best approach.

AI’s training needs are changing all the time, the authors wrote, and a compute-based definition ignores how these models are adopted in real-world use cases. It can be used as an “initial filter to cheaply screen for entities that may warrant greater scrutiny,” but factors like initial risk evaluations and downstream impact assessment are key.

That’s especially important because the AI industry is still the Wild West when it comes to transparency, with little agreement on best practices and “systemic opacity in key areas” like how data is acquired, safety and security processes, pre-release testing, and potential downstream impact, the authors wrote.

The report calls for whistleblower protections, third-party evaluations with safe harbor for researchers conducting those evaluations, and sharing information directly with the public, to enable transparency that goes beyond what current leading AI companies choose to disclose.

One of the report’s lead writers, Scott Singer, told The Verge that AI policy conversations have “completely shifted on the federal level” since the draft report. He argued that California, however, could help lead a “harmonization effort” among states for “commonsense policies that many people across the country support.” That’s a contrast to the jumbled patchwork that AI moratorium supporters claim state laws will create.

In an op-ed earlier this month, Anthropic CEO Dario Amodei called for a federal transparency standard, requiring leading AI companies “to publicly disclose on their company websites … how they plan to test for and mitigate national security and other catastrophic risks.”

But even steps like that aren’t enough, the authors of Tuesday’s report wrote, because “for a nascent and complex technology being developed and adopted at a remarkably swift pace, developers alone are simply inadequate at fully understanding the technology and, especially, its risks and harms.”

That’s why one of the key tenets of Tuesday’s report is the need for third-party risk assessment.

The authors concluded that risk assessments would incentivize companies like OpenAI, Anthropic, Google, Microsoft and others to amp up model safety, while helping paint a clearer picture of their models’ risks. Currently, leading AI companies typically do their own evaluations or hire second-party contractors to do so. But third-party evaluation is vital, the authors say.

Not only are “thousands of individuals… willing to engage in risk evaluation, dwarfing the scale of internal or contracted teams,” but also, groups of third-party evaluators have “unmatched diversity, especially when developers primarily reflect certain demographics and geographies that are often very different from those most adversely impacted by AI.”

But if you’re allowing third-party evaluators to test the risks and blind spots of your powerful AI models, you have to give them access — for meaningful assessments, a lot of access. And that’s something companies are hesitant to do.

It’s not even easy for second-party evaluators to get that level of access. Metr, a company OpenAI partners with for safety tests of its own models, wrote in a blog post that the firm wasn’t given as much time to test OpenAI’s o3 model as it had been with past models, and that OpenAI didn’t give it enough access to data or the model’s internal reasoning. Those limitations, Metr wrote, “prevent us from making robust capability assessments.” OpenAI later said it was exploring ways to share more data with firms like Metr.

Even an API or disclosures of a model’s weights may not let third-party evaluators effectively test for risks, the report noted, and companies could use “suppressive” terms of service to ban or threaten legal action against independent researchers who uncover safety flaws.

Last March, more than 350 AI industry researchers and others signed an open letter calling for a “safe harbor” for independent AI safety testing, similar to existing protections for third-party cybersecurity testers in other fields. Tuesday’s report cites that letter and calls for big changes, as well as reporting options for people harmed by AI systems.

“Even perfectly designed safety policies cannot prevent 100% of substantial, adverse outcomes,” the authors wrote. “As foundation models are widely adopted, understanding harms that arise in practice is increasingly important.”


