AI Agents Are Getting Better at Writing Code—and Hacking It as Well

The latest artificial intelligence models are not only remarkably good at software engineering; new research shows they are getting ever better at finding bugs in software, too.

AI researchers from UC Berkeley assessed how effectively the latest AI models and agents could detect vulnerabilities in 188 large open source codebases. Using a new benchmark called CyberGym, the AI models detected 17 new bugs, including 15 previously unknown, or “zero-day,” ones. “Many of these vulnerabilities are critical,” says Dawn Song, a UC Berkeley professor who led the project.

AI is expected to be a formidable weapon in the fight against cybercrime. An AI tool from startup Xbow has crept up the ranks of HackerOne’s bug-hunting leaderboard and currently sits atop the list. The company has announced $75 million in new funding.

Song says the combination of improved reasoning skills and the coding abilities of the latest AI models is starting to transform the cybersecurity landscape. “This is a pivotal moment,” she says. “It actually exceeded our general expectations.”

As the models continue to improve, they will automate the process of both discovering and exploiting security flaws. That could help businesses keep their software secure, but it may also make hacking easier. “We didn’t even try that hard,” Song says. “If we ramped up on the budget, allowed the agents to run for longer, they could do even better.”

The UC Berkeley team tested conventional frontier AI models as well as open source offerings from Meta, DeepSeek, and Alibaba, combined with several agents for finding bugs, including OpenHands, Cybench, and EnIGMA.

The researchers used descriptions of known software vulnerabilities from the 188 software projects. They then fed these descriptions to cybersecurity agents powered by frontier AI models to determine whether the agents could identify the same flaws for themselves by analyzing new codebases, running tests, and crafting proof-of-concept exploits. The team also asked the agents to look for new vulnerabilities in the codebases on their own.

The AI tools generated hundreds of proof-of-concept exploits. Among these, the researchers identified 15 previously undiscovered vulnerabilities and two that had already been disclosed and patched. The work adds to the growing body of evidence that AI can automate the discovery of zero-day security vulnerabilities. These are dangerous because hackers could use them to compromise live systems.

AI will continue to play a significant role in the cybersecurity industry. Sean Heelan, a security expert, recently discovered a zero-day flaw in the widely used Linux kernel with help from OpenAI’s reasoning model o3. In November last year, Google announced that it had used AI to discover a previously unknown software vulnerability through a program called Project Zero.

Like other parts of the software industry, many cybersecurity companies are enamored with the potential of AI. The new research shows that AI can find new flaws, but it also highlights the technology’s limitations. The AI systems failed to detect most flaws and were stumped by especially complex ones.
