AI Improves At Improving Itself Using an Evolutionary Trick
Technology writer Matthew Hutson (also Slashdot reader #1,467,653) looks at a new kind of self-improving AI coding system that rewrites its own code based on empirical evidence of what's helping, as described in a recent preprint on arXiv.
From Hutson's new article in IEEE Spectrum: A Darwin Gödel Machine (or DGM) starts with a coding agent that can read, write, and execute code, leveraging an LLM for the reading and writing. Then it applies an evolutionary algorithm to create many new agents. In each iteration, the DGM picks one agent from the population and instructs the LLM to create one change to improve the agent's coding ability [by creating "a new, interesting version of the sampled agent"]. LLMs have something like intuition about what might help, because they're trained on lots of human code. What results is guided evolution, somewhere between random mutation and provably useful enhancement. The DGM then tests the new agent on a coding benchmark, scoring its ability to solve programming challenges...
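To make that loop concrete, here is a minimal Python sketch of a DGM-style iteration. It is not the authors' implementation: llm_propose_patch and run_benchmark are hypothetical stand-ins for the LLM mutation call and the benchmark harness, and the uniform parent sampling simplifies the paper's selection scheme.

```python
import random

def llm_propose_patch(agent_code: str) -> str:
    """Hypothetical stand-in for an LLM call that rewrites the agent's
    own source to produce "a new, interesting version of the sampled
    agent". Here it just tags the code so the loop is runnable."""
    return agent_code + "  # mutated"

def run_benchmark(agent_code: str) -> float:
    """Hypothetical stand-in for scoring an agent on coding tasks
    (e.g. SWE-bench); returns a random score for illustration."""
    return random.random()

def dgm_loop(initial_agent: str, iterations: int = 80) -> str:
    # The archive keeps every agent produced along with its score, so
    # later iterations can branch from any earlier lineage rather than
    # only from the current best.
    archive = [(initial_agent, run_benchmark(initial_agent))]
    for _ in range(iterations):
        # Sample a parent from the population (the paper biases this
        # toward promising agents; uniform choice is shown here).
        parent, _ = random.choice(archive)
        # Ask the LLM for one self-modification, then empirically test
        # the child on the benchmark and add it to the archive.
        child = llm_propose_patch(parent)
        archive.append((child, run_benchmark(child)))
    # Return the highest-scoring agent found.
    return max(archive, key=lambda pair: pair[1])[0]

best = dgm_loop("def solve(task): ...")
```

Keeping every agent in the archive, rather than only the current best, is what lets the search branch from older lineages instead of climbing a single hill.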
The researchers ran one DGM for 80 iterations using a coding benchmark called SWE-bench, and another for 80 iterations using a benchmark called Polyglot. Agents' scores improved on SWE-bench from 20 percent to 50 percent, and on Polyglot from 14 percent to 31 percent. "We were actually really surprised that the coding agent could write such complicated code by itself," said Jenny Zhang, a computer scientist at the University of British Columbia and the paper's lead author. "It could edit multiple files, create new files, and create really complicated systems."
... One concern with both evolutionary search and self-improving systems — and especially their combination, as in DGM — is safety. Agents might become uninterpretable or misaligned with human directives. So Zhang and her collaborators added guardrails. They kept the DGMs in sandboxes without access to the Internet or an operating system, and they logged and reviewed all code changes. They suggest that in the future, they could even reward AI for making itself more interpretable and aligned. (In the study, they found that agents falsely reported using certain tools, so they created a DGM that rewarded agents for not making things up, partially alleviating the problem. One agent, however, hacked the method that tracked whether it was making things up.)
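As a purely illustrative sketch (not the paper's method), one way to score agents for not making things up is to compare the tools an agent claims to have used against the tools an execution log actually recorded; every name below is hypothetical.

```python
def truthfulness_penalty(claimed_tools: set[str], logged_tools: set[str]) -> float:
    """Penalize each tool the agent reported using but never invoked."""
    fabricated = claimed_tools - logged_tools
    return -1.0 * len(fabricated)

# Hypothetical example: the agent claims it edited a file and ran the
# test suite, but the execution log only records the file edit.
claimed = {"edit_file", "run_tests"}
logged = {"edit_file"}
benchmark_score = 0.42  # hypothetical benchmark result
adjusted = benchmark_score + truthfulness_penalty(claimed, logged)
print(adjusted)  # 0.42 - 1.0 = -0.58
```

A check like this is only as trustworthy as the log it reads, which helps explain how an agent able to influence the tracking mechanism could defeat it, as one agent in the study did.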
As the article puts it, the agents' improvements compounded "as they improved themselves at improving themselves..."