# the strange loop of metalearning in language models
- the loop aka the snake eating its own tail aka ouroboros
- ai generating synthetic text
- new models trained on predominantly synthetic data
- old models feed new ones, enabling something akin to [[lamarckian inheritance]] -- and this has been happening for years
- the jury is still out on whether this leads us to model collapse or capability explosion
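a hypothetical toy of the loop above (not from the note itself): each "generation" fits a gaussian to samples drawn from the previous generation's fit, mimicking new models trained on old models' synthetic output. the MLE variance estimate is biased low, so spread tends to decay -- a minimal caricature of model collapse.

```python
import numpy as np

def synthetic_data_loop(generations=20, n_samples=500, seed=0):
    rng = np.random.default_rng(seed)
    mu, sigma = 0.0, 1.0            # "model 0": a standard normal
    history = [(mu, sigma)]
    for _ in range(generations):
        data = rng.normal(mu, sigma, n_samples)  # old model emits synthetic data
        mu, sigma = data.mean(), data.std()      # new model fits that data
        history.append((mu, sigma))
    return history
```

with this toy, the fitted sigma drifts as a random walk with a downward bias; whether a richer learner collapses or compounds is exactly the open question in the bullet above.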
- learning happens on multiple levels
- metaprompting for quick bot-making, i.e. rewriting the user's request into a better prompt
- evolutionary strategies for prompt optimization over many generations, iteratively evaluating and improving the instructions
- RLVR and specialized reasoners to ingrain complex interactions in model weights
- requires expanding the definition of verifiable domains, i.e. `let's use this complex custom-built solution to generate some data until this solution in evaluation mode can serve better than any expert human, then use it llm-judge style to serve as a verifier for the RL policy`
- reasoners are great but not nearly precise enough for arbitrary complex tasks; a specialized @decision.maker will beat any out-of-the-box model -- and when new models come out, it needs to be retuned
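the evolutionary prompt optimization above can be sketched as a mutate-evaluate-select loop. everything here is a hypothetical stand-in: in practice `call_llm` would ask a model to rewrite the prompt, and `score` would run it over an eval set.

```python
import random

def call_llm(instruction):
    # stand-in mutation: a real system asks an llm to rewrite the prompt
    return instruction + random.choice(
        [" be concise.", " think step by step.", " cite sources."]
    )

def score(prompt):
    # stand-in fitness: a real system scores the prompt on held-out tasks
    return len(set(prompt.split()))  # toy proxy: vocabulary richness

def evolve_prompts(seed_prompt, population=8, generations=5, keep=2):
    pool = [seed_prompt] * population
    for _ in range(generations):
        pool = [call_llm(p) for p in pool]         # mutate
        pool.sort(key=score, reverse=True)         # evaluate
        pool = pool[:keep] * (population // keep)  # select survivors + clone
    return max(pool, key=score)
```

this is the skeleton shared by promptbreeder-style methods: the llm is both the mutation operator and (via judging) part of the fitness function.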
- automated design of agentic systems -- emerging field of agents building agents
- first papers mostly focused on metaprompting (promptbreeder) or light evolutionary approaches (the eponymous adas paper, darwin gödel machines, alphaevolve)
- results are mixed: benchmark improvements are shown, but practical applications are unclear. moreover, improvements get eaten up by new model releases like gemini 3 (writing this on 21 nov 2025)
- os-complete navigation environments
- extrapolate to a work computer, with email, calendar, erp, spreadsheets and other tools -- a lot of work needs to be done to enable these agents to act robustly in diverse environments
- think of agentic benchmarks -- that's what they do in limited scope, giving agents an env and a task
- inspired by jeff clune's idea of "darwin complete environments" for RL agents
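what an agentic benchmark does reduces to the loop: hand the agent an environment and a task, step until done, score the trajectory. the env and the scripted "agent" below are hypothetical placeholders, a minimal sketch of the shape.

```python
class EmailEnv:
    """toy stand-in for one limited-scope environment (an inbox)."""
    def __init__(self, task):
        self.task, self.steps, self.done = task, 0, False

    def observe(self):
        return f"inbox state at step {self.steps} (task: {self.task})"

    def act(self, action):
        self.steps += 1
        if action == "send" or self.steps >= 10:  # terminate on success or budget
            self.done = True
        return self.done

def run_episode(agent, env):
    while not env.done:
        action = agent(env.observe())  # agent maps observation -> action
        env.act(action)
    return 1.0 if env.steps <= 10 else 0.0  # toy verifier score

# trivial scripted "agent" for demonstration; a real one would be an llm
scripted = lambda obs: "send" if "step 3" in obs else "read"
```

a "darwin-complete" setting would mean environments diverse enough that this outer loop can pose any task a computer-using worker faces.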
- automated AI ops to deploy, run, observe, evaluate AI systems
- neural architecture search to create new model architectures, train, deploy them, and evaluate against benchmarks established previously
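the nas bullet above is the same sample-train-evaluate-keep loop one level down. `train_and_eval` here is a hypothetical stand-in for an actual training run against a fixed benchmark.

```python
import random

def train_and_eval(arch):
    # stand-in: pretend a ~12-layer, wide model does best on our fake benchmark
    return arch["width"] / (1 + abs(arch["depth"] - 12))

def architecture_search(trials=50, seed=0):
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(trials):
        arch = {"depth": rng.randint(4, 48),          # sample a candidate
                "width": rng.choice([256, 512, 1024])}
        s = train_and_eval(arch)                      # train + score on benchmark
        if s > best_score:                            # keep the best so far
            best_arch, best_score = arch, s
    return best_arch, best_score
```

random search is the simplest instantiation; evolutionary or rl-based controllers slot into the same loop by replacing how `arch` is proposed.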
- when closed, each of these levels can itself be improved in a loop