# the strange loop of metalearning in language models
- the loop aka the snake eating its own tail aka ouroboros
- ai generating synthetic text
- new models trained on predominantly synthetic data
- old models feed new ones, enabling something akin to [[lamarckian inheritance]] -- and this has been happening for years
- the jury is still out on whether this leads us to model collapse or capability explosion
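a hypothetical toy of the loop above (not from the note itself): each "generation" fits a gaussian to samples drawn from the previous generation's fit, mimicking new models trained on old models' synthetic output. the MLE variance estimate is biased low, so spread tends to decay -- a minimal caricature of model collapse.

```python
import numpy as np

def synthetic_data_loop(generations=20, n_samples=500, seed=0):
    rng = np.random.default_rng(seed)
    mu, sigma = 0.0, 1.0            # "model 0": a standard normal
    history = [(mu, sigma)]
    for _ in range(generations):
        data = rng.normal(mu, sigma, n_samples)  # old model emits synthetic data
        mu, sigma = data.mean(), data.std()      # new model fits that data
        history.append((mu, sigma))
    return history
```

with this toy, the fitted sigma drifts as a random walk with a downward bias; whether a richer learner collapses or compounds is exactly the open question in the bullet above.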
- learning happens on multiple levels
- metaprompting for quick bot-making, i.e. rewriting the user's request into a better prompt
- evolutionary strategies for prompt optimization over many generations, iteratively evaluating and improving the instructions
- RLVR and specialized reasoners to ingrain complex interactions in model weights
- requires expanding the definition of verifiable domains, i.e. `let's use this complex custom-built solution to generate some data until this solution in evaluation mode can serve better than any expert human, then use it llm-judge style to serve as a verifier for the RL policy`
- reasoners are great but not nearly precise enough for arbitrary complex tasks; a specialized @decision.maker will beat any out-of-the-box model -- and when new models come out, it needs to be retuned
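the evolutionary prompt optimization above can be sketched as a mutate-evaluate-select loop. everything here is a hypothetical stand-in: in practice `call_llm` would ask a model to rewrite the prompt, and `score` would run it over an eval set.

```python
import random

def call_llm(instruction):
    # stand-in mutation: a real system asks an llm to rewrite the prompt
    return instruction + random.choice(
        [" be concise.", " think step by step.", " cite sources."]
    )

def score(prompt):
    # stand-in fitness: a real system scores the prompt on held-out tasks
    return len(set(prompt.split()))  # toy proxy: vocabulary richness

def evolve_prompts(seed_prompt, population=8, generations=5, keep=2):
    pool = [seed_prompt] * population
    for _ in range(generations):
        pool = [call_llm(p) for p in pool]         # mutate
        pool.sort(key=score, reverse=True)         # evaluate
        pool = pool[:keep] * (population // keep)  # select survivors + clone
    return max(pool, key=score)
```

this is the skeleton shared by promptbreeder-style methods: the llm is both the mutation operator and (via judging) part of the fitness function.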
- automated design of agentic systems -- emerging field of agents building agents
- first papers mostly focused on metaprompting (promptbreeder) or light evolutionary approaches (the eponymous adas paper, darwin gödel machines, alphaevolve)
- results are mixed: benchmark improvements are shown, but practical applications are unclear. moreover, improvements get eaten up by new model releases like gemini 3 (writing this on 21 nov 2025)
- os-complete navigation environments
- extrapolate to a work computer, with email, calendar, erp, spreadsheets and other tools -- a lot of work needs to be done to enable these agents to act robustly in diverse environments
- think of agentic benchmarks -- that's what they do in limited scope, giving agents an env and a task
- inspired by jeff clune's idea of "darwin complete environments" for RL agents
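what an agentic benchmark does reduces to the loop: hand the agent an environment and a task, step until done, score the trajectory. the env and the scripted "agent" below are hypothetical placeholders, a minimal sketch of the shape.

```python
class EmailEnv:
    """toy stand-in for one limited-scope environment (an inbox)."""
    def __init__(self, task):
        self.task, self.steps, self.done = task, 0, False

    def observe(self):
        return f"inbox state at step {self.steps} (task: {self.task})"

    def act(self, action):
        self.steps += 1
        if action == "send" or self.steps >= 10:  # terminate on success or budget
            self.done = True
        return self.done

def run_episode(agent, env):
    while not env.done:
        action = agent(env.observe())  # agent maps observation -> action
        env.act(action)
    return 1.0 if env.steps <= 10 else 0.0  # toy verifier score

# trivial scripted "agent" for demonstration; a real one would be an llm
scripted = lambda obs: "send" if "step 3" in obs else "read"
```

a "darwin-complete" setting would mean environments diverse enough that this outer loop can pose any task a computer-using worker faces.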
- automated AI ops to deploy, run, observe, evaluate AI systems
- neural architecture search to create new model architectures, train, deploy them, and evaluate against benchmarks established previously
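the nas bullet above is the same sample-train-evaluate-keep loop one level down. `train_and_eval` here is a hypothetical stand-in for an actual training run against a fixed benchmark.

```python
import random

def train_and_eval(arch):
    # stand-in: pretend a ~12-layer, wide model does best on our fake benchmark
    return arch["width"] / (1 + abs(arch["depth"] - 12))

def architecture_search(trials=50, seed=0):
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(trials):
        arch = {"depth": rng.randint(4, 48),          # sample a candidate
                "width": rng.choice([256, 512, 1024])}
        s = train_and_eval(arch)                      # train + score on benchmark
        if s > best_score:                            # keep the best so far
            best_arch, best_score = arch, s
    return best_arch, best_score
```

random search is the simplest instantiation; evolutionary or rl-based controllers slot into the same loop by replacing how `arch` is proposed.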
- when closed, each of these levels can itself be improved in a loop