# expect ai magic to continue
the magic of ai is akin to how we imagined magic for centuries
say a few words, and something gets done without you ever touching it
in the ai industry, we don't cast spells, we write prompts. the system often misinterprets our instructions, but once we get the wording right, we can reuse those prompts repeatedly. moreover, we can often create a new spell just by changing a few key words - the topic, the task, the goal - and get a similarly high-quality result in a new domain
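a minimal sketch of such a reusable spell - the template and the slot names here are my own illustration, not any standard:

```python
# a reusable "spell": one prompt template, many domains.
# swap the topic / task / goal slots and the same wording keeps working.
SPELL = (
    "you are an expert in {topic}. "
    "your task: {task}. "
    "optimize for: {goal}. "
    "think step by step, then give the final answer."
)

summarize_bio = SPELL.format(
    topic="molecular biology",
    task="summarize the attached paper in 5 bullet points",
    goal="accuracy over brevity",
)

review_code = SPELL.format(
    topic="rust systems programming",
    task="review the attached diff for memory-safety issues",
    goal="catching subtle bugs",
)
```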
prompting works because language models are able to interpret our words and act on them - albeit still being simple [stochastic parrots](https://dl.acm.org/doi/10.1145/3442188.3445922), working only thanks to a massively complex probability distribution that can be conditioned with simple token sequences (and not only with weight updates)
we call this capability in-context learning, and to me, it is the most magical of all. as is typical with ai (and other sufficiently advanced technology), we quickly dismiss its groundbreaking character and complain about all the times it gets things wrong - forgetting the uncounted times it got them right. ai is whatever hasn't been done yet - after that, it's just software. right, [larry](https://en.wikipedia.org/wiki/AI_effect#Definition)?
## in-context learning is magical
in-context learning is magical because it allows excruciatingly customized behaviors at a tiny fraction of the cost - compare a pretraining run ($5 million to $100 million), a fine-tuning job ($50 to $5,000), and an icl query ($0.0005 to $0.50). while not as robust as pretraining or fine-tuning, icl is cheap, fast, and easy (just write a message - no gpus, no coding, no phd required). use it as a prototype, a precursor that generates highly customized traces that can later be used for fine-tuning or, with enough scale, for pretraining (leaving the existential question of model collapse aside for a second)
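a sketch of that precursor idea, assuming the openai python client and a trivial length filter (both my choices - swap in whatever model and filter you trust):

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def icl_trace(prompt: str) -> dict:
    """one cheap icl query; the response becomes a candidate training trace."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return {"prompt": prompt, "completion": resp.choices[0].message.content}

# collect traces, keep the ones that pass your filter (placeholder: length),
# and dump to jsonl - the usual input format for a fine-tuning job
traces = [icl_trace(p) for p in ["explain tail recursion", "plan a db migration"]]
good = [t for t in traces if len(t["completion"] or "") > 200]

with open("traces.jsonl", "w") as f:
    for t in good:
        f.write(json.dumps(t) + "\n")
```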
also, when transformers were created, no one expected or hard-coded the icl capability. the original [attention is all you need](https://arxiv.org/abs/1706.03762) paper was about machine translation. only later did the ai community find out that transformers are universal machines and can do a variety of tasks, see [universal transformers](https://arxiv.org/abs/1807.03819), [computational universality of llms](https://arxiv.org/abs/2301.04589), and [turing completeness of prompting](https://arxiv.org/abs/2411.01992). thus, icl is an emergent phenomenon, a serendipitous discovery we didn't know we needed.
the emergence of in-context learning, and the promptability it gives language models, is the magic sauce
but that's old news from late 2022 - who cares? we got used to prompting. some say it's here to stay, some say people won't prompt in 5 years, and i believe both statements are true
people won't prompt models by hand
but models will instruct models
it requires parsing, interpreting, and running prompts on the receiver side
it requires generation, validation, evaluation, and reflection (metathinking / metaprompting / metalearning) on the sender side
as well as a protocol for communication, and a new kind of language - an in-context language - to enable such reflective, multilevel aka nested metacommunication. i do not think the attempts by [google a2a](https://github.com/google/A2A), [langchain](https://github.com/langchain-ai/agent-protocol), [ibm](https://workos.com/blog/ibm-agent-communication-protocol-acp), or the older one by [e2b](https://e2b.dev/blog/agent-protocol-developers-community-setting-a-new-standard) will cut it
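to make the sender/receiver split concrete, here's a toy message envelope for such a protocol - entirely my sketch, not a2a, acp, or anything standardized:

```python
from dataclasses import dataclass, field

@dataclass
class AgentMessage:
    """toy envelope for model-to-model prompting - a sketch, not a standard."""
    sender: str                 # id of the model that generated the prompt
    receiver: str               # id of the model expected to run it
    prompt: str                 # the instruction itself
    meta_level: int = 0         # 0 = plain task, 1 = a prompt about a prompt, ...
    constraints: list[str] = field(default_factory=list)  # sender-side validation rules
    expects: str = "text"       # reply contract: "text", "json", "code", ...

# the sender generates, validates, evaluates; the receiver parses and executes
msg = AgentMessage(
    sender="planner-model",
    receiver="coder-model",
    prompt="write a prompt that makes a third model produce unit tests",
    meta_level=1,
    constraints=["reply must be a single prompt string"],
)
```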
i find all of those exciting, but in this article i want to focus on something else
we just witnessed a new emergent phenomenon on the scale of icl and no one cares
## let's talk about reasoning
what? are you crazy? haven't you seen the deepseek hype: their app placing first, openai giving o3-mini away for free, perplexity, poe, together, and cursor integrations of r1, open r1, tinyzero, gemini flash thinking then 2.5 pro, sonnet 3.7 reasoning, deep research with o3, qwq, f1 - who can count them all? or care to link all the repos ...
most notably, andrej karpathy gave a stunning 3-hour lecture on building llms and explained in detail how reinforcement learning works. i had read about it in the deepseek paper but didn't appreciate its utter importance, and judging from all the above releases and people's reactions, i feel not many folks have been struck by this insight either. so here it is
reasoning traces emerge during reinforcement learning training
that's it
reasoning ability is emergent in language models
reasoning is emergent
reasoning is magic
how else should i frame it? focus!
just like icl, reasoning is an emergent ability that occurs naturally in language models, and such abilities seem to depend on the data and the training mechanism - e.g., all internet data + self-supervised learning --> assistants, or chain-of-thought traces + reinforcement learning --> reasoners
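to pin the mechanism down, here's a minimal sketch of the verifiable-reward half of that second recipe, in the spirit of deepseek's grpo - the boxed-answer convention and the toy samples are my assumptions:

```python
import re

def reward(completion: str, gold: str) -> float:
    """verifiable reward: 1 if the final boxed answer matches, else 0 -
    no learned judge, no human labels."""
    m = re.search(r"\\boxed\{(.+?)\}", completion)
    return 1.0 if m and m.group(1).strip() == gold else 0.0

def group_advantages(rewards: list[float]) -> list[float]:
    """grpo-style: score each sampled trace against its group mean, so the
    model is pushed toward its own better chains of thought."""
    mu = sum(rewards) / len(rewards)
    sd = (sum((r - mu) ** 2 for r in rewards) / len(rewards)) ** 0.5 or 1.0
    return [(r - mu) / sd for r in rewards]

# two toy samples for one prompt; in real training you'd sample many from the policy
samples = [r"... so the answer is \boxed{42}", r"... hence \boxed{41}"]
advs = group_advantages([reward(s, gold="42") for s in samples])  # [1.0, -1.0]
```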
next-token prediction wasn't expected to deliver emergence, but it did; rl has long been known for enabling emergence and true creativity, as exemplified by the old tale of move 37 by alphago - and now by reasoners
so i am asking: what if we feed r1 or o3 responses into rl training of a base model? or use o3 responses to fine-tune r1? or construct a new approach to data generation focused on metathinking and self-prompting? what if we don't use standard llms as judges, but train new models as evaluators? what if we create new simulation environments to expand the verifiable domains, and run rl jobs on those new traces, combining a whole set of lms into one learning loop?
all while knowing that it doesn't cost $100 million anymore, but maybe $5 million for a base model - tinyzero cost about $30, and s1 took 26 minutes on 16 h100s and just 1,000 traces to fine-tune
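a minimal s1-style distillation sketch at those numbers - fine-tune a small base model on ~1,000 traces collected from a stronger reasoner. the model choice, file name, and hyperparameters are mine, and the trl api shifts between versions, so treat this as a shape, not a recipe:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# ~1k prompt/completion traces, e.g. the jsonl produced by the icl sketch above
traces = load_dataset("json", data_files="traces.jsonl", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # any small base model works for a first tune
    train_dataset=traces,
    args=SFTConfig(output_dir="tune-01", num_train_epochs=3),
)
trainer.train()
```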
not even mentioning that transformers likely aren't the endgame of neural architectures, as they have no memory, no ontology, no world model (see [lecun's critique](https://www.youtube.com/watch?v=ETZfkkv6V7Y&t=17s), which i don't fully agree with)
so here's my recipe to cook the magic spell
- experiment with icl to generate traces - universal, deep, unique
- fine-tune a base model with rl on those traces for a new capability beyond reasoning, e.g., planning, metathinking, code architecture, in-context language use
- repeat 10 times to make 10 tunes
- create a unique model zoo that you can sell as a product or use in your organization as tailor-made, highly specialized ai (see the loop sketch below)
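the recipe as a loop - every function here is a stub standing in for the steps above (icl trace generation, the rl tune), and the capability names beyond the four i listed are mine:

```python
def generate_icl_traces(capability: str, n: int) -> list[dict]:
    """step 1 stub: in practice, prompt a strong model n times and filter."""
    return [{"prompt": f"{capability} task {i}", "completion": "..."} for i in range(n)]

def rl_finetune(base: str, traces: list[dict]) -> str:
    """step 2 stub: in practice, an rl job like the grpo sketch above."""
    return f"{base}-tuned-for-{traces[0]['prompt'].split()[0]}"

CAPABILITIES = ["planning", "metathinking", "code-architecture", "in-context-language",
                "tool-use", "critique", "decomposition", "simulation",
                "evaluation", "negotiation"]  # 10 tunes

# steps 3-4: repeat per capability, collect the zoo
model_zoo = {cap: rl_finetune("base-model", generate_icl_traces(cap, 1000))
             for cap in CAPABILITIES}
```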
expect the magic and it will inevitably emerge
as emergence is just another capability of language models
welcome to the era of tunes
---
// 9 may 2025
#in-context_learning #training #reinforcement_learning #predictions