# reverse engineering commit history to train reasoning models on process rewards

i've been thinking about ways to improve the coding capability of language models. as i've been positively surprised by the quality of gemini 2.5 pro's code, as well as its agentic capability inside cursor, i noticed a few things.

the quality of the model's code and its behavior improve when using cursor rules, defined by a few files with best practices, duh.

it also improves significantly if you don't simply type your notes into cursor, but add an intermediate step of metaprompting. i do this with my bot [prompter-3000](https://poe.com/prompter-3000x335) on poe, which rewrites the prompts in a structured way, which, as we know, benefits language model understanding.

i am building a frontend app, which i don't really know how to do, and as i talk about my questions or concerns in my original prompts, the sonnet 3.7 reasoning model behind my prompter-3000 gives me very reasonable suggestions, which in turn help gemini 2.5 pro in its coding. thus i use two language models instead of one.

gemini still often makes mistakes, and the "restore checkpoint" mechanism doesn't always work as planned, especially after several lengthy messages. because of that i tend to commit very often, more so than i would when coding by hand, which has brought me to another idea.

when ai labs train base models, they scrape millions of repositories and throw them into the training script, using self-supervised learning to remove tokens, predict tokens, calculate the error, and repeat a gazillion times.

that means they only work with static documents, the final version of a repository, and they do not see how that repository evolved. they do not see what the repository's requirements were, how those were implemented, how they changed, and how that resulted in changes to the repository.

however, this information is available, and the source of it is the commit history.

especially for developers that commit often (that's why i thought about this now), the commit history very clearly represents the evolution of a repository, with commit messages, pull requests and comments giving a lot of information about how it evolved. of course, pull request comments are also part of the training set, but what if we reverse engineered the whole commit history? what if we took the code committed at each hash and generated a prompt that would have produced that code, based on the repository state at the previous commit?

in doing so, we can consider agentic capabilities: not just one-off writing of a function or changing a variable here and there, but taking into account that language models do have the capability of editing a dozen files, and models like gemini 2.5 can do so quite reliably, surprisingly so.

and thus, we would reverse engineer the commit history and interleave it with prompts. we would be able to model what a prompt sequence would look like if written by a human during a normal work process, from the very first requirement and the corresponding prompt to the very last change, generating code in the repository commit by commit (``chain of commits``, anyone?).

thanks to ``git diff``, we could see which files and lines were changed and how, and generate text prompts that would have generated that change.

and then, having intertwined the evolving repository state with a sequence of prompts, we could feed this not to a base model, but train a reasoning model with reinforcement learning and process rewards.
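to make the diff-to-prompt step concrete, here is a rough python sketch of the idea: walk a repo's history oldest-first with plain ``git`` commands, pull each commit's message and diff, and ask a model to reconstruct the prompt that would have produced that change. ``ask_llm`` is a hypothetical helper (any chat-completion client would do), and the branch name and prompt wording are just assumptions, not a worked-out pipeline.

```python
# rough sketch, not a real pipeline: walk a repo's history oldest-first and,
# for each commit, ask a model to reverse engineer the prompt that would have
# produced that change given the previous repo state.
import subprocess
from pathlib import Path

def git(repo: Path, *args: str) -> str:
    """run a git command in `repo` and return its stdout."""
    return subprocess.run(["git", "-C", str(repo), *args],
                          capture_output=True, text=True, check=True).stdout

def ask_llm(prompt: str) -> str:
    raise NotImplementedError  # hypothetical: plug in any chat-completion client

def reverse_engineer_prompts(repo: Path, branch: str = "main") -> list[dict]:
    shas = git(repo, "rev-list", "--reverse", branch).split()
    pairs = []
    # walk consecutive (base, target) commits; the root commit only serves as a base state
    for base, target in zip(shas, shas[1:]):
        message = git(repo, "show", "-s", "--format=%B", target)
        diff = git(repo, "diff", base, target)
        synthetic_prompt = ask_llm(
            "you are reconstructing the instruction a developer would have given a coding agent.\n"
            f"commit message:\n{message}\n"
            f"change applied on top of the previous repository state:\n{diff}\n"
            "write the natural-language prompt that would have produced this change."
        )
        pairs.append({"base_commit": base, "target_commit": target,
                      "prompt": synthetic_prompt})
    return pairs  # the interleaved (repo state, prompt) sequence described above
```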
in this case, it is possible to reward each step by:

- taking the repository state at a commit,
- running the prompt we just generated, and
- comparing the result (including unit tests + an llm as a judge) with what the actual repository state was at the next commit.

thus, we could generate process reward verifiers by reverse engineering commit history. a rough sketch of this reward loop is at the end of the post.

isn't that fun? would that actually work? curious to hear from you.

---

// 21 apr 2025

#ai_coding #training #reinforcement_learning
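p.s. here is that reward-loop sketch, continuing the snippet above: check out the base commit, let the agent act on the generated prompt, then score the result against the real next-commit state with tests plus an llm judge. ``run_agent`` and ``llm_judge`` are hypothetical stand-ins, and the pytest call and the 50/50 weighting are assumptions just to show the shape of the verifier.

```python
# minimal sketch of the per-commit reward: check out the base commit, let the
# agent act on the generated prompt, then compare against the real next commit.
# `run_agent` and `llm_judge` are hypothetical stand-ins; the pytest call and
# the 50/50 weighting are assumptions, not a worked-out recipe.
import subprocess
import tempfile
from pathlib import Path

def run_agent(workdir: Path, prompt: str) -> None:
    raise NotImplementedError  # the policy being trained edits files in `workdir`

def llm_judge(produced_diff: str, reference_diff: str) -> float:
    raise NotImplementedError  # hypothetical judge returning a score in [0, 1]

def process_reward(repo_url: str, base: str, target: str, prompt: str) -> float:
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp) / "repo"
        subprocess.run(["git", "clone", repo_url, str(work)], check=True)
        subprocess.run(["git", "-C", str(work), "checkout", base], check=True)

        run_agent(work, prompt)  # the step we want to reward

        # hard signal: do the project's tests pass after the agent's edits?
        tests_ok = subprocess.run(["pytest", "-q"], cwd=work).returncode == 0

        # soft signal: how close is the produced change to the real commit?
        subprocess.run(["git", "-C", str(work), "add", "-A"], check=True)
        produced = subprocess.run(["git", "-C", str(work), "diff", "--cached"],
                                  capture_output=True, text=True).stdout
        reference = subprocess.run(["git", "-C", str(work), "diff", base, target],
                                   capture_output=True, text=True).stdout
        similarity = llm_judge(produced, reference)

        return 0.5 * float(tests_ok) + 0.5 * similarity  # per-step process reward
```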