factories need bases

# factories need bases

![[Pasted image 20260416171619.png]]

## agent harnesses

for some time now, we have known that AI agents need harnesses. it used to be called agent scaffolding. it's the agent loop, its memory, its tools, the execution environment. this is where the LLM can use all those things to perform activities that it wouldn't be able to perform on its own.

however, when we rise up a level or two, what do we see? how do we make software that makes software consistently, robustly? how do we industrialize it? that's an open question not many people have thought about, apart from those familiar with the concept of a software factory, sometimes called a dark software factory in reference to real production facilities in Japan and China that run with no lights at all, fully automated with robotics. what do we need to do to industrialize software factories?

## the factory base

I call it a factory base. if you've ever played a real-time strategy game such as Command and Conquer, Warhammer, Warcraft, Dune 2, whatever, you know that you've got to build a base that will spawn units, collect resources and help you advance in the game.

similarly, there are a variety of ways to build a software product. it starts with the programming language selection, the libraries, and the constraints and opportunities that come with them. for brownfield projects, though, that's not something you can choose. it's a given that your software factory has to account for. so, in some sense, there's either a ready asset that the factory has to operate upon, the existing repository, or, if it's a greenfield project, some sort of specification, file, request or task that the human user gives to the factory to build something new.

## execution log

a software factory should be able to handle both of those options. it needs to have an execution log. it needs to understand what has been done and what has not been done.
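
to make "what has been done, what has not been done" concrete, here is a minimal in-memory sketch of an execution log. everything in it is an assumption for illustration: the class names, the task ids and the status values ("planned", "started", "done") are hypothetical and not taken from our implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class LogEntry:
    task_id: str
    status: str  # hypothetical statuses: "planned", "started", "done", "failed"
    at: str      # ISO timestamp, so the log keeps its time dimension


@dataclass
class ExecutionLog:
    """Append-only log: entries are never edited, only added."""
    entries: list[LogEntry] = field(default_factory=list)

    def record(self, task_id: str, status: str) -> None:
        now = datetime.now(timezone.utc).isoformat()
        self.entries.append(LogEntry(task_id, status, now))

    def latest(self) -> dict[str, str]:
        # Replay the log in order; the last status seen for a task wins.
        state: dict[str, str] = {}
        for entry in self.entries:
            state[entry.task_id] = entry.status
        return state

    def done(self) -> list[str]:
        return [t for t, s in self.latest().items() if s == "done"]

    def not_done(self) -> list[str]:
        return [t for t, s in self.latest().items() if s != "done"]


log = ExecutionLog()
for task in ("scaffold-repo", "write-parser", "add-tests"):
    log.record(task, "planned")
log.record("scaffold-repo", "started")
log.record("scaffold-repo", "done")
log.record("write-parser", "started")

print(log.done())      # ['scaffold-repo']
print(log.not_done())  # ['write-parser', 'add-tests']
```

the point of keeping it append-only is that the current state is always derived from history, so the factory can resume after a restart by replaying the log.
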
it needs to have some sort of production plan. how does it proceed with building a software product? this plan needs to be adaptive. as we know from software development, iterative methodologies are very often necessary to build anything meaningful and, moreover, to make it production ready. you don't do that in one shot. just don't. and if we know that from human software development practice, it is general enough to be implemented in a software factory. so, execution log.

## workspaces

we also need workspaces. the factory has to build its stuff somewhere, and sometimes it makes sense to execute different tasks or groups of tasks in parallel, just the same way as multiple humans work on a project. luckily, git has a close equivalent, called worktrees. so, they are git worktrees. that is something every user needs to consider and implement.

## artifact storage

what else does a factory need? like a normal factory, not necessarily a software factory, it needs to store stuff somewhere. call it inventory. I call it artifact storage. it's where the current working documents of the factory are stored. not the code in this case, because inventory is obviously not the same thing as what the factory produces. in a physical factory, inventory is transformed into products, often in a very direct way. in software it's less direct. your laptop is a piece of inventory, but it doesn't directly translate into software the way a piece of text is translated into code, which is what happens to a prompt, by the way. your laptop is not a piece of code or a piece of text. neither is your cloud provider, nor the fingers you type with, nor the voice you speak into your transcription device with. obviously, none of these is inventory in the sense of a physical production facility, but they are a kind of inventory.
in the case of a software factory, as we also know from agentic software development, this is something like the best practices, the start sequence and the history of previous sessions. that's what the factory builds and updates over time. I also call them living documents for simplicity, although artifacts are broader than those living documents.

## production line

what else do we need? the production line. something needs to build that software, and we kind of know what it is. it's a coding agent in a harness. not a pure LLM, but a coding agent that at least can generate code, execute code and see if it works, and preferably is capable of other things like working on the command line, retrieving some history or running scripts in an environment. web search, or research more generally, is helpful as a tool or skill for such an agent. but frankly, a coding agent doesn't actually need that much, because coding is a very well encapsulated activity. it is fairly independent of what systems you're on, and you don't need to integrate with that many tools. there's a fairly standard stack: even if not every developer uses GitHub, let's say, people understand it, and it's very easy to replace GitHub with GitLab or Bitbucket or whatever you prefer to use to host your code.

so, a production system is needed. a production system to me is, as I said, an AI coding agent with a harness. tools like GitHub Projects or Linear, which became popular recently, are there as a part of the harness that combines human work with agent work, which is also important and helpful, and is a kind of interface. is it strictly necessary for a factory? probably not. is it helpful to create a bridge between humans and the software agents? yeah, I would say so.

## quality assurance

there are two points that I don't think many people talk about, or don't think about enough. one is quality assurance.
yes, QA. QA in the broadest possible sense you can imagine, because again we are talking about it from the perspective of a factory. it's not "just QA" as we think about it when we build products. it is more. because we build products with agents, our QA has to expand as well.

it is testing, for sure. it is verification, which is not necessarily the same as testing because it could take the form of a question-and-answer game, for example. it's evaluations, which is a broader thing than just verification and usually involves an LLM-as-judge scenario, some extraction evaluators, it could be a lot of different things.

and a thing that recently appeared on the AI scene: scenarios. scenario testing, scenario analysis. we do use it, though not as much as I would wish. basically, you write down a scenario the same way you write a skill for Claude or whatever, but from a user's perspective, and you tell the AI agent to go around and use your app in a variety of ways. that is a scenario. I would suggest keeping them fairly granular. by nature they are closer to end-to-end tests, I would say, but by design I would rather keep them fairly small, for a simple reason: if you have a large markdown file with 20 tasks, you can be almost certain that the agent will forget some of them, mix up some of them, or skip some of them because it considers them too difficult to execute in one prompt and one run. a lot of things can happen here.

but the thing is that this QA mechanism for a software factory needs to be much more expansive. I'll talk about a couple of things about how we design those software factories, and I will talk much more about QA and evaluations in the future. this is a part that people do talk about, but by far not enough, because quality assurance is right now the main blocker, the main problem to solve. if you can solve this problem at scale, you will unlock how software factories work.
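
the "keep scenarios small" advice can be sketched in a few lines. this is only an illustration under stated assumptions: `Scenario`, `split_scenario` and `to_prompt` are hypothetical names, and chunking by a fixed step count is the simplest stand-in for what you would really do, which is splitting by user goal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    steps: tuple[str, ...]


def split_scenario(name: str, steps: list[str], max_steps: int = 3) -> list[Scenario]:
    """Cut one long task list into several small scenarios, preserving order."""
    if max_steps < 1:
        raise ValueError("max_steps must be at least 1")
    chunks = [steps[i:i + max_steps] for i in range(0, len(steps), max_steps)]
    return [
        Scenario(f"{name}-{n}", tuple(chunk))
        for n, chunk in enumerate(chunks, start=1)
    ]


def to_prompt(scenario: Scenario) -> str:
    """Render one small scenario as a markdown prompt, written from the user's side."""
    lines = [
        f"# scenario: {scenario.name}",
        "act as a user of the app and do the following:",
    ]
    lines += [f"{i}. {step}" for i, step in enumerate(scenario.steps, start=1)]
    return "\n".join(lines)


# The 20-task markdown file from the text, as a plain list of placeholder tasks.
big = [f"task {i}" for i in range(1, 21)]
small = split_scenario("checkout", big, max_steps=3)

print(len(small))            # 7
print(to_prompt(small[-1]))  # the last scenario holds only task 19 and task 20
```

each small scenario then becomes one prompt and one run, so a skipped or mixed-up step is easy to spot and cheap to re-run.
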
## communication

another bit that, again, almost nobody's talking about is communication. communication between AI agents, between groups of AI agents, teams or swarms, call them whatever you like.

### agent protocols

the thing is, agents need to communicate somehow. they need to talk, they need to leave messages unattended and get those messages picked up. and there were attempts already. the very first was called [Agent Protocol](https://github.com/langchain-ai/agent-protocol) and it was built in 2023. I don't even remember by whom. I believe it was E2B, and they gave it up to LangChain at some point, I don't really remember. beyond that, there was a [protocol from IBM](https://www.ibm.com/think/topics/agent-communication-protocol), there was [agent to agent from Google](https://a2a-protocol.org/latest/), and probably half a dozen others that I don't remember anymore.

the thing is, nobody has solved communication between agents, because it is a complex topic with different levels to it. you have agents running in the same environment. then the easiest thing is often to write a markdown file and just leave it there. sounds stupid, but it actually works quite well. what if your agents are not on the same machine, though, not in the same environment?

### APIs, pub/sub, queues, systemd

then it becomes more difficult. you need to have some sort of API communication. REST kind of isn't great. I believe some of those protocols did use REST APIs to communicate, and it could probably work, but REST is weird. it's not really a communication mechanism at the end of the day. something like pub/sub or a queue is a better communication mechanism, and that's the mechanism I think we should use for our agents and for our factories. currently, in our implementation, factories still talk through markdown (.md) files.
it is basic as fuck. it's not something we have solved, it's something we as a community need to solve for sure, and I'm open to ideas. we do use a relational database, SQLite, to simulate a queue over which the factory talks to different agents and orchestrators over continuous periods of time, which is very important. continuous communication. communication with a time dimension is something that agents and factories need to master to be able to do long-running tasks. why people just press run or send on an agent and then expect it to work for 20, 50, 300 hours, I don't know. in some sense you could do that if you model it as a *systemd* kind of process, a daemon that restarts and re-runs itself. but what's the point?

#### SQLite comms & markdown

coming back to the topic of communication, I was trying to say that we are kind of solving communication within an environment, within a base. this SQLite database is at the same time the execution log that I mentioned in the beginning. it is a tool for communication, and it is the part that creates continuity in how the factory works: iterative work, tasks change, requirements change, products evolve and so on. the time component is important.

that's how it currently works for us. the factories communicate with each other through markdown files that they can leave in specific parts of software repositories. those messages are also tracked and version controlled. they can be committed to repositories. so, maybe it isn't that bad. it's definitely not real time, nothing near real time, but if you commit often enough and if your tasks are relatively independent, which is often the case when you work on two different repositories, it's kind of all right.
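
for readers who want to picture the SQLite part, here is a minimal sketch of a table that simulates a queue and doubles as a log, using Python's standard `sqlite3` module. the schema, the column names and the functions `send`, `claim_next` and `history` are assumptions made up for this example, not our actual implementation. it also assumes a single worker per recipient; with several concurrent workers, the select-then-update in `claim_next` would need a stricter transaction (for example `BEGIN IMMEDIATE`).

```python
import sqlite3

# Hypothetical schema: one row per message, never deleted, so the table is also a log.
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    recipient  TEXT NOT NULL,
    body       TEXT NOT NULL,
    status     TEXT NOT NULL DEFAULT 'pending',
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    claimed_at TEXT
);
"""


def connect(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    conn.commit()
    return conn


def send(conn: sqlite3.Connection, recipient: str, body: str) -> int:
    """Leave a message for an agent; returns the message id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO messages (recipient, body) VALUES (?, ?)",
            (recipient, body),
        )
    return cur.lastrowid


def claim_next(conn: sqlite3.Connection, recipient: str):
    """Pick up the oldest pending message for a recipient, or None if there is none."""
    with conn:
        row = conn.execute(
            "SELECT id, body FROM messages "
            "WHERE recipient = ? AND status = 'pending' ORDER BY id LIMIT 1",
            (recipient,),
        ).fetchone()
        if row is None:
            return None
        conn.execute(
            "UPDATE messages SET status = 'claimed', claimed_at = CURRENT_TIMESTAMP "
            "WHERE id = ?",
            (row[0],),
        )
    return row


def history(conn: sqlite3.Connection):
    """The same table read as a log: every message with its current status."""
    return conn.execute(
        "SELECT id, recipient, status FROM messages ORDER BY id"
    ).fetchall()


conn = connect()
send(conn, "qa-agent", "run scenario checkout-1")
send(conn, "qa-agent", "run scenario checkout-2")
print(claim_next(conn, "qa-agent"))  # (1, 'run scenario checkout-1')
print(history(conn))  # [(1, 'qa-agent', 'claimed'), (2, 'qa-agent', 'pending')]
```

because rows are only ever updated, never removed, the queue keeps the time dimension: you can always ask what was sent, when, and whether anyone picked it up.
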
I also don't necessarily see a problem with creating some sort of API that would write into the queue, into the execution log of another factory, if that factory accepts the request. so it is definitely possible. I just don't know if it's the best mechanism.

## closing

long story short, just as agents need harnesses, I believe that software factories need bases, which consist of an execution log, a workspace, an artifact storage, a quality assurance system and a communication system at the least. and I'm very happy to discuss and figure out together how this is supposed to work, because the fun part is that nobody knows, and I find it exciting. see you soon.

---

// 16 apr 2026, berlin