yes, the following is written by AI
yes, who cares, this is interesting research on a novel topic, read it, like i did
---
# Software Factory architecture for AI-assisted development
**The three-database architecture (SQLite + Qdrant + graph DB) grounded in Basili's Experience Factory is intellectually sound but operationally ambitious — no major AI coding tool has adopted this pattern, and for good reason.** The most successful tools (Claude Code, Cursor, Aider) favor radical simplicity: markdown files, vector stores, and Git itself as state. Yet the research literature strongly supports the core insight that separating "doing work" from "learning from work" creates compounding advantages. The critical question is not whether to build this architecture, but how to sequence it so complexity pays for itself at each stage. This report synthesizes findings across 11 academic papers, 30+ tools and databases, and the architectures of every major AI coding platform to provide actionable design guidance.
## Basili's vision meets modern infrastructure
Victor Basili's Experience Factory (1994) proposed something radical for its era: a **separate organizational unit** dedicated entirely to capturing, packaging, and reusing development experience. The project organization builds software; the Experience Factory analyzes project data and populates an "Experience Base" for future reuse. This operated through the Quality Improvement Paradigm (QIP) — a six-step cycle of characterize, set goals, choose process, execute, analyze, and package.
The concept proved itself at NASA's Software Engineering Laboratory, where **software reliability improved 35% over 15 years** across 250 engineers working on systems up to 1.5M lines of code. Fraunhofer IESE operationalized it using Case-Based Reasoning technology across 2,000+ customer projects. Daimler-Benz, Australian Telecom, and others followed. But a persistent failure pattern emerged: the "Catch-22 problem" where experience bases needed content to be useful, but developers wouldn't populate empty bases. Without continuous curation, bases died. The motivation gap between daily work and knowledge contribution proved nearly insurmountable with manual processes.
The Software Factory architecture directly addresses this failure mode through automation. Where Basili required humans to manually capture and package experience, process mining extracts patterns from existing development artifacts (commits, PRs, CI/CD logs) without developer effort. Where the original Experience Base was a static web repository, the tri-database architecture provides **semantic retrieval (vector DB), relationship traversal (graph DB), and structured state (SQLite)** — enabling AI agents to both contribute to and draw from the knowledge base as a natural byproduct of doing work. The 2001 follow-up paper by Basili, Lindvall, and Costa acknowledged that different knowledge types require different structuring approaches — essentially predicting the need for multiple specialized storage layers.
Two additional theoretical foundations strengthen the architecture. The **Transactive Memory Systems** literature (Chen et al., 2013) provides the framework for modeling "who knows what" in teams — specialization, credibility, and coordination dimensions that map directly to graph database schemas. **TeReKG** (2024) demonstrated that temporal knowledge graphs can model developer expertise and collaboration patterns using knowledge graph embeddings, enabling team configuration recommendations through link prediction. A 2023 systematic literature review of knowledge graphs in software engineering confirmed KGs are already proven for code recommendation, vulnerability detection, and knowledge integration across **100+ published studies**.
## Process mining closes the feedback loop — in theory
The process mining component of the Software Factory has the strongest empirical support. CodeSight (Ramos Soto et al., 2025, arXiv:2510.25935) demonstrated an end-to-end pipeline: GitHub event extraction → XES event log transformation → process discovery via Directly-Follows Graphs → LSTM prediction of PR resolution times. Across 835 cases and **271 process variants**, it achieved high precision for deadline compliance prediction. This is the closest existing blueprint for the Software Factory's mining layer.
Nogueira and Zenha-Rela's work (2021, 2024) established that Apache Kafka can integrate heterogeneous CI/CD event streams into normalized process mining logs in real time. Their deployment at a major European e-commerce company revealed that each business unit followed substantially different deployment workflows despite sharing pipeline specifications — a finding that underscores why static process models fail and adaptive, mined models succeed. The practical pipeline from git data to process models follows a clear sequence:
- **PyDriller** extracts raw repository data (commits, authors, timestamps, diffs, complexity metrics) with 50% less code than alternatives
- Events are mapped to a case notion (typically PR lifecycle: first commit → review → merge)
- Activities are classified by type (feature, bugfix, refactor) through commit message parsing or NLP
- **PM4Py** (v2.7+, 1M+ downloads, AGPL-3.0) applies Inductive Mining for process discovery, conformance checking, and performance analysis
- Discovered models feed dashboards and predictive layers
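The classification step in the pipeline above can be sketched with a simple keyword heuristic. This is a minimal illustration, not a production classifier; real pipelines would parse Conventional Commits or use an NLP model, and every pattern below is illustrative:

```python
import re

# Hypothetical keyword heuristic mapping commit messages to activity types.
# Order matters: the first matching pattern wins.
ACTIVITY_PATTERNS = [
    ("bugfix",   re.compile(r"\b(fix|bug|patch|hotfix)\b", re.I)),
    ("refactor", re.compile(r"\b(refactor|cleanup|restructure|rename)\b", re.I)),
    ("feature",  re.compile(r"\b(add|feat|implement|introduce)\b", re.I)),
]

def classify_commit(message: str) -> str:
    """Return the first matching activity type, defaulting to 'other'."""
    for activity, pattern in ACTIVITY_PATTERNS:
        if pattern.search(message):
            return activity
    return "other"

print(classify_commit("fix: null pointer in parser"))  # bugfix
print(classify_commit("refactor auth module"))         # refactor
print(classify_commit("feat: add dark mode"))          # feature
```

The resulting activity labels become the event names in the XES log that PM4Py mines.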
However, **the full closed loop — mining → process variants → prompt templates → AI agent execution → new logs → mining again — has never been implemented for software development.** The closest analogs are Meta's KernelEvolve (treating kernel optimization as an agentic search loop) and Celonis's process mining → automation cycles for business processes. The critical missing piece is an **objective evaluation function** for software process quality. Manufacturing has defect rates; kernel optimization has benchmark scores. "Good software development" resists simple quantification. The architecture must define measurable proxies (cycle time, defect escape rate, rework frequency) before the loop can close.
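The candidate proxies named above can be computed directly from a mined event log. A minimal sketch, assuming hypothetical PR records with `opened`, `merged`, and `review_rounds` fields:

```python
from datetime import datetime
from statistics import mean

# Hypothetical PR records as a mined event log might expose them.
prs = [
    {"opened": datetime(2025, 1, 1), "merged": datetime(2025, 1, 3), "review_rounds": 1},
    {"opened": datetime(2025, 1, 2), "merged": datetime(2025, 1, 9), "review_rounds": 4},
    {"opened": datetime(2025, 1, 5), "merged": datetime(2025, 1, 6), "review_rounds": 2},
]

def cycle_time_days(pr) -> float:
    """Open-to-merge latency: one candidate objective for the loop."""
    return (pr["merged"] - pr["opened"]).total_seconds() / 86400

def rework_rate(prs, threshold=2) -> float:
    """Share of PRs needing more than `threshold` review rounds."""
    return sum(pr["review_rounds"] > threshold for pr in prs) / len(prs)

print(round(mean(cycle_time_days(pr) for pr in prs), 2))  # average cycle time in days
print(round(rework_rate(prs), 2))                         # fraction of high-rework PRs
```

Each proxy is noisy in isolation; the loop would likely need a composite score and per-repository baselines before any of these can drive automated decisions.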
## The tool landscape has consolidated around simplicity
A comprehensive survey of AI coding tool architectures reveals a striking pattern: **every successful tool has chosen simplicity over architectural sophistication**, even when backed by billions in funding.
**Claude Code** ($2.5B annualized revenue, 135,000 daily GitHub commits) uses a four-layer memory system that is the most sophisticated in the industry — but stores everything as **plain markdown files**. CLAUDE.md holds user instructions, MEMORY.md captures auto-generated insights, session memory provides cross-session recall, and "Auto Dream" periodically consolidates memories through a four-phase cycle (orient, gather, consolidate, prune). No databases whatsoever.
**Cursor** ($29.3B valuation) indexes codebases into a vector store and uses .cursorrules files for project conventions. Its 2.0 architecture supports up to 8 parallel agents via Git worktree isolation. **Windsurf** (now Cognition-owned) differentiates through "Flow Awareness" — tracking all user actions to infer intent — and autonomous memory generation that persists codebase knowledge across sessions. **Devin** runs as a cloud-based compound AI system with specialized planner, coder, critic, and browser agents, but no persistent learning mechanism beyond session state.
**OpenHands** (MIT license, 64K+ GitHub stars, ICLR 2025) uses **event-sourced state management** — the most architecturally interesting approach. State is a chronological event stream, enabling deterministic recovery and persistent context. Context condensation intelligently summarizes older interactions when conversations exceed token limits. **Aider** takes the most minimal approach: Git itself is the state store, with AST-based repository maps providing structural awareness.
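The event-sourcing idea can be illustrated in a few lines: state is never mutated directly, only derived by replaying an append-only stream, so the same events always reconstruct the same state. This is a simplified sketch of the pattern, not OpenHands' actual classes:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Event:
    """An immutable entry in the agent's chronological stream."""
    kind: str      # e.g. "message", "file_edit", "command"
    payload: str

@dataclass
class AgentState:
    history: list = field(default_factory=list)
    files_touched: set = field(default_factory=set)

def replay(events) -> AgentState:
    """Deterministically rebuild state from the stream: same events, same state."""
    state = AgentState()
    for e in events:
        state.history.append(e)
        if e.kind == "file_edit":
            state.files_touched.add(e.payload)
    return state

stream = [Event("message", "fix the bug"), Event("file_edit", "src/parser.py")]
print(replay(stream).files_touched)  # {'src/parser.py'}
```

Because the stream is the source of truth, crash recovery is just replay, and the same stream doubles as a process-mining event log for free.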
The community fork **AiderDesk** adds LanceDB for persistent vector memory — the only example of a coding tool adding a dedicated database for memory. This suggests the market may be ready for database-backed memory, but the mainstream tools haven't committed to it yet.
## Database options after KuzuDB's archival
KuzuDB was **confirmed archived on October 10, 2025**, with the team posting only that they are "working on something new." The final release was v0.11.3. Three forks have emerged, each with distinct value propositions:
**LadybugDB** (777 stars, v0.15.2, led by ex-Facebook/Google engineer Arun Sharma) is the most actively developed community fork and the recommended general-purpose successor. It carries forward full KuzuDB capabilities with MIT licensing, broad language bindings (Python, Node.js, Rust, Java, Go, Swift, C++), and active development. The **Vela-Engineering/kuzu** fork specifically addresses the original's single-writer limitation with concurrent multi-writer support — critical for multi-agent architectures. Benchmarks show **374x faster** 2nd-degree path queries versus Neo4j. The Bighorn fork (Kineviz) has less traction.
For vector storage, the landscape stratifies clearly by use case. **LanceDB** is the strongest choice for local-first coding tools: it's the only embedded vector DB with a **native TypeScript SDK** (critical for VS Code extensions), runs truly in-process like SQLite, and is proven in production by Continue (IDE extension). **Qdrant** offers superior filtering (ACORN algorithm) and is moving toward edge deployment with Qdrant Edge (private beta), plus serves as Mem0's recommended vector backend. **ChromaDB** provides the simplest developer experience but lags in filtered search performance.
A potentially game-changing development: **SurrealDB 3.0 reached GA in February 2026**, combining graph, vector, relational, time-series, and key-value capabilities in a single embeddable Rust binary. It supports graph traversal with arrow syntax, built-in vector indexing, ACID transactions, and runs embedded, in-browser via WASM, or distributed. With $44M in funding and customers including Verizon, Walmart, and Nvidia, it represents a credible **single-database alternative** to the three-DB architecture. The tradeoff: BSL 1.1 licensing (converts to Apache 2.0 after 4 years) and relative youth as a v3.0 product.
For SQLite extensions, **sqlite-vec** (MIT, Mozilla-sponsored) adds vector search with zero dependencies, and **DuckDB's DuckPGQ extension** adds SQL/PGQ graph queries. These allow a "start simple" path where SQLite handles state, sqlite-vec handles vectors, and DuckPGQ handles basic graph queries — all upgrading to dedicated databases only when proven necessary.
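A minimal sketch of this "start simple" path, using only the Python standard library. The Python-side cosine scan stands in for what sqlite-vec's virtual table would do once loaded; at small scale the difference is negligible:

```python
import sqlite3, json, math

# Phase-1 sketch: SQLite for structured state, embeddings stored as JSON.
# In production, sqlite-vec's vec0 virtual table would replace the
# Python-side similarity scan below; this keeps the example dependency-free.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memory (id INTEGER PRIMARY KEY, text TEXT, embedding TEXT)")

def remember(text, embedding):
    db.execute("INSERT INTO memory (text, embedding) VALUES (?, ?)",
               (text, json.dumps(embedding)))

def recall(query_embedding, k=1):
    """Brute-force cosine-similarity scan over all rows — fine at small scale."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)
    rows = db.execute("SELECT text, embedding FROM memory").fetchall()
    scored = [(cosine(query_embedding, json.loads(e)), t) for t, e in rows]
    return [t for _, t in sorted(scored, reverse=True)[:k]]

remember("auth module uses JWT", [0.9, 0.1])
remember("CI runs on every PR", [0.1, 0.9])
print(recall([0.85, 0.2]))  # ['auth module uses JWT']
```

The upgrade path is mechanical: swap the scan for a `vec0` table query when row counts grow, without touching the state schema.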
## Agent memory frameworks bridge the gap
**Mem0** (48K GitHub stars, $24M Series A, ECAI 2025 paper) is the most directly applicable memory framework. Its architecture maps precisely to the tri-database vision: four-scope memory (user, session, agent, run), dual-phase processing (extraction → update with conflict resolution), and support for **19 vector backends plus Neo4j and Kuzu graph backends**. On the LOCOMO benchmark, it achieved a **26% accuracy improvement** over OpenAI's memory with **91% lower latency** and **90% token savings**. The OpenMemory MCP server enables local-first memory across Claude, ChatGPT, and other tools.
**Cognee** ($7.5M seed, Apache 2.0) takes a fundamentally different approach: building knowledge graphs from data through an Extract-Cognify-Load pipeline. Every inferred node links back to source documents for provenance tracking. It's better suited for building structured knowledge from codebases and documentation, while Mem0 excels at conversation-derived memory.
**Zep** differentiates through temporal knowledge graphs (via the open-source Graphiti engine, 20K stars). It tracks entity and relationship changes with **temporal validity windows** — when facts change, old ones are invalidated rather than simply appended. This is directly relevant for tracking evolving codebase architecture. The tradeoff: Zep Cloud is the only fully managed option; the Community Edition was discontinued.
**Letta** (formerly MemGPT) provides a full agent runtime with an OS-inspired memory hierarchy: core memory (always in-context, self-edited by the agent), recall memory (complete searchable history), and archival memory (indexed in vector/graph stores). Its unique approach — agents managing their own memory through tool calls — represents a different architectural philosophy than Mem0's externalized memory management.
## Concrete design recommendations and build sequence
### The graph database decision
**LadybugDB** is the recommended primary choice: MIT license, active development, broad language bindings, and the most community momentum. Use the **Vela-Engineering/kuzu** fork if concurrent multi-agent writes are a day-one requirement. Keep **SurrealDB 3.0** as a monitored alternative — if its v3 proves stable over the next 6 months, consolidating to a single multi-model database would dramatically reduce operational complexity.
### Three databases vs. simpler alternatives
The three-database architecture is defensible but should be built incrementally. The evidence suggests a staged approach:
- **Phase 1 (weeks 1-4)**: SQLite for state + sqlite-vec for basic vector search. This matches what Claude Code and Aider demonstrate: simplicity wins early. Use CLAUDE.md-style markdown memory as a baseline. Implement PyDriller extraction and basic PM4Py process discovery.
- **Phase 2 (months 2-3)**: Add a dedicated vector database (LanceDB for local-first, Qdrant for shared/team scenarios) when sqlite-vec's limitations become apparent (typically at >100K vectors or when filtered search quality matters). Integrate Mem0 as the memory management layer.
- **Phase 3 (months 4-6)**: Add graph database (LadybugDB) only after demonstrating that relationship queries provide retrieval quality improvements that vector similarity alone cannot. Use TeReKG's temporal knowledge graph patterns for developer expertise modeling. Implement conformance checking via PM4Py.
- **Phase 4 (months 6-12)**: Close the feedback loop. Start narrow — optimize PR review process only. Define measurable proxies (review turnaround, rework rate). Build the variant → prompt template mapping. This is uncharted territory with no published implementations.
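A hypothetical sketch of the Phase 4 variant-to-prompt mapping, in which a mined activity sequence selects the template whose trigger subsequence it contains. Every name and template here is invented for illustration, since no published implementation exists:

```python
# Hypothetical mapping from mined process variants to prompt templates.
TEMPLATES = {
    ("review", "rework", "review"): "This PR class historically needs multiple "
        "review rounds. Do a thorough self-review pass before requesting review.",
    ("commit", "merge"): "This change class usually merges without review "
        "friction. Keep the diff small and self-contained.",
}

def contains_subsequence(variant, trigger):
    """True if all trigger steps appear in the variant, in order."""
    it = iter(variant)
    return all(step in it for step in trigger)

def prompt_for_variant(variant):
    for trigger, template in TEMPLATES.items():
        if contains_subsequence(variant, trigger):
            return template
    return "No historical pattern matched; proceed with default workflow."

variant = ("commit", "review", "rework", "review", "merge")
print(prompt_for_variant(variant))
```

The hard part is not this lookup but validating that the injected guidance actually improves the proxy metrics, which is exactly the open evaluation-function problem.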
### Critical risks and gaps
**Operational complexity** is the dominant risk. Every successful AI coding tool has chosen simpler architectures. Claude Code generates $2.5B in annualized revenue while storing all memory in markdown files. The three-database approach must deliver measurably better outcomes than Claude Code's approach to justify its complexity — and this is unproven.
**The process mining feedback loop lacks an evaluation function.** Without a clear metric for "good development process," the loop cannot learn effectively. Cycle time, defect escape rate, and rework frequency are candidates but each has significant noise.
**PM4Py's AGPL-3.0 license** restricts commercial embedded use. A commercial license is available but adds cost. The architecture depends heavily on this library.
**Graph memory's value is context-dependent.** Mem0's own research shows graph memory (Mem0g) only outperforms base Mem0 for relational and temporal reasoning tasks — **not universally**. Social/collaboration contexts benefit; simple preference tracking doesn't. This means the graph layer's ROI depends entirely on how relationship-heavy the use case is.
**CRDTs for local-first sync add overhead.** CR-SQLite inserts are **2.5x slower** than regular SQLite. For a developer tool where responsiveness is paramount, this tradeoff must be carefully evaluated. Consider unidirectional sync (local → shared is append-only, shared → local is periodic pull) to avoid CRDT complexity entirely.
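The unidirectional alternative can be sketched with an append-only outbox; all class names here are hypothetical:

```python
# Unidirectional sync sketch: local writes append to an outbox pushed
# upstream; the shared store is the only writer of its own rows, so no
# CRDT merge logic is needed in either direction.
class LocalStore:
    def __init__(self):
        self.rows = []
        self.outbox = []  # append-only: local -> shared

    def write(self, row):
        self.rows.append(row)
        self.outbox.append(row)

class SharedStore:
    def __init__(self):
        self.rows = []

    def push(self, local: LocalStore):
        """Local -> shared: drain the append-only outbox."""
        self.rows.extend(local.outbox)
        local.outbox.clear()

    def pull(self, local: LocalStore):
        """Shared -> local: periodic refresh of remote rows."""
        for row in self.rows:
            if row not in local.rows:
                local.rows.append(row)

a, b, shared = LocalStore(), LocalStore(), SharedStore()
a.write("pattern: flaky test in CI")
shared.push(a)
shared.pull(b)
print(b.rows)  # ['pattern: flaky test in CI']
```

The tradeoff is eventual consistency on pull intervals rather than CRDT merge guarantees, which is usually acceptable for experience data that is advisory rather than authoritative.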
### How this compares to the state of the art
The Software Factory architecture's key differentiator is the **automated feedback loop** from process mining through knowledge representation to agent action. No existing tool has this. Cursor has codebase indexing but no learning. Claude Code has sophisticated memory but no process mining. Windsurf has autonomous memory generation but no graph-based knowledge representation. Devin has compound AI orchestration but no persistent cross-session learning beyond session state.
The closest production system is **Mem0 + Graphiti** (combining Mem0's vector memory with Zep's temporal knowledge graph via MCP), layered on top of an event-sourced agent runtime like **OpenHands**. This combination would provide: persistent memory (Mem0), temporal knowledge tracking (Graphiti), deterministic state recovery (OpenHands event sourcing), and process mining readiness (event streams → PM4Py). Building on these existing, proven components rather than from scratch dramatically reduces risk.
## The unexploited opportunity
The most significant finding of this research is that **nobody has modernized Basili's Experience Factory with modern AI.** The 1994 concept of structured experience capture and reuse maps naturally onto retrieval-augmented generation. The 2001 multiple-Experience-Bases refinement predicts the multi-database architecture. The NASA SEL demonstrated 35% reliability improvements through manual experience reuse — what could automated, AI-driven experience reuse achieve?
Mining open-source repositories via GHTorrent and GH Archive provides a massive bootstrapping dataset: commit sequences, PR lifecycles, and issue resolution patterns across millions of projects. The `lasaris/Git-logs-for-Process-Mining` dataset offers 23 pre-processed projects as an immediate starting point. The ethical path forward uses anonymized process patterns (not code content) for training development workflow models.
The architecture should be built incrementally, starting with the simplest viable memory system and adding complexity only when measured retrieval quality demands it. The process mining loop should begin as a passive observer (mining existing development data, generating dashboards) before attempting active intervention (generating prompts, routing tasks). And the entire system should be designed so that even if only Phase 1 ships, developers get value — because every successful tool in this space succeeded by being useful on day one, not by promising value after all components are complete.