# building ai agents (for a friend)

a friend of mine got a claude code enterprise license and was asking how he can set up an agent to help him work in finance. below is a slightly cleaned up transcription of the five audio messages i sent to him, plus the contents:

- start with plugins for claude
- why most of it is garbage
- problems:
    - process specifics
    - integrations
    - prototype is not the product
- summary

## where to start

anthropic released a [bunch of financial agents](https://www.anthropic.com/news/finance-agents) just today. you can take a look and see which ones suit you. people right now, if they're working with claude code or something like that, often install these agents or skills. usually, these are repositories on github where people have prepared markdown files. [here is anthropic's repo](https://github.com/anthropics/financial-services).

![[Pasted image 20260506155544.png]]

```sh
# Add the marketplace
claude plugin marketplace add anthropics/claude-for-financial-services

# Core skills + connectors (install first)
claude plugin install financial-analysis@claude-for-financial-services
```

i'd recommend you begin with that anthropic link, install a few of those things, and see how they work for you. you can either install these prompts from anthropic, or just search online -- it's called "plugins" or "plugin marketplaces for finance" -- claude code, something like that. go in with the expectation that you'll probably need to install 3–5 different things, and 4.5 of them will be total garbage, but they might give you some starting point.

## why most of it is garbage

the problem is that, usually, these markdown files are complete garbage, because people generate them automatically or semi-automatically. in probably 99% of cases, they don't review them, they don't even test them on real documents -- they just go, "oh, it would be cool to make an ai cfo. hey claude, make me an ai cfo and all of the use cases and skills for them, bro." that kind of style. as you can imagine, a prompt like that doesn't create much -- well, ai always creates _something_, but most of it will be unusable.

## what usually happens next

you look at these prompts, at these skills, and you're like, "damn, this doesn't work for me." and you rewrite them. you install the repository, go into the folder, open claude code in it, and tell claude code: "look, these are skills for you, but i don't like them, let's rework them." and then you describe your process -- how you work, what files you need, what integrations you need. and you work on it, refine it. because out-of-the-box, nothing ever works. that's all bullshit, it's junk, it's slag. if someone tells you these ai things work out-of-the-box -- no, they don't. they work for demos, but real company workflows are always very individual.

## problem one: process specificity

your biggest problems -- well, there are two main problems with use cases like yours. the first is the specificity of the process, and the second is integrations. here's an example. i recently built myself a few skill packages. roughly speaking, one for evaluating developer performance, because honestly they've lost it. and i created skills there -- the first one, for example, collects data. it's called "data collection," this skill. and this skill is very specific. it points to my repository. it downloads the github activity log of developers. i compile timesheets manually into a folder.
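
to give a sense of what "very specific" means here, a data-collection step like that boils down to something like the sketch below. this is a rough illustration, not my actual skill -- the repo name, token variable, and folder layout are hypothetical; the commits endpoint is github's standard rest api.

```python
import csv
import json
import os
from pathlib import Path

import requests

# hypothetical values -- point these at your own repo, token, and folders
GITHUB_REPO = "my-org/my-product"
GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]
TIMESHEET_DIR = Path("timesheets")            # folder i fill by hand
OUTPUT_FILE = Path("collected/activity.json")


def fetch_commits(repo: str, token: str) -> list[dict]:
    """download the commit log via github's rest api, page by page."""
    commits, page = [], 1
    while True:
        resp = requests.get(
            f"https://api.github.com/repos/{repo}/commits",
            headers={"Authorization": f"Bearer {token}"},
            params={"per_page": 100, "page": page},
            timeout=30,
        )
        resp.raise_for_status()
        batch = resp.json()
        if not batch:
            break
        commits.extend(
            {
                "sha": c["sha"],
                "author": (c.get("author") or {}).get("login"),
                "date": c["commit"]["author"]["date"],
                "message": c["commit"]["message"],
            }
            for c in batch
        )
        page += 1
    return commits


def read_timesheets(folder: Path) -> list[dict]:
    """read the manually collected timesheet csvs into plain dicts."""
    rows = []
    for path in sorted(folder.glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as f:
            rows.extend({**row, "source_file": path.name} for row in csv.DictReader(f))
    return rows


if __name__ == "__main__":
    # everything lands in one machine-readable json file for the later skills
    data = {
        "commits": fetch_commits(GITHUB_REPO, GITHUB_TOKEN),
        "timesheets": read_timesheets(TIMESHEET_DIR),
    }
    OUTPUT_FILE.parent.mkdir(parents=True, exist_ok=True)
    OUTPUT_FILE.write_text(json.dumps(data, indent=2))
```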

the skill looks at the timesheets and converts everything into a single machine-readable format -- json. later steps write scripts against that format and analyze it; in the data collection step itself there are no scripts yet -- it just gathers info.

then the second skill analyzes commit history, lines of code, stuff like that. technical metrics -- how many times a developer pushed code to the repository, how many lines that code was, whether there's garbage in those lines, certain rules for filtering. and these rules are very specific. one of the rules came from something i had to filter out for one of the developers: in december, he had 36,000 lines of code. i looked at it and thought, "what the hell, in december they had just barely started working. where did all this code come from?" and then i remembered -- it was just a prototype that i had built, a visual front-end prototype of that product, almost 36,000 lines of code. i built it, i published it, and then this developer simply transferred it from the main organization where i had published it to the client's organization. so he didn't write a single line, and that's one of the rules that's also in these skills, because i need to filter that out.

then the third skill file compares commits with timesheets. this file asks the agent to compare them semantically -- meaning the agent has to read the text from the timesheets, how they're described, how the commits are described, which files were created, which functionality was built, and check whether they correspond to each other. the next file is semantic code comparison -- the next skill says to now look at the code and actually check whether what's described in the description exists in the code or not.

in short, the process has to be very detailed and very specific. actually, it's not even a problem exactly -- it's relatively easy -- you can do it because you know your process.

## problem two: integrations

the second issue, integrations, is a bit more complex.

![[Pasted image 20260506155639.png]]

why is it more complex? well, because there's real data involved. real integrations, real apis. and they don't always exist. i don't know if there even exists, for example, an automated api integration and a so-called mcp server for your system. you've probably heard about mcp too -- model context protocol. it's essentially these apis. well, apis are used for exchanging data between two software products, programs. mcp is like a wrapper around them -- a wrapper that explains to the agent what to do with them. for claude code, many of these mcp servers exist. like salesforce, for instance -- if you had that, it would be fairly easy to find the mcp server, and it would probably work decently. for an on-prem system, this could be problematic. i don't even know if it has an api. like, what api, where is it, how to reach it, how to authenticate, how to do this in an automated way through an agent.

this is where software development starts. in most cases, because something either doesn't exist, or doesn't work properly, or doesn't authenticate with the required scope of permissions, or can't pull sufficiently large volumes of data, and so on and so forth. this is where the pain begins. because if the integration exists, it's fairly simple.
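
"fairly simple" meaning something roughly like the sketch below: a minimal mcp server wrapping one api call so the agent knows what to do with it. the `FastMCP` import follows the python mcp sdk's quickstart; the erp base url, token, and endpoint are entirely made up.

```python
import os

import requests
from mcp.server.fastmcp import FastMCP

# hypothetical internal erp api -- the url, token, and endpoint are placeholders
BASE_URL = os.environ.get("ERP_BASE_URL", "https://erp.example.internal/api/v1")
API_TOKEN = os.environ.get("ERP_API_TOKEN", "")

mcp = FastMCP("erp-invoices")


@mcp.tool()
def list_overdue_invoices(days_overdue: int = 0) -> list[dict]:
    """return unpaid invoices that are at least `days_overdue` days past due."""
    resp = requests.get(
        f"{BASE_URL}/invoices",
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        params={"status": "unpaid", "min_days_overdue": days_overdue},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()


if __name__ == "__main__":
    # runs over stdio; an mcp client like claude code can be configured to launch it
    mcp.run()
```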

if it doesn't exist -- formally speaking, many of these integrations don't actually have a lot of code, but they require a ton of testing and cover a ton of edge cases, so you can easily spend 2–4 weeks full-time building one integration. why would you spend so much freaking time on a little module? well, let's talk about the difference between a prototype and a product, and how far apart, even dissimilar, they are.

## problem three: from prototype to product, the business central example

i recently built a prototype agent for a client -- for business central business intelligence (call it bcbi). about a month ago we were chatting, and he said, "man, why isn't there an ai that can just tell me who my three biggest clients were last month?" or, "which of my invoices are delayed, unpaid?" i was like, "damn, that's actually cool." and i built a prototype agent. fundamentally it's similar to what i described to you. there are also prompts and so on. but it's one or two levels more complex. and this is where the big difference lies -- between building some little prototype that half-works and requires a lot of babysitting from you, and building a product. building a product that you can hand off to a client -- in this case, to yourself -- so that it actually saves you time.

### difference one: a dedicated agent

in this business central business intelligence thing, there's a separately coded agent -- not claude code, but built through the api, anthropic's api -- separately coded and specifically prompted. that's the first big difference. it lives inside the product, meaning it's not a separate claude code instance. what is the difference, or the benefit? there are plenty:

1. functionally, you want your agents to be as narrow in scope and specific as possible. they work faster because they need to explore less. they're cheaper -- because they need to explore less! and the quality of the answers can be positively shaped through the bounds you put on the agent.
2. the next reason is interface. claude code seems to be everywhere, but developers somehow forget that not everybody is a developer. most people will never work with a terminal. with a custom agent, you can build any ui. for example, you can connect it via email.
3. experience & loops, or how the user interacts with the agent and what they get back from it. in claude code, you're completely at the mercy of anthropic, their harness, and their logic. when using the api, it's much easier to change that. if you want certain phrases or commands to not just generate a chat response but to trigger a sequence of five skills intertwined with iterative critique, reviews, and web research, you can just build that.

### difference two: deterministic tool functions

why does it work this way? because this agent has about 50 different specific tools. and these specific tools are mostly statistical functions. because if you say, "who was my biggest client in the last three months," for example -- what do you practically need to do?

![[Pasted image 20260506155736.png]]

you need to send a request for all your sales orders from the last three months. that's request number one. second, you need to aggregate them -- meaning combine them, group by clients. and third, you need to sort them. strictly speaking, those are three different data processing functions. and all these functions need to be defined, need to be described somehow. why do they need to be described? because if you don't describe them, your agent does whatever it wants.
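
as a rough sketch of what "defined in code" means for that example, here are the three steps as small deterministic functions the agent can call but not improvise. the sales-orders endpoint and field names are hypothetical, not business central's actual schema.

```python
from collections import defaultdict

import requests

# hypothetical endpoint and field names -- not the real business central schema
BASE_URL = "https://erp.example.internal/api/v1"


def fetch_sales_orders(date_from: str, date_to: str) -> list[dict]:
    """step 1: request all sales orders in the given date range."""
    resp = requests.get(
        f"{BASE_URL}/salesOrders",
        params={"from": date_from, "to": date_to},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()


def aggregate_by_customer(orders: list[dict]) -> dict[str, float]:
    """step 2: group orders by customer and sum the totals."""
    totals: dict[str, float] = defaultdict(float)
    for order in orders:
        totals[order["customerName"]] += order["totalAmount"]
    return dict(totals)


def top_customers(totals: dict[str, float], n: int = 3) -> list[tuple[str, float]]:
    """step 3: sort customers by revenue and return the top n."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

the agent only picks which of these to call and with what arguments; the grouping and the arithmetic themselves are fixed.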

without that, for the query "who are my three biggest clients," the agent gives you one answer today. tomorrow, maybe a different one. the day after, maybe a third one. then the day after that, it goes back to the first one. basically, agents do this anyway -- they generate these functions or something similar in the background on their own. but they lack robustness. this is what i was talking about -- going from a prototype to an actual product. you need to be able to rely on your product, on what it does. if you can't -- especially in finance -- then you're screwed.

another important aspect here is that certain aggregation functions, groupings, pivots, or whatever else may well be very specific to you and to your companies. like, if a company calculates some adjusted eps -- what "adjusted" means, you understand better than i do -- in every company it'll be something different. meaning very custom formulas, and the agent has to follow these formulas. which means these formulas need to be written not in the prompt, but in code. and that's why in this business central business intelligence agent, i have about 50 different functions written in code. and the agent doesn't reinvent the wheel each time -- it just takes them and executes them deterministically.

### difference three: a knowledge base

this agent has a knowledge base. meaning i scraped and collected information about this business central api -- what the objects look like, how to reach them, what fields they have, what they can do, what they can't do -- and shoved it all into a small local vector database owned by this agent. on one hand, this is nothing new. on the other hand, it's a very small, very focused database that contains relevant, fresh information about how the api works.

on top of that, i actually built two versions of this product. the first one i built on my own test tenant for business central, and the second one i built for the client. and the client's api endpoints and documentation were slightly different, which is also very typical. especially for on-prem solutions -- the endpoints you send requests to, the format of objects, custom fields, custom tables. all of this will differ from the official documentation, and all of it also needs to be in the agent's knowledge. specifically, this information lives in the knowledge database, and partly in the prompts, in the skills that describe the agent's behavior.

## summary

to start, i'd recommend you begin with that anthropic link, install a few of those things, and see how they work for you. and then start gradually reworking them. all that stuff about custom functions, knowledge base, scraping, customer-specific endpoints -- all of that is important and necessary, but probably not in the first step.

have fun!

---
// 6 may 2026, berlin