By llmsengineering

Most of AI engineering is glue

Every time I ship something with an LLM in it, I end up writing less AI code than I expected and more plumbing than I planned for.

[Image: thick impasto oil painting with ochre and indigo brushstrokes]

I opened a pull request this morning for a feature that took me three days. The PR has 640 lines changed. Forty of them call the LLM. The other 600 are the reason the thing actually works in production.

That ratio is not unusual. That ratio is basically every AI feature I have ever shipped, and it’s been gnawing at me how different it is from the way “AI engineering” gets talked about online. So I want to write down the ratio, why it refuses to budge, and what it means if you’re hiring or applying for these jobs.

The feature I shipped this morning

User pastes in a long piece of text. The app summarises it, pulls out the entities, assigns a category, stores the whole thing. Simple product, one screen, one input.

The LLM code is forty lines. A prompt, a JSON schema, a retry loop, a parse, some small validation. That part took me about ninety minutes, most of which was wording the prompt.
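That layer can be sketched in a few dozen lines. This is a hedged sketch, not the actual PR code: `call_model` is a placeholder for whatever provider SDK you use, and the schema keys mirror the feature described above (summary, entities, category).

```python
import json
import time

SCHEMA_KEYS = {"summary", "entities", "category"}  # fields we expect back

def call_model(prompt: str) -> str:
    """Placeholder for the real LLM API call -- swap in your provider's SDK."""
    raise NotImplementedError

def extract(text: str, call=call_model, max_attempts=3):
    """Prompt the model for structured JSON, then parse, validate, and retry."""
    prompt = (
        "Summarise the text, list the named entities, and assign one category.\n"
        "Reply with JSON containing exactly these keys: "
        "summary, entities, category.\n\n" + text
    )
    last_error = None
    for attempt in range(max_attempts):
        try:
            raw = call(prompt)
            data = json.loads(raw)          # parse
            missing = SCHEMA_KEYS - data.keys()
            if missing:                     # small validation
                raise ValueError(f"missing keys: {missing}")
            return data
        except (json.JSONDecodeError, ValueError) as e:
            last_error = e
            time.sleep(2 ** attempt)        # backoff before the retry
    raise RuntimeError(f"extraction failed after {max_attempts} attempts: {last_error}")
```

Swapping in a real client is a one-line change to `call`; everything else is the retry-parse-validate loop.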

The other 600 lines are, and I'm going to list them so the shape is visible:

- authentication
- rate limiting per user
- the job queue, and the worker that pulls from it
- the storage schema
- idempotency keys so retries don't double-charge the user
- per-request cost tracking
- error states for when the text is too long or the model times out
- the admin tool I built for myself to inspect runs that went sideways
- the tiny dashboard that tells me whether today's cost is tracking where it should
- tests for the non-LLM parts
- observability hooks for when something goes wrong at 3am
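To make one item concrete: the idempotency keys. A minimal sketch, with an in-memory dict standing in for the real jobs table (names like `JobStore` and `run_once` are mine, not from the PR):

```python
import hashlib

def idempotency_key(user_id: str, text: str) -> str:
    """Deterministic key: the same user submitting the same text maps to one job."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"{user_id}:{digest[:16]}"

class JobStore:
    """In-memory stand-in for the real jobs table."""

    def __init__(self):
        self._jobs = {}

    def run_once(self, user_id: str, text: str, work):
        key = idempotency_key(user_id, text)
        if key in self._jobs:          # retry, double-click, or replayed webhook:
            return self._jobs[key]     # return the cached result, don't re-charge
        result = self._jobs[key] = work(text)
        return result
```

In production the dict becomes a unique-keyed database row, but the shape is the same: derive the key before doing the expensive work, and look it up first.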

Nothing in that list is “AI.” All of it is how software has always worked. The AI is a function call in the middle.

Why the ratio never inverts

When I started shipping LLM features four years ago, I assumed the AI layer would grow over time and the plumbing would shrink. The opposite has happened, consistently.

Every time a better model comes out, I delete AI code. The GPT-3-era version of the feature I shipped this morning had custom chunking, a reranker, a manual eval rig, three different prompts that got composed together. The modern version is five lines of structured output, one prompt, one schema, done. The model got reliable enough that the complexity moved out of my codebase.

But it didn’t disappear. It moved into the surrounding system. When the model becomes reliable, the bottleneck becomes everything else. How you queue the work. How you recover from failures. How you let users see what’s happening. How you don’t burn through your API budget in a week because one user found a way to paste in a 400k token input.
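The budget point is the kind of thing that is ten lines of code and saves you a four-figure invoice. A sketch, with made-up limits (the character ceiling and dollar budget here are illustrative, not recommendations):

```python
MAX_INPUT_CHARS = 100_000   # hypothetical ceiling; roughly 25k tokens at ~4 chars/token
DAILY_BUDGET_USD = 50.0     # hypothetical daily spend cap

class BudgetGuard:
    """Reject oversized inputs and stop calling the API once today's spend is hit."""

    def __init__(self, daily_budget=DAILY_BUDGET_USD):
        self.daily_budget = daily_budget
        self.spent_today = 0.0

    def check_input(self, text: str):
        """Run before enqueueing: a 400k-token paste never reaches the model."""
        if len(text) > MAX_INPUT_CHARS:
            raise ValueError(f"input too long: {len(text)} chars (max {MAX_INPUT_CHARS})")

    def record_cost(self, usd: float):
        """Run after each model call; trips once the daily budget is exhausted."""
        self.spent_today += usd
        if self.spent_today > self.daily_budget:
            raise RuntimeError("daily budget exceeded; pausing LLM calls")
```

The real version persists the counter and resets it daily, but even this crude gate turns "surprise invoice" into "one user sees an error message."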

None of that is AI. It’s just engineering. And because the AI layer keeps getting simpler, the glue layer keeps getting proportionally bigger. Better models make the ratio worse, not better, if you measure “worse” as “percentage of code that is actually about the model.”

The moat is in the 600 lines

This is the part I want people to sit with, because it’s the part that changes how you should spend your time.

The forty lines that call the LLM? Anyone can write those. They’re a copy of the API docs. The moat of an AI product isn’t in the prompt. It’s in the idempotency, the cost controls, the error UX, the queue, the admin tooling. The stuff that makes the product not fall over when real users hit it at real scale.

A competitor who studies your prompt can replicate the prompt in an afternoon. A competitor who has to replicate your whole production system — the retry logic, the cost guardrails, the admin UI you built so your support team can answer a refund email — is going to take months, and might not even bother. The glue is where the durable advantage lives.

If you understand this, you spend less time tinkering with prompts and more time building the boring infrastructure. Which is exactly backwards from the Twitter version of the job.

Who you should hire (and who you are)

Every month I see roles posted for “AI engineer” where, if you read the actual job description, 85% of the work is queues, databases, retries, dashboards, and frontends. And then the company is surprised when the research-heavy candidate they hire is disappointed by the job, and the company is disappointed by the candidate.

If you’re hiring to ship AI products, you are hiring a product engineer who happens to be comfortable calling a model. Not a researcher. Not a fine-tuning specialist. Not someone whose last project was comparing chunking strategies on a 200-page PDF.

I’d rather hire an engineer who has taken one boring SaaS from zero to a thousand users than an engineer who has fine-tuned five models but never deployed anything. The former will pick up the prompt layer in a week. The latter will spend six months figuring out why their jobs aren’t getting processed, and by the time they solve it, the feature will have missed its launch.

The flip side is also worth saying out loud: if you’re applying for AI engineer roles and you came from a non-AI background, you are massively underestimating how much of the job is stuff you already know how to do. Apply.

What this isn’t

I want to say this before anyone reaches for the comments.

I’m not saying the LLM layer is trivial. Getting a prompt to be reliable across real user inputs is genuinely hard. Evals are genuinely hard. Latency-cost-quality trade-offs are genuinely hard. If you’re building a product where the model is the product — a coding agent, say, or a legal-search engine — that ratio tips much further toward AI work than what I’m describing.

But for the majority of “AI features” that are shipping inside normal software products right now, the ratio is 10% model and 90% everything else. The sooner you embrace that, the sooner you ship, and the sooner you ship, the sooner you find out whether the model actually solved the problem you thought it did.

Learn the 10%. Learn the 90%. People who only know the 10% can make a prototype. People who know both can ship a product.