Posted by dhorthy 4 days ago
So I set out to document what I've learned about building production-grade AI systems: https://github.com/humanlayer/12-factor-agents. It's a set of principles for building LLM-powered software that's reliable enough to put in the hands of production customers.
In the spirit of Heroku's 12 Factor Apps (https://12factor.net/), these principles focus on the engineering practices that make LLM applications more reliable, scalable, and maintainable. Even as models get exponentially more powerful, these core techniques will remain valuable.
I've seen many SaaS builders try to pivot towards AI by building greenfield projects on agent frameworks, only to find that they couldn't get past the 70-80% reliability bar with out-of-the-box tools. The ones that did succeed tended to take small, modular concepts from agent building and incorporate them into their existing product, rather than starting from scratch.
The full guide goes into detail on each principle with examples and patterns to follow. I've seen these practices work well in production systems handling real user traffic.
I'm sharing this as a starting point—the field is moving quickly so these principles will evolve. I welcome your feedback and contributions to help figure out what "production grade" means for AI systems!
I'd love to work on stuff like this full-time. If anyone is interested in a chat, my email is on my profile (US/EU).
I've been saying that forever, and I think that anyone who actually implements AI in an enterprise context has come to the same conclusion. Using the Anthropic vernacular, AI "workflows" are the solution 90% of the time and AI "agents" maybe 10%. But everyone wants the shiny new object on their CV and the LLM vendors want to bias the market in that direction because running LLMs in a loop drives token consumption through the roof.
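To make the workflow/agent distinction concrete, here's a minimal sketch. `call_llm` is a hypothetical stand-in for any real LLM client, with canned replies only so the sketch runs end to end; the function names and prompts are illustrative, not from any particular framework:

```python
# Hypothetical stand-in for a real LLM client. Canned replies exist
# only so this sketch is self-contained and runnable.
def call_llm(prompt: str) -> str:
    if prompt.startswith("Summarize:"):
        return "a short summary"
    if prompt.startswith("Translate"):
        return "un court résumé"
    return "DONE: finished"

# "Workflow": control flow lives in your code. Each LLM call is one
# bounded, testable step in a fixed pipeline.
def summarize_and_translate(doc: str) -> str:
    summary = call_llm(f"Summarize:\n{doc}")
    return call_llm(f"Translate to French:\n{summary}")

# "Agent": control flow lives in the model. The LLM runs in a loop,
# choosing tools until it decides it's done -- and every iteration
# re-sends the growing history, which is where token costs explode.
def agent_loop(task: str, tools: dict, max_steps: int = 10) -> str:
    history = [task]
    for _ in range(max_steps):
        action = call_llm("\n".join(history))
        if action.startswith("DONE:"):
            return action.removeprefix("DONE:").strip()
        name, _, arg = action.partition(" ")
        history.append(tools[name](arg))
    raise RuntimeError("agent hit step limit")
```

The workflow version is boring on purpose: you can unit-test each step and reason about worst-case cost, which is exactly why it covers the 90% case.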
You should watch the CDO-squared scene from the Big Short again.
> look if you don't trust the LLM to make the thing right in the first place, how are you gonna trust PROBABLY THE SAME LLM to fix it?
yes, I know multiple passes improve performance, but they don't guarantee anything. for a lot of tools you might wanna call, 90% or even 99% accuracy isn't enough
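The back-of-envelope math is why per-call accuracy matters so much: if each step succeeds independently, failure compounds over a run. A quick sketch (the 90%/99% figures just echo the numbers above; independence is an assumption):

```python
# If each tool call succeeds independently with probability p,
# a chain of n calls succeeds with probability p ** n.
def chain_success(p: float, n: int) -> float:
    return p ** n

# Even 99%-accurate steps degrade quickly over a long run:
print(f"{chain_success(0.90, 10):.3f}")  # ~0.349
print(f"{chain_success(0.99, 10):.3f}")  # ~0.904
print(f"{chain_success(0.99, 50):.3f}")  # ~0.605
```

So a 50-step agent built from 99%-reliable calls finishes cleanly only about 60% of the time, which is roughly the 70-80% ceiling the post describes.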
Of course a new breakthrough could happen any day and get through that ceiling. Or "common sense" may be something that's out of reach for a machine without life experience. Until that shakes out, I'd be reluctant to make any big bets on any AI-for-everything solutions.
Maybe Doug Lenat's idea of a common sense knowledge base wasn't such a bad one.
I could see one of the twelve factors being around observability beyond just "what's the context" - that may be a good thing to incorporate for version 1.1
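One possible shape for that factor: wrap every LLM call and emit a structured event with latency, payload sizes, and outcome, rather than only dumping the context. A minimal sketch, assuming nothing about any particular client (`llm_fn` is a hypothetical callable, and `log.append` stands in for shipping to a real tracing backend):

```python
import time

# Wrap an LLM call and record a structured observability event:
# payload sizes, outcome, and latency -- not just "what's the context".
def traced_call(llm_fn, prompt: str, log: list) -> str:
    start = time.monotonic()
    event = {"prompt_chars": len(prompt)}
    try:
        reply = llm_fn(prompt)
        event["status"] = "ok"
        event["reply_chars"] = len(reply)
        return reply
    except Exception as exc:
        event["status"] = "error"
        event["error"] = repr(exc)
        raise
    finally:
        event["latency_s"] = round(time.monotonic() - start, 4)
        log.append(event)  # ship to your tracing backend instead

# Usage with a toy "model":
log: list = []
reply = traced_call(lambda p: p.upper(), "hello", log)
```

Token counts and tool-call traces would slot into the same event dict once a real client is plugged in.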
but since you asked, to name a few:

- ts: mastra, gensx, vercel ai, many others!
- python: crew, langgraph, many others!