12-factor Agents: Patterns of reliable LLM applications

Posted by dhorthy 4/15/2025

I've been building AI agents for a while. After trying every framework out there and talking to many founders building with AI, I've noticed something interesting: most "AI Agents" that make it to production aren't actually that agentic. The best ones are mostly just well-engineered software with LLMs sprinkled in at key points.

So I set out to document what I've learned about building production-grade AI systems: https://github.com/humanlayer/12-factor-agents. It's a set of principles for building LLM-powered software that's reliable enough to put in the hands of production customers.

In the spirit of Heroku's 12 Factor Apps (https://12factor.net/), these principles focus on the engineering practices that make LLM applications more reliable, scalable, and maintainable. Even as models get exponentially more powerful, these core techniques will remain valuable.

I've seen many SaaS builders try to pivot towards AI by building greenfield projects on agent frameworks, only to find that they couldn't get past the 70-80% reliability bar with out-of-the-box tools. The ones that did succeed tended to take small, modular concepts from agent building and incorporate them into their existing product, rather than starting from scratch.

The full guide goes into detail on each principle with examples and patterns to follow. I've seen these practices work well in production systems handling real user traffic.

I'm sharing this as a starting point—the field is moving quickly so these principles will evolve. I welcome your feedback and contributions to help figure out what "production grade" means for AI systems!

475 points | 78 comments

mertleee 4/15/2025|

What are your favorite open source "frameworks" for agents?

jlaneve 4/16/2025||

I've been most impressed with Pydantic AI [1], so much so that we ended up building an SDK around it specifically for LLM workflows on Airflow [2].

[1] https://ai.pydantic.dev

[2] https://github.com/astronomer/airflow-ai-sdk

dhorthy 4/16/2025|||

i have seen a ton of good ones, and they all have ups and downs. I think rather than focusing on frameworks though, I'm trying to dig into what goes into them, and what's the tradeoff if you try to build most of it yourself instead

but since you asked, to name a few

- ts: mastra, gensx, vercel ai, many others!
- python: crew, langgraph, many others!

shmoogy 4/16/2025||

I'm currently using agno after seeing Google and OpenAI both chose pretty much the same syntax for their agent SDKs. So far so good

deadbabe 4/16/2025||

With all this AI-agent bullshit out there these days, the most useful AI-agent I still use in daily life is the humble floor vacuum/mopping robot.

dhorthy 4/16/2025||

They kept telling me automation would do my chores so we could spend more time on writing and art. I write less and still have to do my own laundry

mikedelfino 4/17/2025|||

The irony is that much of the writing and art has indeed been automated.

flkenosad 4/16/2025|||

HN comments are writing :)

notfed 4/16/2025||

Meh, don't need AI for that. I'll be impressed when it can do my laundry.

musicale 4/17/2025|

> reliable LLM applications

add that to the list of contradictory phrases (jumbo shrimp, etc.)

pancsta 4/17/2025||

Can you successfully transfer data over unreliable connections? An LLM is just a misbehaving DB; once you pin it down the right way and lower your expectations, "reliable LLM applications" are definitely possible. But if we go yolo with regexp-like-intelligence, then...

musicale 4/18/2025||

> Can you successfully transfer data over unreliable connections?

Validating LLM output is probably not as easy as computing a checksum or CRC.

dhorthy 4/18/2025||

*probably :)
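To make the checksum analogy in this sub-thread concrete, here is a rough Python sketch, not anything taken from the linked guide. It puts a CRC check next to one possible way of "pinning down" model output: parse it, check its structure, and re-ask on failure, much like a retransmit. The names `ALLOWED_INTENTS`, `parse_intent`, `call_with_retries`, and `make_flaky_model` are all hypothetical, and the "model" is a stand-in function rather than a real LLM client.

```python
import json
import zlib
from typing import Callable, Optional

ALLOWED_INTENTS = {"refund", "cancel", "other"}  # hypothetical label set


def crc_ok(payload: bytes, expected_crc: int) -> bool:
    """Transport-style check: a single comparison settles it."""
    return zlib.crc32(payload) == expected_crc


def parse_intent(raw: str) -> Optional[dict]:
    """Structural check on model output: valid JSON object with a known label.

    Passing this only shows the output is well-formed, not that it is correct.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    intent = data.get("intent")
    if not isinstance(intent, str) or intent not in ALLOWED_INTENTS:
        return None
    return data


def call_with_retries(
    model: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> Optional[dict]:
    """Re-ask the model until its output passes the structural check."""
    for _ in range(max_attempts):
        parsed = parse_intent(model(prompt))
        if parsed is not None:
            return parsed
    return None  # caller decides what to do: fall back, ask a human, etc.


def make_flaky_model() -> Callable[[str], str]:
    """Stand-in for an LLM call: first reply is chatty prose, second is valid JSON."""
    replies = iter(["Sure! The intent is refund.", '{"intent": "refund"}'])
    return lambda prompt: next(replies)


if __name__ == "__main__":
    print(call_with_retries(make_flaky_model(), "Classify: I want my money back"))
```

The gap musicale points at shows up in `parse_intent`: unlike the CRC comparison, it can only reject malformed output. A well-formed but wrong label still passes, so this kind of check narrows the failure modes rather than eliminating them.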

dhorthy 4/17/2025||

it can be done! I believe!