
Miguel Otero Pedrido's Agent Project Template That Works
A practical breakdown of Miguel Otero Pedrido's agent cookiecutter structure and why clean scaffolding makes AI projects scalable.
Miguel Otero Pedrido recently shared something that caught my attention: "99% of my agent projects follow this structure. Not because I'm lazy ... because it works." He followed that with a very relatable promise: if you have ever struggled with structuring agent systems "for scale (and sanity)", a cookiecutter-style scaffold can be the missing piece.
That idea resonates because most agent projects do not fail at the model layer first. They fail in the ordinary software places: tangled dependencies, no clear boundaries, ad hoc configs, brittle deployments, and zero tests until something breaks in production. Miguel's post is a reminder that the fastest way to ship reliable agentic systems is to standardize the boring parts.
"No more messy agent repos. Just a clean, extensible foundation that grows with you!"
In this article, I want to expand on what Miguel is pointing to: why a repeatable project structure is a competitive advantage for AI teams, what each folder in a good scaffold is really for, and how to adopt a template without turning it into ceremony.
Why agent projects get messy so fast
Agent systems amplify typical software complexity:
- They mix concerns: orchestration, tools, prompts, retrieval, memory, evaluation, and application logic.
- They pull in fast-moving dependencies: LLM SDKs, vector DB clients, observability tooling.
- They are workflow-heavy: async calls, retries, rate limits, caching, background jobs.
- They need feedback loops: evaluation harnesses, regression tests, prompt iteration.
When all of that lives in a single flat repository, the first demo might work, but the second feature becomes painful. The result is what Miguel is pushing back against: "messy agent repos" that do not scale with the team or the product.
A cookiecutter scaffold is not about perfection. It is about getting to a baseline that makes good decisions the default.
The cookiecutter mindset: one command, fewer decisions
Miguel describes a template where "with one command, you get a scaffolded project with placeholders" for the major components. This is the real win: reducing the number of early-stage architectural decisions you must revisit later.
Instead of debating folder names every time, you lock in:
- A consistent place for experiments vs production code
- A consistent approach to layering and boundaries
- A consistent path to ship, test, and deploy
Consistency is underrated. It speeds onboarding, makes code review easier, and allows you to copy proven patterns between projects.
What each part of Miguel's structure is doing
Miguel listed the key pieces. Let me translate them into "why this exists" and "what to put there" so you can apply the concept even if you do not use the exact same template.
CI/CD pipelines: automate the boring safety checks
Miguel calls out "CI/CD pipelines - automated builds & tests." For agent systems, this is especially important because behavior can drift even when the code looks unchanged (dependency updates, prompt edits, provider model changes).
At a minimum, your pipeline should:
- Install dependencies in a clean environment
- Run linters and type checks (optional but helpful)
- Run unit and integration tests
- Build the Docker image (if you ship one)
- Optionally run a small evaluation suite on canned conversations
If you are building customer-facing agents, consider adding a lightweight "golden set" of scenarios that must not regress. This is not full research-grade evaluation. It is a pragmatic guardrail.
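A golden set can be as small as a list of canned inputs plus substrings the reply must contain. Here is a minimal sketch; the `run_agent` entry point and the scenario format are assumptions for illustration, not from Miguel's template:

```python
# Hypothetical golden set: canned inputs paired with a substring the
# agent's reply must contain. In a real repo this would live in a
# versioned file (e.g. under data/) rather than inline.
GOLDEN_SET = [
    {"input": "What are your support hours?", "must_contain": "9am"},
    {"input": "Cancel my subscription", "must_contain": "confirm"},
]


def run_agent(user_input: str) -> str:
    """Stand-in for the real agent entry point (assumption)."""
    canned = {
        "What are your support hours?": "We are available 9am-5pm ET.",
        "Cancel my subscription": "Please confirm you want to cancel.",
    }
    return canned[user_input]


def check_golden_set() -> list[str]:
    """Return the inputs that regressed; an empty list means all passed."""
    failures = []
    for case in GOLDEN_SET:
        reply = run_agent(case["input"])
        if case["must_contain"] not in reply:
            failures.append(case["input"])
    return failures
```

Run it as the last CI step and fail the build if the list is non-empty.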
Data: treat resources like product assets
Miguel includes a Data folder for "static files & resources." In agent projects, "data" often means:
- System prompts and tool schemas (if you keep them as files)
- Few-shot examples
- Small domain dictionaries, taxonomies, or routing rules
- Seed documents for demos
The key practice: version these resources intentionally, and avoid burying them inside notebooks or random scripts. When prompts are product assets, they deserve structure.
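One lightweight way to enforce that practice is a single loader that reads prompt assets from a known directory, so nothing else in the codebase hardcodes prompt text. The file layout (`data/prompts/<name>.txt`) is an assumption, not part of Miguel's template:

```python
from pathlib import Path


def load_prompt(prompts_dir: Path, name: str) -> str:
    """Load a versioned prompt asset by name, e.g. 'system_v2'."""
    path = prompts_dir / f"{name}.txt"
    return path.read_text(encoding="utf-8").strip()


if __name__ == "__main__":
    import tempfile

    # Demo with a throwaway directory standing in for data/prompts.
    with tempfile.TemporaryDirectory() as tmp:
        prompts = Path(tmp)
        (prompts / "system_v2.txt").write_text("You are a support agent.\n")
        print(load_prompt(prompts, "system_v2"))  # You are a support agent.
```

Because prompts are plain files, every edit shows up in code review and `git blame`, just like any other product asset.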
Notebooks: a sandbox, not the application
Miguel highlights notebooks as a "sandbox for prompts & workflows." This separation matters. Notebooks are great for:
- Trying alternative prompt strategies
- Exploring retrieval settings
- Prototyping a new tool call
- Debugging an edge case quickly
But notebooks are a terrible place to keep production logic. The rule I like is: notebooks can call into the library, but the library should not depend on notebooks. That keeps experiments reproducible without turning your repo into a maze.
An agent Python library with clean architecture
This is the most important part of Miguel's list: "clean architecture (domain / application / infra layers)." The names vary, but the idea is consistent: separate what you are building from how you are running it.
Domain: the core concepts that should not change
The domain layer captures your business concepts and rules. For an agent system, that might include:
- Entities like Conversation, Message, ToolResult
- Policies like allowed tools per role
- Domain errors and validation
This layer should know nothing about the LLM provider, the vector database, or the web framework.
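As a sketch, the entities Miguel names can be plain dataclasses with domain validation and zero SDK imports. The exact fields are illustrative assumptions:

```python
from dataclasses import dataclass, field
from enum import Enum

# Domain layer sketch: no LLM SDK, vector DB, or web framework imports.


class Role(str, Enum):
    USER = "user"
    ASSISTANT = "assistant"
    TOOL = "tool"


@dataclass(frozen=True)
class Message:
    role: Role
    content: str


@dataclass(frozen=True)
class ToolResult:
    tool_name: str
    output: str
    ok: bool = True


@dataclass
class Conversation:
    messages: list[Message] = field(default_factory=list)

    def add(self, message: Message) -> None:
        # Domain validation lives here, not in the API handler.
        if not message.content.strip():
            raise ValueError("empty message")
        self.messages.append(message)
```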
Application: orchestration and use cases
This is where your agent workflow lives:
- The "run agent" use case
- Tool selection logic
- Memory and context assembly
- Guardrails and routing
It can depend on interfaces (ports) like LLMClient, Retriever, ToolRegistry, but not on concrete implementations.
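In Python, those ports are naturally expressed as `typing.Protocol` classes. The port names follow Miguel's list; the method signatures and the two-step use case are assumptions for illustration:

```python
from typing import Protocol


class LLMClient(Protocol):
    def complete(self, prompt: str) -> str: ...


class Retriever(Protocol):
    def search(self, query: str, k: int = 3) -> list[str]: ...


class RunAgent:
    """The 'run agent' use case: assemble context, then call the model.

    Depends only on the ports above, never on a concrete provider.
    """

    def __init__(self, llm: LLMClient, retriever: Retriever) -> None:
        self.llm = llm
        self.retriever = retriever

    def __call__(self, question: str) -> str:
        context = "\n".join(self.retriever.search(question))
        prompt = f"Context:\n{context}\n\nQuestion: {question}"
        return self.llm.complete(prompt)
```

Because `RunAgent` only sees the protocols, a test can hand it a stub and an in-memory retriever, and production can hand it real adapters, with no code change in between.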
Infrastructure: adapters to the real world
Infra is where you plug in:
- OpenAI, Anthropic, or other model clients
- Vector DB or search adapters
- Logging, tracing, metrics
- File systems, queues, caches
When infra changes, your application logic should not need a rewrite. That is what "scale (and sanity)" looks like in practice.
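An infra adapter is just a concrete class that satisfies an application-layer port. Below, a deterministic fake (useful in tests and local runs) is shown runnable, with a provider-backed version sketched in comments; the class names and the `complete(prompt)` port are assumptions carried over from the layering discussion above:

```python
class FakeLLMClient:
    """Deterministic stand-in that satisfies a complete(prompt) port."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt[:40]}"


# A provider-backed adapter would have the same shape (sketch, openai>=1.0):
#
# class OpenAILLMClient:
#     def __init__(self, model: str = "gpt-4o-mini") -> None:
#         from openai import OpenAI
#         self.client = OpenAI()  # reads OPENAI_API_KEY from the env
#         self.model = model
#
#     def complete(self, prompt: str) -> str:
#         resp = self.client.chat.completions.create(
#             model=self.model,
#             messages=[{"role": "user", "content": prompt}],
#         )
#         return resp.choices[0].message.content
```

Swapping providers then means writing one new adapter, not touching the application layer.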
Tests: the confidence multiplier
Miguel explicitly lists "tests - unit, integration, ...", and testing is no longer optional for agent systems.
A practical testing stack for agents:
- Unit tests for prompt assembly, routing decisions, and tool argument validation
- Integration tests that stub the LLM but exercise the workflow end-to-end
- Contract tests for tool interfaces (input/output shapes)
- A small set of live tests against a real provider, run manually or on a schedule
You do not need to test everything with real LLM calls. In fact, you should not. Most regressions are in orchestration and tool wiring, not in the model.
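Here is what "stub the LLM but exercise the workflow" can look like in practice. The `ScriptedLLM` stub and the two-step `run_workflow` are hypothetical names, not from Miguel's template:

```python
class ScriptedLLM:
    """Returns pre-scripted replies so tests are fast and deterministic."""

    def __init__(self, replies: list[str]) -> None:
        self.replies = iter(replies)
        self.calls: list[str] = []  # record prompts for assertions

    def complete(self, prompt: str) -> str:
        self.calls.append(prompt)
        return next(self.replies)


def run_workflow(llm, question: str) -> str:
    # Tiny two-step workflow: draft an answer, then refine it.
    draft = llm.complete(f"Draft an answer to: {question}")
    return llm.complete(f"Refine this draft: {draft}")


def test_workflow_calls_llm_twice() -> None:
    llm = ScriptedLLM(["draft answer", "final answer"])
    result = run_workflow(llm, "What is RAG?")
    assert result == "final answer"
    assert len(llm.calls) == 2
    assert "draft answer" in llm.calls[1]  # wiring, not model quality
```

Note what is being tested: the orchestration and prompt wiring, which is exactly where most agent regressions live.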
Additional files: Docker, Makefile, configs
Miguel mentions "Docker, Makefile, configs, ..." This is where templates really shine because teams repeatedly reinvent these pieces.
Helpful additions include:
- A Makefile that wraps common tasks: test, lint, format, run, build
- A Dockerfile that builds a slim runtime image
- Environment configuration with clear defaults
- Pre-commit hooks to keep the repo clean
These files are not glamorous, but they turn a prototype into a deployable service.
README.md: non-negotiable, for real
Miguel says it plainly: "because a solid README is non-negotiable." A good README for an agent project should answer:
- What does this agent do and what does it not do?
- How do I run it locally in 5 minutes?
- Where are prompts, tools, and configs?
- How do I add a tool safely?
- How do I run tests and evaluations?
If a new engineer cannot get a successful run without asking you questions, you do not have a README yet.
A simple adoption plan (without overengineering)
If you are convinced by Miguel's approach but worried about process overhead, here is a lightweight way to adopt it:
- Start with the scaffold on day one, even for a small project.
- Keep notebooks for exploration, but migrate anything you will reuse into the library quickly.
- Define boundaries early: domain vs application vs infra, even if each is small at first.
- Add CI the moment a second person touches the repo.
- Write three tests: one unit test, one integration test, and one "golden" scenario.
The goal is not bureaucracy. The goal is to create a default path that prevents the usual agent project entropy.
Closing thought: templates are leverage
What I like most about Miguel Otero Pedrido's post is the confidence behind it: "Not because I'm lazy ... because it works." That is the voice of someone who has paid the cost of messy repos and decided to stop paying it.
If you build AI agents professionally, a proven scaffold is leverage. It gives you a clean baseline, makes scaling predictable, and frees your brain for the hard problems: product behavior, tool reliability, evaluation, and safety.
This blog post expands on a viral LinkedIn post by Miguel Otero Pedrido, ML/AI Engineer and founder of The Neural Maze.