Manthan Patel Maps the Real AI Agent Architecture

A practical expansion of Manthan Patel's agent stack, from perception to learning, plus where theory still breaks in production.

Manthan Patel recently shared something that caught my attention: "AI Agent Architecture," a crisp walk-through of the stack from "Step 1: Perception" through "Reasoning," "Planning," "Execution," "Learning," and "Interaction." He ends with the point that really matters in practice: what makes an AI agent different from simple automation is the feedback loops between components.

That framing is worth expanding, because it explains why teams get excited about agents and why so many early deployments still feel brittle. Below is my attempt to build on Manthan's architecture, add practical context, and highlight where theory tends to outpace what actually ships.

The core loop: why agents are not just workflows

A workflow is usually a one-way pipeline: input comes in, steps execute, output goes out. An agent, as Manthan described, is a system where outputs constantly reshape the next inputs.

Key insight: When execution results feed into learning, and learning improves reasoning, the system gets better with experience instead of repeating the same mistakes.

In production terms, that means an agent is not one model call. It is a set of components that maintain state, interpret the world, choose actions, observe results, and update behavior. If any part of that loop is weak, the whole agent can look like a fancy script.
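To make that loop concrete, here is a deliberately tiny sketch in Python. The environment, the "reasoning," and all the names are toy stand-ins I made up for illustration; the point is only that execution results flow back into the next round of perception.

```python
# Toy illustration of the agent loop: each iteration feeds execution
# results back into the next observation. Everything here is a stand-in.

class CounterEnv:
    """A trivial environment: the agent's goal is to reach a target value."""
    def __init__(self, target):
        self.value = 0
        self.target = target

    def observe(self):
        return {"value": self.value, "target": self.target}

    def act(self, action):
        if action == "increment":
            self.value += 1
        return {"value": self.value}

def run_agent(env, max_steps=10):
    memory = []                                 # raw material for learning
    for _ in range(max_steps):
        obs = env.observe()                     # Perception
        done = obs["value"] >= obs["target"]    # Reasoning (trivial here)
        if done:
            return obs["value"], memory
        action = "increment"                    # Planning (one fixed option)
        outcome = env.act(action)               # Execution
        memory.append((action, outcome))        # feedback for Learning
    return env.observe()["value"], memory
```

Swap in a flaky environment or a bad "reasoning" check and you can see the point above immediately: if any component of the loop is weak, the agent degenerates into a script that loops.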

Step 1: Perception (getting reliable signals)

Manthan starts with perception: the agent "processes inputs from its environment through multiple channels" like NLP for language, computer vision for images, plus audio, sensors, and state tracking.

In real systems, perception is where you decide what the agent even knows. Common production patterns include:

  • Text ingestion: emails, tickets, chat logs, docs, web pages
  • Tool telemetry: API responses, logs, errors, rate limits
  • User context: preferences, permissions, tenant settings
  • State: what the agent has already tried and what worked

Where perception breaks down is usually not the model but the input quality and the contract around it. If your agent reads a PDF with poor OCR, or scrapes a web page with unstable HTML, the rest of the stack inherits that noise.

Practical check: define an "observation schema". Even if the raw input is messy, transform it into structured fields (source, timestamp, confidence, entity IDs) so the reasoning layer can treat it predictably.
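A minimal version of that schema might look like the following. The field names and the email adapter are illustrative assumptions, not a standard:

```python
# One way to normalize messy inputs into a shared "observation schema".
# Field names and the adapter below are illustrative, not a standard.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Observation:
    source: str                  # e.g. "email", "crm_api", "web_scrape"
    timestamp: str               # ISO 8601 string
    confidence: float            # 0.0-1.0: how much to trust this input
    entity_ids: list = field(default_factory=list)
    text: Optional[str] = None   # normalized text content, if any

def normalize_email(raw: dict) -> Observation:
    # Hypothetical adapter: map one messy source into the shared schema.
    return Observation(
        source="email",
        timestamp=raw.get("date", ""),
        confidence=0.9 if raw.get("verified_sender") else 0.5,
        entity_ids=[raw["from"]] if "from" in raw else [],
        text=raw.get("body", "").strip(),
    )
```

Each new input channel gets its own adapter, but the reasoning layer only ever sees `Observation` objects, which is what makes it predictable.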

Step 2: Reasoning (turning signals into understanding)

Manthan describes reasoning as the combination of "logical inference systems paired with knowledge bases" plus symbolic, neural, and Bayesian approaches to handle uncertainty.

In day-to-day product work, reasoning is often a mix of:

  • Retrieval: find relevant facts from a knowledge base (RAG)
  • Constraint checking: policies, permissions, brand rules
  • Uncertainty handling: when to ask a clarifying question
  • Self-critique: sanity checks on the proposed answer or plan

My take: reasoning is less about sounding smart and more about staying grounded, citing sources, and knowing when you do not know.

A common gap between theory and practice here is that "reasoning" becomes a catch-all word. Teams expect a single LLM prompt to do inference, compliance, and domain expertise at once. It can work in demos, but in production you need guardrails: explicit policies, tool-based validation, and a trace of why the agent believed something.
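One way to avoid the catch-all prompt is to split reasoning into retrieval, explicit policy checks, and an ask-when-unsure branch. The sketch below uses a naive keyword lookup as a stand-in for real retrieval, and two made-up policy rules:

```python
# Sketch of reasoning as retrieval plus explicit policy checks, rather
# than one catch-all prompt. Retriever and policy rules are stand-ins.

POLICIES = [
    ("no_pii", lambda answer: "ssn" not in answer.lower()),
    ("cites_source", lambda answer: "[source:" in answer),
]

def check_policies(answer: str):
    """Return the names of policies the draft answer violates."""
    return [name for name, ok in POLICIES if not ok(answer)]

def reason(question: str, knowledge_base: dict) -> dict:
    # Retrieval: keyword lookup standing in for a real RAG step.
    facts = [v for k, v in knowledge_base.items() if k in question.lower()]
    if not facts:
        # Uncertainty handling: ask instead of guessing.
        return {"answer": None, "clarify": "Which topic do you mean?"}
    draft = f"{facts[0]} [source: kb]"
    violations = check_policies(draft)
    return {"answer": draft if not violations else None,
            "violations": violations}
```

The useful property is the trace: when the agent refuses or asks a question, you can point to the retrieval miss or the named policy that fired, instead of guessing at a prompt.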

Step 3: Planning (from goals to a sequence of actions)

Manthan highlights planning as "goal setting, strategy formulation, and path optimization" with hierarchical plans, simulations, and continuous optimization.

Planning is where agents stop being chatbots and start acting like operators. Good planning usually includes:

  • Breaking a goal into tasks (task decomposition)
  • Choosing tools per task (search, database query, CRM update)
  • Ordering tasks safely (read before write, verify before send)
  • Deciding checkpoints (ask for approval at high-risk steps)

In practice, the best planners are boring. They produce short, testable steps, not a grand strategy essay. Also, most teams do not need complex path optimization at first. They need reliability: the same plan should work across slightly different inputs.

Practical check: store the plan as structured data (JSON-like steps with prerequisites and expected outputs). That makes monitoring, retries, and human review far easier.
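Here is what that structured plan might look like, with a helper that computes which steps are runnable. The step fields and tool names are invented for the sketch; the point is machine-checkable structure:

```python
# A plan stored as structured data: steps with prerequisites, expected
# outputs, and checkpoints. Field and tool names are illustrative.

plan = [
    {"id": "fetch_account", "tool": "crm.read", "requires": [],
     "expects": "account record with billing status"},
    {"id": "draft_email", "tool": "llm.generate", "requires": ["fetch_account"],
     "expects": "renewal reminder draft"},
    {"id": "send_email", "tool": "mail.send", "requires": ["draft_email"],
     "expects": "delivery confirmation", "checkpoint": "human_approval"},
]

def next_runnable(plan, completed):
    """Return steps whose prerequisites are all done and are not yet run."""
    return [s for s in plan
            if s["id"] not in completed
            and all(r in completed for r in s["requires"])]
```

Because the plan is data, a monitor can diff expected versus actual outputs per step, a human can approve the `checkpoint` step before it runs, and a retry only re-executes the failed step instead of the whole run.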

Step 4: Execution (tools, actions, and monitoring)

Manthan says execution "molds plans into actions" via APIs, code execution, web access, and specialized tools, sometimes in parallel or distributed setups.

Execution is where reality hits:

  • APIs fail or return partial data
  • Permissions vary by user and tenant
  • Rate limits and timeouts are normal
  • Side effects can be irreversible (sending an email, deleting a record)

If you only remember one thing: execution needs safety rails. For many business agents, the difference between a helpful assistant and a liability is whether writes are gated.

Three practical patterns that help:

  1. Read-only mode by default, write mode only with explicit user confirmation.
  2. Idempotent actions and retries: do not create duplicates when rerunning.
  3. Observability: log every tool call, inputs, outputs, and durations.

Execution is not just "calling tools." It is transaction management for AI-driven decisions.
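The three patterns above can be folded into one small tool runner. This is a sketch under stated assumptions: the tool registry, log format, and confirmation flag are all invented here, not any particular framework's API:

```python
# Sketch of execution safety rails: a runner that logs every call,
# retries failures, and gates write actions behind explicit confirmation.
import time

call_log = []  # observability: every tool call is recorded

def run_tool(name, fn, *, is_write=False, confirmed=False, retries=2):
    # Read-only by default: writes require an explicit confirmation flag.
    if is_write and not confirmed:
        raise PermissionError(f"{name}: write actions require confirmation")
    last_error = None
    for _attempt in range(retries + 1):
        start = time.monotonic()
        try:
            result = fn()
            call_log.append({"tool": name, "ok": True,
                             "seconds": time.monotonic() - start})
            return result
        except Exception as exc:  # broad catch is fine for a sketch
            last_error = exc
            call_log.append({"tool": name, "ok": False, "error": str(exc)})
    raise last_error
```

Idempotency still has to come from the tools themselves (e.g. create-if-absent semantics), but gating and logging at one choke point means no tool call can quietly skip the rails.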

Step 5: Learning (memory, feedback, and improvement)

Manthan describes learning as the adaptive layer with short-term memory, long-term storage, and feedback loops using supervised, unsupervised, and reinforcement learning.

This is also where the biggest expectations gap shows up. Many teams hear "learning" and assume the agent will automatically get smarter just by running. But most production systems need deliberate feedback design.

A practical way to separate concerns:

  • Short-term memory: what happened in this session (conversation state, current task variables)
  • Long-term memory: stable knowledge and preferences (user settings, project context)
  • Learning loop: a process that turns outcomes into changes (updated prompts, improved retrieval, new rules, or model fine-tuning)

Learning can be lightweight and still valuable. For example:

  • Capture user edits and approval decisions
  • Track tool errors and add automated fallbacks
  • Store "golden traces" of successful runs and replay them in tests

If you do fine-tune or use reinforcement learning, do it after you have instrumentation. Otherwise you will not know whether the system improved or just changed.

Step 6: Interaction (the interface is part of the architecture)

Manthan ends with interaction: the layer that handles "all external exchanges" across text, voice, and visual channels, tuned to context.

Interaction is not just UX polish. It is where you shape behavior:

  • Asking clarifying questions instead of guessing
  • Showing sources and confidence
  • Presenting a plan before executing
  • Explaining what changed after an action

In other words, interaction is a control surface for risk. A well-designed UI can turn a powerful but uncertain agent into a dependable collaborator.
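As a small illustration of interaction as a control surface, here is a render function that always shows sources and confidence, and switches to a clarifying question below a threshold. The threshold and output format are assumptions for the sketch:

```python
# Sketch of an interaction layer that surfaces evidence and confidence,
# and asks before acting when confidence is low. Threshold is made up.

def render_turn(answer, sources, confidence, threshold=0.7):
    lines = [
        f"Answer: {answer}",
        "Sources: " + ", ".join(sources),
        f"Confidence: {confidence:.0%}",
    ]
    if confidence < threshold:
        # Control surface: hand the risk decision back to the user.
        lines.append("I'm not certain. Should I proceed, or can you clarify?")
    return "\n".join(lines)
```

The same idea extends to writes: render the plan from the planning layer and require an approval click before the execution layer is allowed to run it.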

The feedback loops that make it an agent

Manthan's key distinction is the feedback loop: execution feeds learning, learning improves reasoning, and the agent becomes adaptive.

To make that real, you need two things:

  1. Signals: Did the action succeed? Did the user accept the result? Did the API return an error? How long did it take?
  2. Levers: What can you change? Prompt templates, retrieval index, routing rules, tool selection, or the underlying model.

Without signals and levers, you do not have learning; you only have logs.
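To show the difference, here is a sketch that aggregates logged runs into signals and maps them to a lever. The metrics and the decision rules are illustrative assumptions, not a recipe:

```python
# Turning run logs into signals, and signals into a choice of lever.
# Metric names and decision thresholds are illustrative.

def summarize_signals(runs):
    """Aggregate per-run outcomes into signals a team can act on."""
    total = len(runs)
    return {
        "success_rate": sum(r["success"] for r in runs) / total,
        "accept_rate": sum(r["user_accepted"] for r in runs) / total,
        "error_rate": sum(r["tool_error"] for r in runs) / total,
    }

def pick_lever(signals):
    # Illustrative rules: which lever to pull first, given the signals.
    if signals["error_rate"] > 0.2:
        return "harden tool calls (retries, validation)"
    if signals["accept_rate"] < 0.5:
        return "revise prompt template or retrieval index"
    return "no change; keep collecting signals"
```

Even a crude version of this closes the loop: the same logs exist either way, but now they drive a specific, reviewable change instead of sitting in a dashboard.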

Where is the biggest gap between theory and practice?

Manthan asked: "Which component has the biggest gap between theory and practice?" If I had to pick one, it is learning, specifically the jump from "memory" to "measurable improvement." Teams often ship an agent with a vector store and call it learning. Real learning requires evaluation, feedback capture, and an update mechanism that does not introduce regressions.

A close second is planning: not because it is impossible, but because plans that look good in text often fail when tools behave unpredictably. The cure is tight execution monitoring and short, verifiable steps.

A simple way to apply this architecture

If you are building an agent today, try this checklist:

  • Perception: define an observation schema and normalize inputs.
  • Reasoning: ground answers in retrieved sources and enforce policies.
  • Planning: produce structured, reviewable steps with checkpoints.
  • Execution: instrument tool calls, support retries, gate write actions.
  • Learning: capture feedback, run evaluations, update safely.
  • Interaction: ask when uncertain, show the plan, show evidence.

That is the practical version of Manthan Patel's diagram: not just components in a stack, but a loop that keeps the system honest and improves it over time.

This blog post expands on a viral LinkedIn post by Manthan Patel.