Vision & Category · February 24, 2026 · 10 min read

The 10-Layer Agent Stack You'll Build Anyway (So We Built It)

The ten layers of agent infrastructure every serious agent app ends up building — and how Matrix composes them into one runtime.

By Matrix Team

You start with model.generate() in a while loop. Three months later you have a JWT filter, a tenant column you keep forgetting to add to queries, a vector store you bolted on for memory, a WebSocket bridge for voice that breaks every time you redeploy, and a tangle of glue that decides which tools a given agent can call this turn.

That tangle is agent infrastructure. Nobody sets out to build it. Everybody ends up with it.

We mapped the thing every serious agent app converges on — ten layers — and built them once, generically, under one runtime. Here's the stack, layer by layer: the problem each one solves, and why you'll build it whether or not you planned to.

┌───────────────────────────────────────────────────────────────────────┐
│  1  Auth / tenancy      JWT + request-scoped TenantContext            │
│  2  Domain graph        generic EntityType / EntityNode / PropertyDef │
│  3  Agent runtime       BYOK registry + AgentToolSurface + SSE        │
│  4  Voice               browser-direct Gemini Live, ephemeral tokens  │
│  5  Async tasks         saga envelope on a pluggable TaskBus          │
│  6  MCP server          Streamable HTTP at /mcp/**, JWT-gated         │
│  7  Memory              embed-on-write · HNSW recall · post-turn pass │
│  8  Contact identity    channel-agnostic; one memory pool per contact │
│  9  Users admin UI      operators + contacts                          │
│ 10  Skills · Knowledge · Toolbox    the v2 composition layer          │
└───────────────────────────────────────────────────────────────────────┘

Two ideas thread through all ten, so it's worth naming them up front. First, everything is a node — EntityType / EntityNode in Neo4j, no hand-rolled domain classes (layer 2 below, and the generic entity model post goes deep on why). Second, one composition path — AgentToolSurface — assembles every agent turn. Keep those two in mind and the rest of the stack falls into place.

Layer 1 — Auth / tenancy

The problem: the moment a second organization signs up, every query you wrote is wrong. It returns somebody else's data.

You can't retrofit isolation. It has to be the floor every query stands on. In Matrix, Spring Security + a JWT (HS256) populate a request-scoped TenantContext — orgId, userId, roles — on every request, and every read and write filters by orgId. Tenancy isn't a column you remember to add; it's where the query starts.
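A minimal sketch of "the query starts at the tenant", in plain Java without Spring. TenantContext and its orgId, userId, and roles fields come from the description above; LeadRow, TenantScopedLeadStore, and findAll are hypothetical names for illustration. A real implementation would populate the context from the verified JWT and push the filter into the database query.

```java
import java.util.List;
import java.util.Set;

// Request-scoped identity; in a real runtime this is built from the verified JWT.
record TenantContext(String orgId, String userId, Set<String> roles) {}

// Hypothetical row type: every stored row carries its tenant marker.
record LeadRow(String orgId, String name) {}

// Hypothetical store: there is no read method that omits the tenant.
final class TenantScopedLeadStore {
    private final List<LeadRow> rows;

    TenantScopedLeadStore(List<LeadRow> rows) {
        this.rows = List.copyOf(rows);
    }

    // The tenant filter is applied first; callers cannot opt out of it.
    List<LeadRow> findAll(TenantContext ctx) {
        return rows.stream()
                .filter(row -> row.orgId().equals(ctx.orgId()))
                .toList();
    }
}
```

The design point is the method signature: because findAll requires a TenantContext, forgetting the tenant filter becomes a compile error rather than a data leak.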

Why you'll build it anyway: the alternative is a per-customer deployment — its own infrastructure project — or a security incident, which is worse.

Layer 2 — Domain graph

The problem: every new concept — Agent, Skill, Lead, Campaign, Memory — wants its own table, its own migration, its own admin screen. Your schema grows faster than your product.

Matrix models all of it as EntityType / EntityNode / EntityRelationship / PropertyDefinition rows in Neo4j. EntityType.ownerOrgId distinguishes platform-globals (null) from per-org custom types; EntityNode.orgId is the tenant marker on every instance. A new field is a PropertyDefinition edit, not a new class. A tenant adding a custom field to Lead is a same-named org-owned type overlay, not a fork.
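The overlay rule can be sketched in a few lines. This assumes that an org-owned type with the same name simply shadows the platform-global one; the real resolution logic may merge properties instead. EntityTypeResolver, resolve, and the propertyNames field are hypothetical; only EntityType and ownerOrgId come from the text above.

```java
import java.util.List;
import java.util.Optional;

// ownerOrgId == null marks a platform-global type; otherwise the type belongs to that org.
record EntityType(String name, String ownerOrgId, List<String> propertyNames) {}

// Hypothetical resolver, for illustration only.
final class EntityTypeResolver {
    private final List<EntityType> types;

    EntityTypeResolver(List<EntityType> types) {
        this.types = List.copyOf(types);
    }

    // Assumed rule: an org-owned type with the same name shadows the platform-global one.
    Optional<EntityType> resolve(String name, String orgId) {
        Optional<EntityType> overlay = types.stream()
                .filter(t -> t.name().equals(name) && orgId.equals(t.ownerOrgId()))
                .findFirst();
        if (overlay.isPresent()) {
            return overlay;
        }
        // No overlay for this org: fall back to the platform-global definition.
        return types.stream()
                .filter(t -> t.name().equals(name) && t.ownerOrgId() == null)
                .findFirst();
    }
}
```

Under this rule, an org that adds a field to Lead sees its own Lead definition, while every other org keeps resolving to the global one.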

That sounds abstract until you realize it's what makes "ship a persona by filling out a form" possible — an agent is an entity, its skills are entity refs, its memory rows are entities. One CRUD path, one admin kit, infinite shapes.

Why you'll build it anyway: the second your customers want fields you didn't anticipate, you either fork per tenant or you build a generic model. There's no third option that scales.

Layer 3 — Agent runtime

The problem: "which tools can this agent call this turn?" is deceptively hard. The answer is a union of direct tools, MCP methods, skill-contributed tools, built-ins, knowledge search, memory tools, and an ambient clock — and it changes per agent, per turn.

This is the heart of the platform, and it's one method: AgentToolSurface.composeForCaller. For each turn it unions:

agent.tools — direct HTTP / display tool refs
agent.mcpServers — external MCP servers
skill.tools from every attached Skill
BuiltinToolRegistry lookups for INTERNAL-transport tools (web_search, bash, …)
the auto-attached search_knowledge tool, when agent.knowledge is non-empty
the memory built-ins, gated by which keys the attached skills requested
the ambient get_current_time
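The union above might look roughly like the following. This is a simplified sketch, not the actual composeForCaller: MCP servers and memory-tool gating are left out, the record shapes are hypothetical, and "first definition of a name wins" is an assumed de-duplication rule.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical shapes; only the source names (agent.tools, skill.tools, ...) come from the article.
record ToolRef(String name, String source) {}
record Skill(List<ToolRef> tools) {}
record Agent(List<ToolRef> tools, List<Skill> skills, List<String> knowledge) {}

final class ToolSurfaceSketch {
    // Unions the tool sources for one turn; the first definition of a name wins (an assumption).
    static List<ToolRef> compose(Agent agent, List<ToolRef> builtins) {
        Map<String, ToolRef> byName = new LinkedHashMap<>();
        agent.tools().forEach(t -> byName.putIfAbsent(t.name(), t));
        for (Skill skill : agent.skills()) {
            skill.tools().forEach(t -> byName.putIfAbsent(t.name(), t));
        }
        builtins.forEach(t -> byName.putIfAbsent(t.name(), t));
        // search_knowledge is attached only when the agent has knowledge.
        if (!agent.knowledge().isEmpty()) {
            byName.putIfAbsent("search_knowledge", new ToolRef("search_knowledge", "auto"));
        }
        // The clock is ambient: always present.
        byName.putIfAbsent("get_current_time", new ToolRef("get_current_time", "ambient"));
        return List.copyOf(byName.values());
    }
}
```

Because every channel would call the same compose method, a tool added to a skill appears in chat, voice, and background tasks at once.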

The model layer underneath is BYOK: LlmProviderRegistry builds a Spring AI ChatModel per (org, provider), with API keys encrypted at rest via SecretEncryptor. ChatService drives a turn and streams the result over SSE.

The payoff of one composition path: the same composed surface and prompt drive text chat, real-time voice, and autonomous background tasks. An agent behaves identically no matter how a contact reaches it — a parity invariant, not a hope.

Why you'll build it anyway: the first time you add a tool and it shows up in chat but not voice, you'll wish you'd centralized composition. So centralize it on day one.

Layer 4 — Voice

The problem: real-time, full-duplex voice is a different animal from request/response. Audio frames, barge-in, sub-second latency, and a wire protocol that punishes every wrong key.

Matrix runs voice on the consumer Gemini Live API two ways. The browser-direct path holds the WebSocket straight from the browser to Gemini; the backend only mints ephemeral tokens — zero server in the audio path. The telephony bridge runs one CallSession per call, bridging Exotel to Gemini Live and adopting the contact, direction, campaign, and per-call objective the moment it connects.

The wire protocol is genuinely fragile — snake_case realtime_input, the right BidiGenerateContent variant, a tuned barge-in drop window. We don't pretend it was easy; the whole debugging journey lives in docs/LEARNINGS.md, and the rule is: never touch the voice path without re-reading it.

Why you'll build it anyway: if your product touches a phone or a microphone, there's no shortcut around this layer — only "we already did it" or "you're about to."

Layer 5 — Async tasks

The problem: outbound calls, post-call extraction, autonomous runs — none of these belong on the request thread. You need a queue, retries, and a saga envelope.

Matrix ships a saga envelope on a TaskBus. The default backend is in-process — an in-JVM BlockingQueue, zero new infra. Set MATRIX_TASKS_BACKEND=kafka and start the Redpanda compose service, and the same envelope rides Kafka instead, with no behavior change. The bus is a seam, not a commitment.
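Here is one way the seam could look, as a sketch under assumptions. TaskBus and the MATRIX_TASKS_BACKEND value come from the text; TaskEnvelope's fields, InProcessTaskBus, TaskBusFactory, and the non-blocking poll are hypothetical, and the Kafka backend is deliberately left as a stub.

```java
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical envelope: a saga id ties the steps of one long-running operation together.
record TaskEnvelope(String sagaId, String type, String payload, int attempt) {}

// The seam: producers and consumers depend on this interface, not on a backend.
interface TaskBus {
    void publish(TaskEnvelope envelope);
    Optional<TaskEnvelope> poll();
}

// In-process backend: an in-JVM BlockingQueue, no extra infrastructure.
final class InProcessTaskBus implements TaskBus {
    private final BlockingQueue<TaskEnvelope> queue = new LinkedBlockingQueue<>();

    @Override
    public void publish(TaskEnvelope envelope) {
        queue.add(envelope); // unbounded queue, so add never rejects
    }

    @Override
    public Optional<TaskEnvelope> poll() {
        // Non-blocking: empty when no task is waiting.
        return Optional.ofNullable(queue.poll());
    }
}

final class TaskBusFactory {
    // backend mirrors the MATRIX_TASKS_BACKEND value; anything but "kafka" stays in-process.
    static TaskBus create(String backend) {
        if ("kafka".equalsIgnoreCase(backend)) {
            // A Kafka-backed TaskBus would be constructed here; omitted from this sketch.
            throw new UnsupportedOperationException("kafka backend not included in this sketch");
        }
        return new InProcessTaskBus();
    }
}
```

A caller would typically write TaskBusFactory.create(System.getenv("MATRIX_TASKS_BACKEND")) once at startup, so nothing else in the codebase knows which backend is live.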

Why you'll build it anyway: the first long-running operation you do inline will time out an HTTP request. The pluggable part matters because you don't want to pick Kafka before you have the load that justifies it.

Layer 6 — MCP server

The problem: other tools want to talk to your agents and data. If you bolt on a second API with a second auth model, you now maintain two doors.

Matrix exposes /mcp/** (Streamable HTTP) with list_agents, query_entities, read_memory, write_memory, and dispatch_task, all gated by the same JWT as the REST surface. One auth model, two protocols. Matrix is also an MCP client, so agents can call external MCP servers, attached directly or bundled into a skill.

Why you'll build it anyway: MCP is how the ecosystem composes now. Being only a client leaves your platform unreachable; being a server with a bespoke auth model leaves you with two security surfaces to keep in sync.

Layer 7 — Memory

The problem: an agent that forgets you between turns — let alone between calls — isn't an agent, it's autocomplete.

MemoryService writes four kinds — WORKING | EPISODIC | SEMANTIC | PROCEDURAL. Every write is embedded on the spot (gemini-embedding-001, pinned to 768d) and stored as a List<Float> on the underlying :Entity node. Recall queries Neo4j's native HNSW vector index — no separate vector database, because the graph store already does it. A substring fallback kicks in when embeddings are disabled or the index is empty.
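The recall-with-fallback behaviour can be illustrated as follows. This is a sketch only: a brute-force cosine scan stands in for Neo4j's HNSW index (which is approximate and far faster at scale), and MemoryRow, RecallSketch, and their methods are hypothetical names.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical memory row: text plus its embedding (null when embeddings are disabled).
record MemoryRow(String text, float[] embedding) {}

final class RecallSketch {
    // Cosine similarity between two vectors of equal length; 0 if either has zero norm.
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Vector recall when a query embedding and embedded rows exist; substring match otherwise.
    static List<MemoryRow> recall(List<MemoryRow> rows, String query, float[] queryEmbedding, int k) {
        List<MemoryRow> embedded = rows.stream().filter(r -> r.embedding() != null).toList();
        if (queryEmbedding == null || embedded.isEmpty()) {
            String needle = query.toLowerCase();
            return rows.stream()
                    .filter(r -> r.text().toLowerCase().contains(needle))
                    .limit(k)
                    .toList();
        }
        // Brute-force top-k by similarity; a vector index replaces this scan in practice.
        return embedded.stream()
                .sorted(Comparator.comparingDouble(
                        (MemoryRow r) -> cosine(r.embedding(), queryEmbedding)).reversed())
                .limit(k)
                .toList();
    }
}
```

The fallback matters operationally: with embeddings switched off, recall degrades to keyword matching instead of returning nothing.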

Two details that matter more than they look. MemoryContextRenderer produces a deliberately heavy-handed "who you're talking to" Markdown block injected into both text-chat and Gemini Live prompts — it has to win against multi-thousand-character persona prompts that say "ask for the date of birth." And MemoryExtractorService runs a fire-and-forget post-turn pass that distills each session into a digest plus durable facts.

Why you'll build it anyway: the demo works without memory. The product doesn't. Re-asking a returning user their birthday is the fastest way to look like a toy.

Layer 8 — Contact identity

The problem: the same person calls on Monday and chats on Tuesday. If those are two unrelated rows, your memory pool is fragmented and the agent has amnesia per channel.

Every interaction resolves to a User(userType=CONTACT) — channel-agnostic. The voice bridge resolves by E.164-normalized phone (with Indian-format variants) via CallerResolver; for text chat the JWT user plays the same role. The resolved id is stamped on the interaction so all channels share one memory pool per (agent, contact).
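To make "E.164-normalized phone (with Indian-format variants)" concrete, here is a hedged sketch of what such normalization might involve. The rules below, including the 8-digit lower bound, are illustrative assumptions and not CallerResolver's actual logic; PhoneNormalizer and toE164 are hypothetical names.

```java
import java.util.Optional;

final class PhoneNormalizer {
    // Normalizes common Indian-format inputs to E.164 (+91XXXXXXXXXX).
    // Illustrative rules only; the real resolver may handle more variants.
    static Optional<String> toE164(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        boolean hasPlus = raw.trim().startsWith("+");
        String digits = raw.replaceAll("\\D", "");
        if (hasPlus) {
            // Already international: E.164 allows at most 15 digits.
            return digits.length() >= 8 && digits.length() <= 15
                    ? Optional.of("+" + digits)
                    : Optional.empty();
        }
        if (digits.length() == 11 && digits.startsWith("0")) {
            digits = digits.substring(1); // drop the trunk prefix, e.g. 098765 43210
        }
        if (digits.length() == 12 && digits.startsWith("91")) {
            return Optional.of("+" + digits); // country code present without '+'
        }
        if (digits.length() == 10) {
            return Optional.of("+91" + digits); // bare national number
        }
        return Optional.empty();
    }
}
```

With rules like these, "+91 98765 43210", "098765-43210", and "9876543210" all resolve to the same key, which is what lets one contact own one memory pool.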

This is where the unified Session entity earns its keep. A chat and a phone call aren't two concepts joined by glue: both are rows of the same Session entity, discriminated by channel (TEXT_CHAT / VOICE_REALTIME / future WHATSAPP) and keyed by userId. The agent remembers you whether you call or type.

Why you'll build it anyway: the first cross-channel "but I told you this on the phone" complaint will send you here. Identity unification is cheaper before you have two channels than after.

Layer 9 — Users admin UI

The problem: operators need to see everyone the org knows — staff and end-users — with stats, search, and a way to inspect what the agent remembers.

/admin/users lists operators and contacts with stat cards, filter chips, search, a table, and a right-side drawer for inline edit plus a memories view. It's the human-friendly face of the generic entity table from layer 2 — same data, a screen built for people instead of rows.

Why you'll build it anyway: the moment something goes wrong in production, someone needs to look at a contact and see what the agent thinks it knows. Without this layer, that's a Cypher query at 2am.

Layer 10 — Skills, Knowledge, and the built-in toolbox

The problem: extensibility. You want to add a behaviour, a corpus, or a tool to an agent without writing per-feature wiring each time.

Three first-class primitives, all composing through AgentToolSurface with zero per-feature glue on the agent end:

Skill — a behaviour bundle: a systemPromptBlock, tool refs, MCP refs, required contact fields, and bundled files. Import any Anthropic Agent Skill from a GitHub URL; bundled files materialize into the agent's sandbox before every turn.
Knowledge — a per-org RAG corpus. Upload .md / .txt / .html / .pdf; it's chunked (~2000 chars, 200 overlap), embedded, and stored. Attach it and the agent automatically gets a search_knowledge tool. Flip graphragEnabled and ingestion also extracts an entity/relation graph per chunk.
Built-in toolbox — web_search (DuckDuckGo, no key), fetch_url, bash, file_read/write/list, grep — all sandbox-scoped per (org, agent). One seeded skill attaches the whole set.
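The Knowledge chunking step (~2000 chars, 200 overlap) can be sketched as a sliding window. This version splits on fixed character offsets, which is an assumption; the "~" in the description suggests the real chunker may respect sentence or paragraph boundaries. Chunker and chunk are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class Chunker {
    // Splits text into windows of at most `size` chars; consecutive windows share `overlap` chars.
    static List<String> chunk(String text, int size, int overlap) {
        if (size <= 0 || overlap < 0 || overlap >= size) {
            throw new IllegalArgumentException("need 0 <= overlap < size");
        }
        List<String> chunks = new ArrayList<>();
        int step = size - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + size, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last window reached the end of the text
            }
        }
        return chunks;
    }
}
```

With size 2000 and overlap 200 the window advances 1800 characters at a time, so a sentence that straddles a boundary still appears whole in at least one chunk as long as it is shorter than the overlap.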

This is the layer that turns "build an agent" into "compose one." It exists because layers 1–9 made it possible: generic entities, one tool surface, shared memory, multi-tenancy underneath.

Why you'll build it anyway: the second agent you build will want 80% of the first one's behaviour. Without composition, you copy-paste; with it, you attach.

The takeaway

These ten layers aren't a Matrix invention — they're the convergent shape of every agent platform, discovered the hard way by everyone who's shipped one. The choice isn't whether you build them. It's whether you build them ten times, ad hoc, fused to your domain — or once, generically, with AgentToolSurface as the spine and EntityType/EntityNode as the substrate.

A few things ship dark and opt-in by design — call recording sits behind a flag, self-improvement is double-gated and off by default, RBAC strict-mode is per-org opt-in — because production infrastructure ships safe defaults, not surprises.

If you've been quietly building this stack inside your own app, that's the tell that you needed a platform. We made the same diagnosis — that's why we built a platform, not a framework.

Skip the ten-layer detour. Create a workspace, ship your first agent from a form, and read docs/ARCHITECTURE.md for the full tour.

