I've been in a lot of conversations lately where someone drops the word "sub agents" and half the room nods confidently, half the room stares blankly, and everyone is too afraid to ask which half they're actually in.
I'm not judging. A few months ago, I was firmly in the second half.
Here's the thing: this isn't just vocabulary. Choosing the wrong agent architecture for the wrong task is the difference between paying one dollar for a fast, useful answer and paying a thousand dollars for something slow, expensive, and wrong. I've seen it happen. So let me break down what these three things actually are, because I think there's a clear and honest way to explain it that doesn't require a PhD or a tolerance for hype.
What is a tool call?
A tool call is the most basic of the three, in a good way.
Think of a tool call as the foundational muscle movement of an AI agent. A squat, a curl, a press. When an agent needs to go get something — retrieve a line of code, query a database, look up a ticket — it makes a tool call. That's it.
Tool calls are there whether you're running the simplest agent or a massively parallel swarm. They exist at every layer. They are not the interesting part of the architecture conversation, but they are the foundation of all of it. Every sub agent makes tool calls. Every teammate makes tool calls. You cannot escape the tool call. Embrace the tool call.
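In code, a tool call is just the model emitting a structured request and the runtime executing it. Here's a minimal sketch of that loop; the tool name, the ticket data, and the dispatch interface are all hypothetical, for illustration only:

```python
# Minimal sketch of a tool-call loop. The tool name and the request
# format are hypothetical, not any specific framework's API.

def lookup_ticket(ticket_id: str) -> dict:
    """A 'tool': one atomic action against an external system."""
    # In a real agent this would hit a ticketing API.
    return {"id": ticket_id, "status": "open", "priority": "high"}

TOOLS = {"lookup_ticket": lookup_ticket}

def run_agent_step(tool_request: dict) -> dict:
    """The runtime's half of the loop: dispatch one tool call."""
    tool = TOOLS[tool_request["name"]]
    return tool(**tool_request["arguments"])

# The model decides *which* tool to call; the runtime executes it
# and hands the result back into the model's context.
result = run_agent_step({"name": "lookup_ticket",
                         "arguments": {"ticket_id": "TCK-1042"}})
print(result["status"])  # open
```

Every architecture discussed below is, at bottom, some arrangement of this loop.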
What is a sub agent?
A sub agent is a specialist that's called upon when you have a task to complete: the version of you that remembers how to negotiate at a car dealership, even though it's been a few years. That version can be called up as needed, without you having to fully re-train yourself on the task.
The car dealership model is something I keep coming back to when I describe this for family and friends. When you set out to buy a new car, you have something in mind. You might need something practical: you just started a family, you live in a city, but you love a long mountain camping trip a few times a year.
If you go to a sub agent for this answer, you're essentially visiting every car dealership in a 30-mile radius yourself and talking to a salesperson at each one. All of that information comes back to you, in full, and now it's in your head, or rather, in your context window. You're the one sorting through it, figuring out that a minivan is probably not right and a truck is probably overkill, that buying an American car might fill you with pride but buying a Japanese car might give you better mileage for your money. Each version of you that visited a dealership and got the pitch is a sub agent specialist reporting back the information it received.
Sub agents are honestly great for a lot of things. If your question is narrow and the answer is retrievable, sub agents work beautifully. Ask a sub agent to find out what the opening hours of the car dealerships are, where to find the authentication logic in your codebase, or how to classify an incoming support ticket — you'll get a great answer quickly and the context window cost is minimal. No harm done.
The problem shows up when the questions get complex. Sub agents are built to surface information, not filter it. They are, as someone on my team put it, "vomiting information" back into the shared context window. At a certain point, you end up in the paradox of choice: so many car options and decision variables that you can't actually decide which one you love.
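To see why the context window cost grows, here's a toy sketch of the pattern: each sub agent returns its complete findings, and all of them land in the caller's shared context. The names and numbers are made up for illustration:

```python
# Toy sketch of the sub agent pattern: the caller's context absorbs
# everything every sub agent found. All names here are hypothetical.

def sub_agent_research(dealership: str) -> list[str]:
    """A sub agent 'visits' one dealership and returns its full notes."""
    return [f"{dealership}: option {i}, price, mileage, trim notes..."
            for i in range(50)]  # every pitch, unfiltered

main_context: list[str] = ["User wants a practical family car."]

for dealership in ["North Motors", "City Auto", "Valley Cars"]:
    # Each sub agent dumps its complete findings into the shared context.
    main_context.extend(sub_agent_research(dealership))

# The caller now holds 150 raw findings and still has to decide itself.
print(len(main_context))  # 151
```

Three narrow questions produced 150 entries of shared context, and the filtering work hasn't even started yet.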
What is an AI teammate?
This is where things get interesting.
A teammate is still a specialist — but instead of dumping their full research into the shared context window, they do their own thinking, coordinate with other specialists, apply judgment, and only then bring you back the things that actually matter.
Back to the car analogy. Teammates are your three friends, each with deep expertise in a different vehicle type. You explain your requirements once. Then you send them out. They visit the dealerships, compare notes, call each other when they find something interesting, and come back with two or three options each — already filtered, already cross-checked, already ranked by what fits your actual life. You never had to enter a single dealership. Your context window, metaphorically speaking, stayed clean.
The difference isn't about how smart they are. It's about the interaction mode. With a sub agent, you hear everything they discovered. With a teammate, you only hear what they decided was worth telling you.
That distinction matters enormously when the work is complex. Root cause analysis, for example. A good RCA requires simultaneously investigating the front end, the back end, recent deployments, and customer-facing symptoms. You can't just rabbit-hole down one thread and hope. You need parallel investigation, and you need those parallel threads to converge on an answer that's actually grounded in evidence — not just the first hypothesis that looked promising.
The other key difference: teammates are persistent. A sub agent spins up, returns an answer, and dies. A teammate sticks around, builds context over time, can receive follow-up questions, and can continue a thread it started 20 minutes ago. This is what makes long-horizon work possible — the kind of work where you kick something off in the morning, go to your stand-up, have lunch, and come back to an answer you can actually trust.
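Stripped to a sketch, a teammate is an agent with private state that returns only its synthesis and sticks around for follow-ups. The class and its logic below are hypothetical, just to make the interaction mode concrete:

```python
# Sketch of the teammate pattern: private context, synthesized output,
# and persistence across follow-ups. Names and logic are hypothetical.

class Teammate:
    def __init__(self, specialty: str):
        self.specialty = specialty
        self._private_context: list[str] = []  # never shared upward

    def investigate(self, requirements: str) -> str:
        # The full research trail stays in the teammate's own context...
        self._private_context += [f"{self.specialty} finding {i}"
                                  for i in range(50)]
        # ...and only a filtered recommendation comes back.
        return f"{self.specialty}: top 2 options matching '{requirements}'"

    def follow_up(self, question: str) -> str:
        # Persistence: the earlier findings are still here to draw on.
        return f"Answer from {len(self._private_context)} prior findings"

suv_expert = Teammate("SUV")
summary = suv_expert.investigate("family + city + camping")
later = suv_expert.follow_up("what about towing capacity?")
print(summary)
print(later)  # drawing on context built earlier in the session
```

The caller's context receives one line per teammate, not fifty, and the follow-up works because the teammate never threw its context away.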
The honest tradeoff
Teammates are not always better. I want to be clear about that, because I think there's a tendency to hear "more sophisticated architecture" and assume "always use it."
Teammates cost more tokens, take longer, and require coordination overhead. For simple questions with clear, retrievable answers, that overhead is pure waste. Sending a swarm of specialized teammates to suggest a minor performance improvement in a utility function is like calling in Arnold Schwarzenegger to rearrange patio furniture.
The way I've started thinking about it: sub agents are for questions with a surface-level answer. Teammates are for questions where being wrong is expensive.
The good news is that over time, systems can get better at knowing which is which. At PlayerZero, we're actually building toward a world where the agent itself can sense when a question needs an Arnold — and ask you before it calls him.
What this looks like in production engineering
All that said, I think the teammate architecture is genuinely well-suited to the problems production engineering teams face every day.
When a support ticket comes in or an incident fires, the right answer almost never lives in one place. You need to look at logs, trace through the codebase, check what changed in recent deployments, surface related historical issues. You need parallel investigation threads, and you need those threads to actually talk to each other and converge — not produce five separate reports that someone then has to manually synthesize at 2am.
PlayerZero's teammate architecture, which we call hive mode, is built for this. When hive mode is on, a top-level orchestrator spins up a set of research teammates simultaneously. Each has its own isolated context. They communicate asynchronously — think of it like an internal Slack where agents can DM each other or broadcast to the group — and they coordinate before they surface findings. Then a second layer of cross-checker teammates comes in: their job is specifically to pressure-test the researchers' conclusions, verify claims against evidence, and flag anything that wasn't grounded.
This is how you go from an 80% accurate answer to something closer to 99%. Not magic. Just architecture.
And because each teammate's context is completely separate from the others, the orchestrator never accumulates a bloated shared context window. It delegates, gets back high-quality synthesized outputs, and makes decisions from a clean information set.
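To make that two-layer flow concrete, here's a rough sketch in Python. Everything in it (the function names, the areas, the evidence check) is hypothetical and heavily simplified; it illustrates the pattern of parallel isolated researchers plus a verification pass, not PlayerZero's actual implementation:

```python
# Rough sketch of the hive-mode shape described above: parallel
# researchers with isolated contexts, then a cross-checking pass.
# Illustrative only; not PlayerZero's implementation.

from concurrent.futures import ThreadPoolExecutor

def researcher(area: str) -> dict:
    """Each researcher investigates in its own isolated context."""
    return {"area": area,
            "claim": f"suspicious change in {area}",
            "evidence": [f"{area}-log-1", f"{area}-log-2"]}

def cross_checker(finding: dict) -> dict:
    """Second layer: verify each claim is actually grounded in evidence."""
    finding["verified"] = len(finding["evidence"]) > 0
    return finding

areas = ["front end", "back end", "deployments", "customer symptoms"]

# The orchestrator fans out investigation threads simultaneously...
with ThreadPoolExecutor() as pool:
    findings = list(pool.map(researcher, areas))

# ...then every finding passes through a cross-checker before it
# reaches the orchestrator's decision context.
verified = [f for f in map(cross_checker, findings) if f["verified"]]
print(len(verified), "grounded findings")  # 4 grounded findings
```

The orchestrator only ever sees the verified list, which is the whole point: its decision context stays small and high-signal.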
I think about this like the difference between a hive that's swarming (everyone flying together, sharing one orientation, unable to make honey) and a hive that's settled (specialized workers, dedicated roles, each bee building their piece of the comb). The swarming version is motion. The settled version is output.
The bigger picture
Something our CRO said in a recent team meeting has stuck with me: writing software is becoming a commodity. Operating software at scale is not.
I think that's the frame for why all of this matters. There's an enormous amount of energy right now going into tools that help developers write code faster. That's real and it's valuable. But the problems that are getting harder are the ones that live in production. The long tail of support tickets, multi-layer bugs, and incidents that require coordination across five teams.
Those problems need agents that can hold complex context, coordinate across specialties, work asynchronously over hours instead of minutes, and come back with answers that are trustworthy.
That's the case for teammates. Not just as a cool architecture pattern, but as the right tool for the hardest problems in production engineering.
Quick reference: the three types
Tool calls — Atomic actions. An agent reaching into a system to get or write data. Present at every layer. Foundational.
Sub agents — Specialists that execute a task and return their full findings to the shared context window. Fast, simple, great for narrow questions. Context window gets expensive as questions get complex.
Teammates (hive mode) — Isolated specialists with their own context who coordinate asynchronously, apply judgment, and return synthesized outputs. Slower and more expensive. Built for complex work where accuracy matters and wrong answers are costly.
FAQ
What's the difference between a tool call and a sub agent? A tool call is a single action an agent takes to retrieve or write data. A sub agent is a separate agent invoked to handle a specialized task — it makes its own tool calls and returns results to the main agent's context window.
Are skills in Claude Code the same as sub agents? Not exactly. Skills are capabilities you define in advance that an agent can invoke. Sub agents are spun up dynamically within an agent loop. What they have in common is that all returned information merges into the main context window. With teammates, each specialist maintains completely separate context.
When should I use hive mode vs. standard agent mode? Standard mode for anything with a clear, retrievable answer. Hive mode for deep technical investigation, root cause analysis, complex multi-hypothesis problems, or any task where you want to hand it off and come back to a trustworthy result. When in doubt: how expensive is being wrong? That's your answer.
How is PlayerZero's teammate architecture different from Claude Code's swarm mode? In Claude Code, users define what each swarm agent does, and agents share a context window. In PlayerZero, the orchestrator writes the teammate instructions dynamically for each new task, every teammate has completely isolated context, and the workflow includes built-in cross-checking stages that verify findings before they surface. It's built for long-horizon autonomous production work, not rapid iteration on local code.
Can a single agent figure out when to switch to hive mode on its own? We're working towards it. The honest answer right now is: sometimes. Agents can have awareness of both modes and can offer to switch if they sense the question warrants it. But like humans, they don't always know upfront how deep a problem actually goes. That's part of why giving users an explicit toggle matters — it lets you bring your own judgment to the question.
Julia Blase is Director of Product at PlayerZero, where she thinks about production engineering, AI architecture, and whether we can build AI that earns real trust. PlayerZero is hiring.