How We Decide What Gets an Agent — And What Doesn't
When a watch-trading task gets an AI agent and when a deterministic pipeline wins. Three rules for where agents actually add dealer advantage.
Every dealer who's looked at an "AI-powered" tool has seen the same thing: a chatbot bolted onto a database. Impressive in a demo, unreliable in production. The cause is almost always that the wrong tasks were given to an agent.
Building a genuinely useful agentic system in the luxury watch space required us to answer one question before any other: does this task actually need an agent? The answer changed how we built everything.
The Question That Changes Everything
The question is not "how do we add AI?" but "does this task need an agent at all?" We apply three criteria before giving anything to the Agent. First: is the task genuinely ambiguous, meaning it requires judgment and cannot be reduced to a decision tree? Second: does the value of the task justify the cost of running an agent on it? Third: can the outcome be verified, so that we can tell after the fact whether the Agent made the right call?
The Decision Tree Test
When you can map out the entire decision tree for a task, don't use an agent — build the decision tree explicitly and optimise every branch. It's cheaper, faster, and you have full control over every outcome. Save agents for the problems where the decision tree would have thousands of branches.
This isn't reluctance to use AI. It's precision about where AI actually adds value. The tasks that genuinely need an agent are the ones where the inputs are ambiguous, the context matters, and the right answer shifts depending on what the data says in real time.
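The three criteria above can be written down as a small gate. This is a minimal sketch, not Mazalgo's implementation: `TaskProfile`, its field names, and `needs_agent` are hypothetical, and the yes/no answers for each task are assumed to come from a human reviewing the task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskProfile:
    """Hypothetical description of a candidate task (not a Mazalgo type)."""
    name: str
    reducible_to_decision_tree: bool  # can every branch be mapped explicitly?
    value_justifies_cost: bool        # is the call worth the cost of an agent?
    outcome_verifiable: bool          # can we tell afterwards if it was right?


def needs_agent(task: TaskProfile) -> bool:
    """Return True only when all three criteria point to an agent."""
    ambiguous = not task.reducible_to_decision_tree
    return ambiguous and task.value_justifies_cost and task.outcome_verifiable


# Illustrative answers for two tasks; the booleans are assumptions.
purge = TaskProfile("purge stale data", True, False, True)
sourcing = TaskProfile("should I buy this watch at $X?", False, True, True)

print(needs_agent(purge))     # False
print(needs_agent(sourcing))  # True
```

A task that fails any one of the three checks stays in the deterministic pipeline.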
What Gets an Agent vs. What Doesn't
Here's how that plays out in practice across the Mazalgo platform:
Mazalgo Agentic Framework — Task Routing
| Task | Approach | Why |
|---|---|---|
| Scanning communities for WTB activity | Automated pipeline — no AI | Pattern-based, deterministic, high volume. An agent would waste resources on a task with a known answer. |
| Matching a WTB lead to your inventory | Automated scoring — no AI | Mathematical. Brand + reference + price = a score. The same inputs always produce the same output. |
| "Should I buy this watch at $X?" | Agent | Genuinely ambiguous. Requires pricing context, buyer demand signals, your existing inventory, and margin — simultaneously. |
| Drafting outreach to a motivated seller | Agent | Natural language with context-awareness. Every seller and situation is different. Not templatable. |
| Finding sellers for your hunt list | Agent | "Motivated seller" requires interpreting signals — listing age, price behaviour, language. Not a string match. |
| Dispatching match alerts to your inbox | Automated pipeline — no AI | Fully deterministic. If score ≥ threshold, dispatch. Rules don't need judgment. |
| Purging stale data | Scheduled job — no AI | Deterministic. Wasted context budget if given to the Agent. |
The pattern: anything with a deterministic answer — a formula, a threshold, a rule — stays in the automated pipeline. Anything that requires synthesising ambiguous inputs into a judgment call goes to the Agent.
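To make the deterministic side concrete, here is a minimal sketch of a match score and a dispatch rule. The weights, the threshold of 70, the type names, and the placeholder brand and reference values are all assumptions for illustration; the article only says that brand, reference, and price feed a score and that alerts go out when the score meets a threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lead:
    brand: str
    reference: str
    max_price: float  # the most the buyer says they will pay


@dataclass(frozen=True)
class Watch:
    brand: str
    reference: str
    ask_price: float


# Assumed weights and threshold, chosen only for illustration.
BRAND_POINTS = 30
REFERENCE_POINTS = 40
PRICE_POINTS = 30
DISPATCH_THRESHOLD = 70


def match_score(lead: Lead, watch: Watch) -> int:
    """Pure function: the same lead and watch always give the same score."""
    score = 0
    if lead.brand.lower() == watch.brand.lower():
        score += BRAND_POINTS
    if lead.reference.lower() == watch.reference.lower():
        score += REFERENCE_POINTS
    if watch.ask_price <= lead.max_price:
        score += PRICE_POINTS
    return score


def should_dispatch(score: int, threshold: int = DISPATCH_THRESHOLD) -> bool:
    """The alert rule: dispatch when the score meets the threshold."""
    return score >= threshold


lead = Lead("BrandA", "REF-001", 14000.0)
watch = Watch("BrandA", "REF-001", 13500.0)
score = match_score(lead, watch)
print(score, should_dispatch(score))  # 100 True
```

Nothing here needs judgment, which is why it stays out of the Agent's hands.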
Keep It Simple — Three Components
Once a task passes the agent checklist, the design principle is simplicity. The Agent operates on three things: the environment (your data — inventory, deal pipeline, WTB leads, market intelligence), a set of focused tools (each doing exactly one thing), and a clear set of instructions. That's it. No orchestration layers. No multi-step planners. No pipeline of agents calling agents.
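As a rough sketch of those three components, the Agent's setup can be modelled as one small data structure. `AgentSpec`, the two tool functions, and the dictionary keys are hypothetical names, and the environment holds placeholder data.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

Environment = Dict[str, Any]
Tool = Callable[[Environment, str], Any]


def inventory_count(env: Environment, reference: str) -> int:
    """One job only: count units of a reference currently in stock."""
    return sum(1 for watch in env["inventory"] if watch["reference"] == reference)


def open_wtb_leads(env: Environment, reference: str) -> list:
    """One job only: list open WTB leads for a reference."""
    return [lead for lead in env["wtb_leads"] if lead["reference"] == reference]


@dataclass(frozen=True)
class AgentSpec:
    environment: Environment  # the data the Agent reasons over
    tools: Dict[str, Tool]    # focused tools, each doing exactly one thing
    instructions: str         # a clear set of instructions


# Placeholder data and instruction text, for illustration only.
spec = AgentSpec(
    environment={
        "inventory": [{"reference": "REF-001"}, {"reference": "REF-002"}],
        "wtb_leads": [{"reference": "REF-001", "max_price": 9000}],
    },
    tools={"inventory_count": inventory_count, "open_wtb_leads": open_wtb_leads},
    instructions="Give one verdict and one rationale sentence.",
)

print(spec.tools["inventory_count"](spec.environment, "REF-001"))  # 1
```

There is no planner or orchestration object in this sketch; everything the Agent can do is visible in one place.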
Any complexity added before it's needed kills the ability to iterate. We've reduced the Agent's tool set twice since launch — each time the decisions got better. Fewer, better-scoped tools give the Agent a cleaner picture of the relevant context. More tools add noise.
The Agent is most useful when it knows exactly what it's looking at. Complexity is the enemy of clarity — and clarity is what separates a good recommendation from a hedge.
This principle extends to how the Agent uses tools within a task. For a sourcing decision, it needs four inputs: pricing intelligence, active buyer demand, your current inventory for that reference, and market temperature. That's four tool calls. Not twenty. The budget for each task is finite, and spending it wisely is a design constraint, not an afterthought.
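One way to enforce a finite budget is a thin wrapper that counts tool calls and refuses the next one once the cap is reached. This is a sketch under assumptions: `BudgetedTools`, `ToolBudgetExceeded`, the tool names, and the stubbed return values are all hypothetical.

```python
from typing import Any, Callable, Dict


class ToolBudgetExceeded(RuntimeError):
    """Raised when a task asks for more tool calls than it was allotted."""


class BudgetedTools:
    def __init__(self, tools: Dict[str, Callable[[str], Any]], max_calls: int) -> None:
        self._tools = tools
        self.max_calls = max_calls
        self.calls_made = 0

    def call(self, name: str, reference: str) -> Any:
        # Check the budget before running the tool, so the cap is never exceeded.
        if self.calls_made >= self.max_calls:
            raise ToolBudgetExceeded(f"budget of {self.max_calls} tool calls used up")
        self.calls_made += 1
        return self._tools[name](reference)


# Stub tools with made-up return values, standing in for the four inputs.
SOURCING_TOOLS = {
    "pricing_intelligence": lambda ref: {"median_price": 10000},
    "buyer_demand": lambda ref: {"open_leads": 3},
    "inventory_position": lambda ref: {"units_held": 1},
    "market_temperature": lambda ref: {"trend": "flat"},
}

tools = BudgetedTools(SOURCING_TOOLS, max_calls=4)
context = {name: tools.call(name, "REF-001") for name in SOURCING_TOOLS}
print(tools.calls_made)  # 4
```

A fifth call on the same `tools` object raises `ToolBudgetExceeded` rather than quietly widening the context.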
Think From the Agent's Perspective
The most common failure in agent design isn't the model — it's the context. Every decision the Agent makes is a function of what it can see right now. If the context is noisy, the decisions are noisy. If it's missing critical data, the decision degrades. The Agent doesn't have a sixth sense.
Context Is the Product
Before asking "is the Agent smart enough?", ask "is the context sufficient?" The quality of the Agent's recommendation is a direct function of the quality of what it's given to reason over. Design the context window before you design the behavior.
Mazalgo's agentic framework is designed around a single test: given only what the Agent can see at this moment, is that sufficient for a defensible decision? For a sourcing verdict, the answer is yes: pricing data, buyer demand, inventory position, and market temperature are enough. For a broad "what's happening in the market" query, the answer is yes only once the Agent has chosen which slice of the market matters to this specific user.
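The sufficiency test can be approximated by checking that every required input is present before the Agent is asked for a decision. A minimal sketch: the four inputs come from the text, but the key names and the functions `missing_context` and `is_sufficient` are assumptions.

```python
from typing import Any, Dict, List, Sequence

# Inputs named in the text for a sourcing verdict; the key names are assumed.
REQUIRED_FOR_SOURCING = (
    "pricing_data",
    "buyer_demand",
    "inventory_position",
    "market_temperature",
)


def missing_context(
    context: Dict[str, Any],
    required: Sequence[str] = REQUIRED_FOR_SOURCING,
) -> List[str]:
    """Return the required inputs that are absent or set to None."""
    return [key for key in required if context.get(key) is None]


def is_sufficient(context: Dict[str, Any]) -> bool:
    """True when nothing the decision depends on is missing."""
    return not missing_context(context)


# Made-up partial context: two of the four inputs have not been fetched yet.
partial = {"pricing_data": {"median_price": 10000}, "buyer_demand": {"open_leads": 3}}
print(missing_context(partial))  # ['inventory_position', 'market_temperature']
print(is_sufficient(partial))    # False
```

A presence check like this only catches missing inputs; whether the inputs are good enough is still a design question.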
This is why the Agent's session opens the same way every time — with a check of what's pending and actionable, followed by a confirmation of what the user wants to focus on. It's not a UX decision. It's a context management decision.
Three Agent Modes, Three Use Cases
In practice, the Agent operates in three focused modes. Each has a clear trigger, a capped number of inputs, and a structured output. No mode bleeds into another.
The Three Agent Modes
Sourcing verdict — A deal is in front of you. Four inputs. One verdict: STEAL, BUY, THIN, or PASS. One rationale sentence. Done in seconds, not minutes.
WTB match outreach — Your alerts have been reviewed. Matches are validated against current pricing. Outreach is drafted per platform. Nothing is sent without your approval — the Agent hands you a queue, not a sent folder.
Hunt list execution — The Agent finds active sellers for your target references, scores them by motivation signals (price reductions, listing age, urgency language), and hands you ranked outreach drafts. You approve; it executes.
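The sourcing verdict's structured output can be sketched as a closed set of four tiers plus one rationale line. The tier names come from the text; `Verdict`, `SourcingResult`, `parse_result`, and the sample rationale are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    STEAL = "STEAL"
    BUY = "BUY"
    THIN = "THIN"
    PASS = "PASS"


@dataclass(frozen=True)
class SourcingResult:
    verdict: Verdict
    rationale: str

    def __post_init__(self) -> None:
        # Keep the rationale short: one non-empty line.
        text = self.rationale.strip()
        if not text or "\n" in text:
            raise ValueError("rationale must be a single non-empty line")


def parse_result(label: str, rationale: str) -> SourcingResult:
    """Reject any label outside the four tiers instead of passing it through."""
    return SourcingResult(Verdict(label.strip().upper()), rationale)


result = parse_result("buy", "Ask is below recent sales and two buyers are waiting.")
print(result.verdict.name)  # BUY
```

A label such as "MAYBE" raises `ValueError`, so a hedge cannot slip through as a verdict.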
Each mode knows when it's relevant. You don't navigate menus or trigger workflows manually. Say "should I buy this?" and the sourcing verdict runs. Open the app and the Agent has already reviewed your alerts. Ask it to find your watches and it comes back with ranked sellers and ready-to-send messages.
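The approval step in the outreach and hunt list modes ("a queue, not a sent folder") can be sketched as follows. `Draft`, `OutreachQueue`, and the sample messages are hypothetical, and actual delivery is left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Draft:
    recipient: str
    message: str
    approved: bool = False


@dataclass
class OutreachQueue:
    drafts: List[Draft] = field(default_factory=list)
    sent: List[Draft] = field(default_factory=list)

    def add(self, draft: Draft) -> None:
        self.drafts.append(draft)

    def approve(self, index: int) -> None:
        self.drafts[index].approved = True

    def send_approved(self) -> int:
        """Move only approved drafts to `sent`; everything else stays queued."""
        approved = [d for d in self.drafts if d.approved]
        self.drafts = [d for d in self.drafts if not d.approved]
        self.sent.extend(approved)  # a real system would deliver the message here
        return len(approved)


queue = OutreachQueue()
queue.add(Draft("seller_a", "Is the watch still available?"))
queue.add(Draft("seller_b", "Would you consider an offer?"))
queue.approve(0)
print(queue.send_approved(), len(queue.drafts))  # 1 1
```

The only path into `sent` runs through `approve`, so an unreviewed draft cannot go out.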
That's the practical difference between a tool and an agent. A tool waits to be used. An agent knows what's relevant before you ask.
The Feedback Loop
The Agent records what happened after every recommendation. Not to retrain itself in real time — but to create a measurable record of its own accuracy. Over time: which verdict tiers actually convert to closed deals? Which outreach approaches get replies? Which hunt references keep getting passed on?
This closes the loop from recommendation to outcome. The Agent isn't just making calls — it's building a track record. That track record, in turn, informs how we tune the context it receives and the thresholds it applies.
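As an illustration of that track record, the conversion rate per verdict tier can be computed from a simple log of outcomes. The function name and the sample history are made up; no real results are shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def conversion_by_verdict(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Share of recommendations in each tier that ended in a closed deal."""
    totals = defaultdict(int)
    closed = defaultdict(int)
    for verdict, deal_closed in records:
        totals[verdict] += 1
        if deal_closed:
            closed[verdict] += 1
    return {verdict: closed[verdict] / totals[verdict] for verdict in totals}


# Invented history of (verdict tier, did the deal close?) pairs.
history = [
    ("STEAL", True),
    ("STEAL", True),
    ("BUY", True),
    ("BUY", False),
    ("THIN", False),
]
rates = conversion_by_verdict(history)
print(rates)  # {'STEAL': 1.0, 'BUY': 0.5, 'THIN': 0.0}
```

If a lower tier converts as often as a higher one, that is a signal to revisit the thresholds or the context behind the verdict.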
An agent that doesn't measure its outcomes is guessing. An agent that does is learning what works for you.
Key Takeaways
- ✓ Agents are the right tool for genuinely ambiguous, high-value tasks, not for everything. Most tasks in a watch dealer's workflow are better served by deterministic pipelines.
- ✓ Simplicity is a design principle, not a constraint. Fewer tools, narrower scope, and cleaner context produce better decisions than a complex orchestration layer.
- ✓ The quality of the Agent's context determines the quality of its decisions. Design what the Agent sees before you design what it does.
- ✓ A closed feedback loop turns recommendations into a measurable track record, which is the only way to know if the Agent is actually helping.