Frequently Asked Questions

What tasks should NOT be handled by an AI agent?

Anything deterministic: scanning forums for patterns, calculating match scores, running scheduled data purges, sending routine alerts once a threshold is crossed. If the task can be expressed as a formula or a rule, it should be a pipeline — not an agent. Giving deterministic work to agents wastes token budget, introduces variance into a process that should be repeatable, and makes debugging harder. Agents are for ambiguous judgment calls, not for work with a known algorithm.

Why limit the agent to 8 tool calls per session?

Decision quality tends to degrade, not improve, as tool calls pile up. Past roughly eight calls, the agent starts retrieving data that no longer informs the decision, which adds noise rather than signal. The 8-call cap forces the agent to pick the four or five most relevant inputs instead of exhaustively querying everything available. If a task genuinely needs more than 8 calls, the scope is wrong: break it into separate agent invocations, each with its own context budget.
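A cap like this is easiest to enforce in the code that executes tool calls rather than in the prompt. Below is a minimal Python sketch, assuming each tool is a plain callable; `ToolBudget` and `ToolBudgetExceeded` are hypothetical names, not Mazalgo's implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


class ToolBudgetExceeded(RuntimeError):
    """Raised when a session tries to go past its tool-call cap."""


@dataclass
class ToolBudget:
    max_calls: int = 8  # per-session cap, mirroring the figure above
    calls_made: int = 0
    log: list = field(default_factory=list)  # names of tools actually called

    @property
    def remaining(self) -> int:
        return self.max_calls - self.calls_made

    def call(self, name: str, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        # Check before running the tool, so the cap can never be exceeded.
        if self.remaining <= 0:
            raise ToolBudgetExceeded(
                f"{name!r} would exceed the {self.max_calls}-call budget; "
                "split the task into a separate invocation"
            )
        # Counted before the tool runs, so a failing call still uses budget.
        self.calls_made += 1
        self.log.append(name)
        return fn(*args, **kwargs)
```

When the guard trips, the response the answer above points to is not a bigger cap but a new, narrower invocation with a fresh `ToolBudget`.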

Does the agent learn from my approvals and dismissals?

Yes, but not in real time and not through model weights. It learns through an outcome feedback loop. Every time you approve, dismiss, or confirm an action, the agent logs the verdict, the inputs, and the outcome to an `agent_outcomes` table. Over time this builds an accuracy record for each verdict tier (STEAL → closed-deal rate, BUY → closed-deal rate, PASS → correct-to-skip rate). That record informs threshold tuning and context improvements, not the base model.
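The logging side of that loop can be sketched with Python's built-in `sqlite3`. Only the `agent_outcomes` table name comes from this article; the columns, `create_schema`, `log_outcome`, and `accuracy_by_tier` are hypothetical. Here `outcome_correct` is assumed to mean "closed" for STEAL and BUY and "rightly skipped" for PASS, matching the tiers above.

```python
import json
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS agent_outcomes (
            id INTEGER PRIMARY KEY,
            verdict TEXT NOT NULL CHECK (verdict IN ('STEAL', 'BUY', 'THIN', 'PASS')),
            inputs_json TEXT NOT NULL,   -- snapshot of what the agent saw
            user_action TEXT NOT NULL,   -- e.g. 'approved', 'dismissed', 'confirmed'
            outcome_correct INTEGER      -- 1 or 0 once known; NULL while pending
        )
        """
    )


def log_outcome(conn, verdict, inputs, user_action, outcome_correct=None):
    """Record one verdict with its inputs and the user's action."""
    conn.execute(
        "INSERT INTO agent_outcomes (verdict, inputs_json, user_action, outcome_correct) "
        "VALUES (?, ?, ?, ?)",
        (verdict, json.dumps(inputs), user_action, outcome_correct),
    )


def accuracy_by_tier(conn):
    """Map each verdict tier to (share of resolved outcomes that were correct, count)."""
    rows = conn.execute(
        """
        SELECT verdict, AVG(outcome_correct), COUNT(outcome_correct)
        FROM agent_outcomes
        WHERE outcome_correct IS NOT NULL
        GROUP BY verdict
        ORDER BY verdict
        """
    ).fetchall()
    return {verdict: (rate, n) for verdict, rate, n in rows}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    log_outcome(conn, "BUY", {"reference": "REF-A"}, "approved", 1)
    log_outcome(conn, "BUY", {"reference": "REF-A"}, "approved", 0)
    log_outcome(conn, "PASS", {"reference": "REF-B"}, "dismissed", 1)
    log_outcome(conn, "STEAL", {"reference": "REF-C"}, "approved")  # outcome still pending
    print(accuracy_by_tier(conn))  # {'BUY': (0.5, 2), 'PASS': (1.0, 1)}
```

Pending outcomes are excluded from the rates until they resolve, so a tier's accuracy only reflects verdicts whose result is actually known.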

Can the agent send messages on my behalf without approval?

Never, by design. Every outbound communication, whether want-to-buy (WTB) outreach, a seller message, or a deal reply, goes into an approval queue. You review, edit if needed, and send. The agent drafts; you dispatch. This is a hard constraint in every skill, not a user-toggleable preference. The agent cannot bypass it because no tool in the MCP (Model Context Protocol) layer has "send" capability, only "draft to queue."
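The guarantee is structural: an agent can only call the tools it is given. Below is a minimal plain-Python sketch (not an MCP SDK) of a tool registry whose only outbound tool is a draft function. `ApprovalQueue`, `draft_to_queue`, and `dispatch_approved` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Draft:
    channel: str
    recipient: str
    body: str
    status: str = "pending_approval"


@dataclass
class ApprovalQueue:
    drafts: List[Draft] = field(default_factory=list)

    def add(self, draft: Draft) -> int:
        self.drafts.append(draft)
        return len(self.drafts) - 1


def build_agent_tools(queue: ApprovalQueue) -> Dict[str, Callable[..., int]]:
    """Tools exposed to the agent. There is deliberately no 'send' entry."""

    def draft_to_queue(channel: str, recipient: str, body: str) -> int:
        # The agent's only outbound action: park a draft for human review.
        return queue.add(Draft(channel, recipient, body))

    return {"draft_to_queue": draft_to_queue}


def dispatch_approved(
    queue: ApprovalQueue,
    index: int,
    send: Callable[[Draft], None],
    edited_body: Optional[str] = None,
) -> None:
    """Human-side action (e.g. a Send button); never registered as an agent tool."""
    draft = queue.drafts[index]
    if draft.status != "pending_approval":
        raise ValueError("draft has already been handled")
    if edited_body is not None:
        draft.body = edited_body  # the dealer's edits win
    send(draft)
    draft.status = "sent"  # only marked sent after the send succeeds
```

Because `dispatch_approved` lives outside the dictionary the agent receives, there is no prompt or instruction that could make the agent call it.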

Why "four tool calls" for a sourcing verdict specifically?

Because that is what a defensible buy/pass decision actually requires: (1) current pricing context for the reference, (2) active buyer demand, (3) your existing inventory for that reference, (4) market temperature/trend. Those four inputs are sufficient for a confident verdict. A fifth call typically adds information that does not change the decision. If the answer is not obvious after four inputs, the deal is THIN by definition — and that uncertainty is itself the output.
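The fixed four-input shape can be enforced in code on both sides of the model call. This is a sketch, assuming each input is served by one hypothetical tool keyed by the names in `SOURCING_INPUTS`. The model call itself is omitted, and only the four verdict tiers and the one-sentence rationale come from this article.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Mapping


class Verdict(Enum):
    STEAL = "STEAL"
    BUY = "BUY"
    THIN = "THIN"
    PASS = "PASS"


# One hypothetical tool per input named above, and nothing else.
SOURCING_INPUTS = ("pricing_context", "buyer_demand", "inventory_position", "market_trend")


@dataclass(frozen=True)
class SourcingVerdict:
    verdict: Verdict
    rationale: str  # a single sentence


def gather_sourcing_context(
    reference: str, tools: Mapping[str, Callable[[str], object]]
) -> Dict[str, object]:
    """Exactly four tool calls, one per input; there is no path to a fifth."""
    return {name: tools[name](reference) for name in SOURCING_INPUTS}


def parse_verdict(raw: Mapping[str, str]) -> SourcingVerdict:
    """Validate the model's structured reply against the output contract."""
    verdict = Verdict(raw["verdict"].strip().upper())  # ValueError if not one of the four tiers
    rationale = raw["rationale"].strip()
    # Crude single-sentence check: non-empty, one line, no internal ". " break.
    if not rationale or "\n" in rationale or ". " in rationale:
        raise ValueError("rationale must be one sentence")
    return SourcingVerdict(verdict, rationale)
```

The judgment between STEAL, BUY, THIN, and PASS stays with the model; the code only fixes what it sees and what shape its answer must take.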


How We Decide What Gets an Agent — And What Doesn't

When a watch-trading task gets an AI agent and when a deterministic pipeline wins. Three rules for where agents actually add dealer advantage.

Mazalgo Intelligence

4/17/2026

7 min read

Every dealer who's looked at an "AI-powered" tool has seen the same thing — a chatbot bolted onto a database. Impressive in a demo, unreliable in production. The reason is almost always the same: the wrong tasks were given to an agent.

Building a genuinely useful agentic system in the luxury watch space required us to answer one question before any other: does this task actually need an agent? The answer changed how we built everything.

The Question That Changes Everything

Not "how do we add AI?" but "does this task need an agent at all?" We apply three criteria before giving anything to the Agent. First: is the task genuinely ambiguous — can it be reduced to a decision tree, or does it require judgment? Second: does the value of the task justify the cost? Third: can the outcome actually be verified — can we tell, after the fact, whether the Agent made the right call?
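The three criteria read naturally as a gate that runs before anything is handed to the Agent. This is a hypothetical sketch: `TaskProfile`, its fields, and the value-versus-cost comparison are illustrative. In practice each answer is a human judgment recorded up front, not something computed.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TaskProfile:
    name: str
    reducible_to_rules: bool   # criterion 1: can the whole decision tree be written down?
    value_per_run: float       # criterion 2: what a good call is worth...
    cost_per_run: float        # ...versus what an agent run costs (same units)
    outcome_verifiable: bool   # criterion 3: can we tell afterwards if it was right?


def route(task: TaskProfile) -> Tuple[str, str]:
    """Return ('agent', why) only when all three criteria pass."""
    if task.reducible_to_rules:
        return ("pipeline", "deterministic: build the decision tree explicitly")
    if task.value_per_run <= task.cost_per_run:
        return ("no agent", "value does not justify the cost")
    if not task.outcome_verifiable:
        return ("no agent", "outcome cannot be verified")
    return ("agent", "ambiguous, worth the cost, and verifiable")
```

The order matters: the cheapest question (can this be a rule?) is asked first, so most tasks never reach the cost discussion.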

The Decision Tree Test

When you can map out the entire decision tree for a task, don't use an agent — build the decision tree explicitly and optimise every branch. It's cheaper, faster, and you have full control over every outcome. Save agents for the problems where the decision tree would have thousands of branches.

This isn't reluctance to use AI. It's precision about where AI actually adds value. The tasks that genuinely need an agent are the ones where the inputs are ambiguous, the context matters, and the right answer shifts depending on what the data says in real time.

What Gets an Agent vs. What Doesn't

Here's how that plays out in practice across the Mazalgo platform:

Mazalgo Agentic Framework — Task Routing

Task	Approach	Why
Scanning communities for WTB activity	Automated pipeline — no AI	Pattern-based, deterministic, high volume. An agent would waste resources on a task with a known answer.
Matching a WTB lead to your inventory	Automated scoring — no AI	Mathematical. Brand + reference + price = a score. The same inputs always produce the same output.
"Should I buy this watch at $X?"	Agent	Genuinely ambiguous. Requires pricing context, buyer demand signals, your existing inventory, and margin — simultaneously.
Drafting outreach to a motivated seller	Agent	Natural language with context-awareness. Every seller and situation is different. Not templatable.
Finding sellers for your hunt list	Agent	"Motivated seller" requires interpreting signals — listing age, price behaviour, language. Not a string match.
Dispatching match alerts to your inbox	Automated pipeline — no AI	Fully deterministic. If score ≥ threshold, dispatch. Rules don't need judgment.
Purging stale data	Scheduled job — no AI	Deterministic. Wasted context budget if given to the Agent.

The pattern: anything with a deterministic answer — a formula, a threshold, a rule — stays in the automated pipeline. Anything that requires synthesising ambiguous inputs into a judgment call goes to the Agent.
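The two pipeline rows for matching and dispatch can be sketched as pure functions. The article states only that brand, reference, and price produce a score and that an alert fires when the score meets a threshold; the point weights and the threshold of 80 below are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical point weights (out of 100) and alert threshold.
BRAND_POINTS, REFERENCE_POINTS, PRICE_POINTS = 20, 50, 30
ALERT_THRESHOLD = 80


@dataclass(frozen=True)
class Listing:
    brand: str
    reference: str
    price: float


@dataclass(frozen=True)
class WtbLead:
    brand: str
    reference: str
    max_price: float


def match_score(item: Listing, lead: WtbLead) -> int:
    """Pure function: the same inputs always produce the same score."""
    score = 0
    if item.brand.casefold() == lead.brand.casefold():
        score += BRAND_POINTS
    if item.reference.casefold() == lead.reference.casefold():
        score += REFERENCE_POINTS
    if item.price <= lead.max_price:
        score += PRICE_POINTS
    return score


def alerts_to_dispatch(
    inventory: Sequence[Listing], leads: Sequence[WtbLead], threshold: int = ALERT_THRESHOLD
) -> List[Tuple[Listing, WtbLead, int]]:
    """If score >= threshold, dispatch. No judgment involved."""
    hits = []
    for item in inventory:
        for lead in leads:
            score = match_score(item, lead)
            if score >= threshold:
                hits.append((item, lead, score))
    return hits
```

Integer points sidestep floating-point surprises at the threshold boundary, and every branch can be unit-tested, which is exactly what an agent's output cannot offer.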

Keep It Simple — Three Components

Once a task passes the agent checklist, the design principle is simplicity. The Agent operates on three things: the environment (your data — inventory, deal pipeline, WTB leads, market intelligence), a set of focused tools (each doing exactly one thing), and a clear set of instructions. That's it. No orchestration layers. No multi-step planners. No pipeline of agents calling agents.
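One way to hold a design to "three things, nothing else" is to make the agent's definition a frozen structure with exactly those three fields. This is a hypothetical Python sketch; `AgentSpec` is not a Mazalgo type.

```python
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class AgentSpec:
    # 1. Environment: the data the agent reasons over.
    environment: Mapping[str, object]
    # 2. Tools: small, single-purpose callables.
    tools: Mapping[str, Callable[..., object]]
    # 3. Instructions: a plain-language brief.
    instructions: str
    # Deliberately no planner, orchestrator, or sub-agent fields.

    def __post_init__(self) -> None:
        for name, tool in self.tools.items():
            if not callable(tool):
                raise TypeError(f"tool {name!r} is not callable")
        if not self.instructions.strip():
            raise ValueError("instructions must not be empty")
```

Adding an orchestration layer would then require changing this type, which makes the added complexity a visible, reviewable decision.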

Any complexity added before it's needed kills the ability to iterate. We've reduced the Agent's tool set twice since launch — each time the decisions got better. Fewer, better-scoped tools give the Agent a cleaner picture of the relevant context. More tools add noise.

The Agent is most useful when it knows exactly what it's looking at. Complexity is the enemy of clarity — and clarity is what separates a good recommendation from a hedge.

This principle extends to how the Agent uses tools within a task. For a sourcing decision, it needs four inputs: pricing intelligence, active buyer demand, your current inventory for that reference, and market temperature. That's four tool calls, not twenty. The budget for each task is finite, and spending it wisely is a design constraint, not an afterthought.

Think From the Agent's Perspective

The most common failure in agent design isn't the model — it's the context. Every decision the Agent makes is a function of what it can see right now. If the context is noisy, the decisions are noisy. If it's missing critical data, the decision degrades. The Agent doesn't have a sixth sense.

Context Is the Product

Before asking "is the Agent smart enough?", ask "is the context sufficient?" The quality of the Agent's recommendation is a direct function of the quality of what it's given to reason over. Design the context window before you design the behavior.

Mazalgo's agentic framework is designed around a single test: given only what the Agent can see at this moment, is that sufficient for a defensible decision? For a sourcing verdict, the answer is yes — pricing data, buyer demand, inventory position, and market temperature. For a broad "what's happening in the market" query, the answer requires choosing which slice of the market matters to this specific user.

This is why the Agent's session opens the same way every time — with a check of what's pending and actionable, followed by a confirmation of what the user wants to focus on. It's not a UX decision. It's a context management decision.
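A sketch of that fixed opener, assuming pending items arrive as simple records. `PendingItem`, `opening_context`, and the five-item limit are hypothetical. Capping the list is the context-management point: every session starts from the same small, actionable slice rather than everything in the database.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PendingItem:
    kind: str        # e.g. "match_alert" or "draft_awaiting_approval"
    summary: str
    actionable: bool


def opening_context(pending: Iterable[PendingItem], limit: int = 5) -> str:
    """Same opener every session: what needs action now, then a focus question."""
    actionable = [p for p in pending if p.actionable][:limit]
    lines = [f"- [{p.kind}] {p.summary}" for p in actionable] or ["- Nothing pending."]
    return "\n".join(["Pending and actionable:", *lines, "", "What would you like to focus on?"])
```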

Three Agent Modes, Three Use Cases

In practice, the Agent operates in three focused modes. Each has a clear trigger, a capped number of inputs, and a structured output. No mode bleeds into another.

The Three Agent Modes

Sourcing verdict — A deal is in front of you. Four inputs. One verdict: STEAL, BUY, THIN, or PASS. One rationale sentence. Done in seconds, not minutes.

WTB match outreach — The Agent reviews your match alerts, validates each match against current pricing, and drafts outreach per platform. Nothing is sent without your approval: the Agent hands you a queue, not a sent folder.

Hunt list execution — The Agent finds active sellers for your target references, scores them by motivation signals (price reductions, listing age, urgency language), and hands you ranked outreach drafts. You approve and send; the Agent only drafts.

Each mode knows when it's relevant. You don't navigate menus or trigger workflows manually. Say "should I buy this?" and the sourcing verdict runs. Open the app and the Agent has already reviewed your alerts. Ask it to find your watches and it comes back with ranked sellers and ready-to-send messages.

That's the practical difference between a tool and an agent. A tool waits to be used. An agent knows what's relevant before you ask.

The Feedback Loop

The Agent records what happened after every recommendation. Not to retrain itself in real time — but to create a measurable record of its own accuracy. Over time: which verdict tiers actually convert to closed deals? Which outreach approaches get replies? Which hunt references keep getting passed on?

This closes the loop from recommendation to outcome. The Agent isn't just making calls — it's building a track record. That track record, in turn, informs how we tune the context it receives and the thresholds it applies.

An agent that doesn't measure its outcomes is guessing. An agent that does is learning what works for you.

Key Takeaways

✓ Agents are the right tool for genuinely ambiguous, high-value tasks, not for everything. Most tasks in a watch dealer's workflow are better served by deterministic pipelines.
✓ Simplicity is a design principle, not a constraint. Fewer tools, narrower scope, and cleaner context produce better decisions than a complex orchestration layer.
✓ The quality of the Agent's context determines the quality of its decisions. Design what the Agent sees before you design what it does.
✓ A closed feedback loop turns recommendations into a measurable track record, which is the only way to know if the Agent is actually helping.

Your AI Agent is live in Mazalgo. Try the sourcing verdict the next time a deal lands in front of you — STEAL, BUY, THIN, or PASS in seconds.
