
    Managing Agentic Behaviors: How We Engineer Predictable Outcomes from Autonomous Systems

    The hard problem with AI agents isn't intelligence — it's predictability. How Mazalgo engineers deterministic outcomes via architectural guardrails.

    4/16/2026
    9 min read

    The hardest problem in building AI agents isn't making them smart. It's making them predictable. Anyone can wire a language model to a data source and call it an "agent." The real engineering challenge is ensuring that the agent does exactly what it should, every time, without supervision, and never does what it shouldn't.

    The Predictability Problem

    Deterministic vs. Probabilistic

    Traditional software is deterministic: the same input produces the same output. AI systems are probabilistic: the same input can produce different outputs from one run to the next, and small changes in context or phrasing can shift the result. When you're building a system that computes trade margins with real money on the line, variability is not a feature; it's a bug.

    A pipeline that correctly parses "WTS Sub 126610LN $12.5k mint" but misreads "Letting go of my 126610 for twelve five, complete" is a pipeline you can't trust. The model isn't necessarily bad; language is ambiguous, and a probabilistic system can't guarantee that it reads the same message the same way every time.

    Deterministic Where It Matters, Intelligent Where It Helps

    The foundational architecture decision at Mazalgo: we don't use AI for everything. We use deterministic systems for anything that touches a number and AI for anything that touches language.

    Mazalgo System Design: Where Each Approach Is Applied

    | System Component | Approach | Why |
    |---|---|---|
    | Reference number extraction | Regex patterns per brand | A Rolex ref is a finite set of patterns: no ambiguity, no hallucination |
    | Price extraction | Rule-based ($XX,XXX / XXk) | Dollar amounts are deterministic; rules handle edge cases explicitly |
    | Condition detection | Keyword matching (BNIB, mint, etc.) | Condition terms are finite and consistent across dealer language |
    | Margin calculation | Formula: (median − asking) / asking × 100 | Math doesn't vary between runs; the same inputs always produce the same output |
    | Deal scoring (STEAL/BUY/THIN/PASS) | Threshold logic on margin % | Verdicts are computed, not inferred; fully auditable and reproducible |
    | Natural language summaries | LLM inference | Appropriate use: flexibility here has low stakes and high value |
    | Sentiment analysis | LLM classification | Appropriate use: probabilistic output acceptable; no single result is load-bearing |
    | Morning brief composition | LLM with structured data inputs | Language model writes narrative; structured data provides ground truth |
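
    The deterministic rows of the table can be sketched in a few lines of Python. This is a minimal illustration, not Mazalgo's actual code: the reference pattern (six digits plus an optional letter suffix), the condition term list, the function names, and the verdict thresholds are all assumptions chosen for the example. Only the margin formula comes from the table itself.

```python
import re
from typing import Optional

# Hypothetical Rolex-style pattern: six digits plus an optional letter suffix.
# The lookbehind skips digits that directly follow "$" so prices are not read as refs.
REF_PATTERN = re.compile(r"(?<!\$)\b\d{6}[A-Z]{0,4}\b")

# A number written as "12,500" or "12.5", optionally preceded by "$" or followed by "k".
PRICE_PATTERN = re.compile(r"(\$)?(\d{1,3}(?:,\d{3})+|\d+(?:\.\d+)?)(k)?\b", re.IGNORECASE)

# Illustrative keyword list; a real list would be longer.
CONDITION_TERMS = ("bnib", "unworn", "mint", "excellent")

# Hypothetical thresholds on margin %, checked from highest to lowest.
VERDICT_THRESHOLDS = (("STEAL", 15.0), ("BUY", 8.0), ("THIN", 3.0))


def extract_reference(text: str) -> Optional[str]:
    match = REF_PATTERN.search(text)
    return match.group(0) if match else None


def extract_price(text: str) -> Optional[float]:
    for match in PRICE_PATTERN.finditer(text):
        dollar, number, k_suffix = match.groups()
        if not dollar and not k_suffix:
            continue  # a bare number (such as the reference) is not treated as a price
        value = float(number.replace(",", ""))
        return value * 1000 if k_suffix else value
    return None  # no rule matched: report nothing rather than guess


def extract_condition(text: str) -> Optional[str]:
    words = set(re.findall(r"[a-z]+", text.lower()))
    for term in CONDITION_TERMS:
        if term in words:
            return term
    return None


def margin_pct(median_price: float, asking: float) -> float:
    # Formula from the table: (median - asking) / asking * 100
    return (median_price - asking) / asking * 100


def verdict(margin: float) -> str:
    for label, floor in VERDICT_THRESHOLDS:
        if margin >= floor:
            return label
    return "PASS"
```

    With these rules, "WTS Sub 126610LN $12.5k mint" yields a reference, a price, and a condition, while "Letting go of my 126610 for twelve five, complete" yields a reference and no price. The second message is not misread; it is left unparsed, and the gap is visible to whatever handles it next.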

    Guardrails, Not Guidelines

    The second architectural principle is the most important for production systems: constraints are enforced in code, not in prompts.

    The Difference Between a Guideline and a Guardrail

    Telling an AI "never send messages in WhatsApp groups" in a system prompt is a guideline. Building a service that physically cannot send messages — because the send function does not exist in its codebase — is a guardrail. Guardrails hold under adversarial inputs. Guidelines do not.

    This principle applies at every level of our system. Our group monitoring service is listen-only by architecture: it has a receive function and no send function. Data writes go through validated schemas — an agent cannot store a deal without a reference number, a price, and a source. Rate limits and resource caps are infrastructure-level, not prompt-level.
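
    A rough sketch of what structural guardrails can look like, assuming a Python service. The class names (Deal, GroupMonitor, DealStore) and field names are hypothetical and chosen for illustration only. The monitor defines a receive path and no send method, and the store accepts only records that passed validation at construction time.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Deal:
    """Hypothetical deal record: invalid instances cannot be constructed."""
    reference: str
    price: float
    source: str

    def __post_init__(self) -> None:
        if not self.reference or not self.reference.strip():
            raise ValueError("deal requires a reference number")
        if not isinstance(self.price, (int, float)) or self.price <= 0:
            raise ValueError("deal requires a positive price")
        if not self.source or not self.source.strip():
            raise ValueError("deal requires a source")


class GroupMonitor:
    """Listen-only by construction: there is a receive method and no send method."""

    def __init__(self) -> None:
        self.received: List[Tuple[str, str]] = []

    def on_message(self, group: str, text: str) -> None:
        self.received.append((group, text))


class DealStore:
    """Accepts only Deal objects, so every stored row has passed validation."""

    def __init__(self) -> None:
        self._deals: List[Deal] = []

    def save(self, deal: Deal) -> None:
        if not isinstance(deal, Deal):
            raise TypeError("only validated Deal objects can be stored")
        self._deals.append(deal)

    def count(self) -> int:
        return len(self._deals)
```

    No prompt can talk GroupMonitor into sending a message, because there is nothing to call. Likewise, a deal with an empty reference fails when the object is created, before it ever reaches storage.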

    Measuring Predictability in Production

    How do you know an autonomous system is behaving correctly when no one is watching? You measure the outputs.

    Every pipeline run produces countable results: leads scanned, deals extracted, matches found, alerts dispatched. These metrics are tracked per interval and compared against historical baselines. When a scanner that normally finds 15–30 WTB leads per run suddenly returns zero, that's a signal: not that the market went quiet, but that the pipeline needs attention. When a WhatsApp bridge that processes 200 group messages per hour drops to 10, the health check surfaces it before any user notices missing deals.
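
    One simple way to express such a baseline check is to compare each count against the median of recent runs. The sketch below assumes that approach; the function names, metric names, and the 0.5× and 2× bounds are hypothetical and would need tuning against real history.

```python
from statistics import median
from typing import Dict, List, Sequence


def deviates_from_baseline(
    current: float,
    history: Sequence[float],
    low_ratio: float = 0.5,   # hypothetical lower bound, as a fraction of the median
    high_ratio: float = 2.0,  # hypothetical upper bound
) -> bool:
    """Return True when current falls outside the band around the historical median."""
    if not history:
        return False  # no baseline yet, so there is nothing to compare against
    baseline = median(history)
    return current < baseline * low_ratio or current > baseline * high_ratio


def check_run(counts: Dict[str, float], baselines: Dict[str, Sequence[float]]) -> List[str]:
    """List the metrics in this run that deviate from their own history."""
    return [
        name
        for name, value in counts.items()
        if deviates_from_baseline(value, baselines.get(name, []))
    ]
```

    Using the figures above, a run that returns zero WTB leads against a history of 15–30 is flagged, and so is a bridge that drops from roughly 200 messages per hour to 10. A run that lands inside the usual range passes silently.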

    Predictability isn't about perfection. It's about knowing when something deviates from expected behavior and having the instrumentation to catch it quickly.

    The Trust Equation

    Agentic systems succeed or fail based on trust. Can a trader trust that the system is watching while they sleep? Can they trust that a "STEAL" verdict means the margin is real? Can they trust that the agent won't accidentally send a message in a dealer group?

    Trust comes from architecture, not promises. Deterministic extraction, mathematical pricing, structural guardrails, and continuous measurement — these engineering decisions are what make autonomous systems trustworthy. The alternative — an AI that's "usually right" — isn't good enough when real money is on the line.

    Key Takeaways

    • Deterministic systems (regex, rules, formulas) handle anything that touches a number: reference extraction, pricing, margin calculations, deal scoring
    • AI is used only where probabilistic output is appropriate: language summaries, sentiment classification, narrative composition
    • Constraints are enforced in code (guardrails), not in system prompts (guidelines) — guardrails hold under adversarial inputs

    Mazalgo's agentic architecture is built on deterministic intelligence — buy zone calculations that are mathematical, not guessed.

    Frequently Asked Questions