Shadow Evaluator · Diagnostic Service
Offline routing audits for agent systems.
Qalbun analyzes historical agent logs to identify candidate inference savings, under-evidenced routing decisions, and validation-ready hypotheses — without changing production systems.
The problem
Common situations where Shadow Evaluator helps
If your team spends high-cost model inference on tasks that may not require it, routes to expensive components by default because there is no evidence layer, or cannot measure whether its agent routing is cost-effective, Shadow Evaluator tells you what your logs can actually prove.
How it works
Three steps, fully offline.
- Step 01
Profile your logs
We classify your data into one of four readiness tiers — audit-only, cost-opportunity, outcome-informed, or live-shadow-ready — so every later claim has a known evidence footing.
- Step 02
Run an offline routing audit
We reconstruct the routing decisions your system actually made and separate observed facts from estimates, with each claim tied to its source data.
- Step 03
Choose the next step
You receive a written report, an explicit list of evidence gaps, and a recommendation on whether a live shadow pilot is warranted.
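To make Step 01 concrete, here is a minimal sketch of how logs might be sorted into the four readiness tiers. Only the tier names come from the description above. The `LogRecord` fields, the 80% coverage threshold, and the rule mapping signals to tiers are all illustrative assumptions, not Shadow Evaluator's actual criteria.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class LogRecord:
    """Hypothetical shape of one logged routing decision."""
    route: str                        # component the request was routed to
    cost_usd: Optional[float] = None  # cost signal, if logged
    tokens: Optional[int] = None      # token signal, if logged
    outcome: Optional[str] = None     # downstream outcome label, if logged


def coverage(records: List[LogRecord], has_field: Callable[[LogRecord], bool]) -> float:
    """Fraction of records for which has_field returns True (0.0 for no records)."""
    if not records:
        return 0.0
    return sum(1 for r in records if has_field(r)) / len(records)


def classify_tier(records: List[LogRecord], threshold: float = 0.8) -> str:
    """Assign a readiness tier from signal coverage (assumed rule, not the real one)."""
    has_cost = coverage(
        records, lambda r: r.cost_usd is not None or r.tokens is not None
    ) >= threshold
    has_outcome = coverage(records, lambda r: r.outcome is not None) >= threshold

    if has_cost and has_outcome:
        return "live-shadow-ready"
    if has_outcome:
        return "outcome-informed"
    if has_cost:
        return "cost-opportunity"
    return "audit-only"
```

Under these assumptions, a log set with cost data on every record but no outcome labels would land in the cost-opportunity tier, so only cost-related claims would be in scope for the later steps.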
What you receive
A written, reproducible diagnostic.
- Data-readiness and claims-budget classification
- Observed routing and component usage patterns
- Estimated cost opportunities when your logs support them
- Under-evidenced decisions and instrumentation gaps
- Validation-ready hypotheses for a live shadow pilot
- Reproducibility manifest for auditability
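As an illustration of what a reproducibility manifest can contain, the sketch below records a SHA-256 hash of each input log, the analysis parameters, and a tool version, then derives a single digest from a canonical JSON form. The function names, field names, and manifest layout are hypothetical. The actual manifest format delivered with a diagnostic may differ.

```python
import hashlib
import json
from typing import Any, Dict


def build_manifest(inputs: Dict[str, bytes], params: Dict[str, Any], tool_version: str) -> Dict[str, Any]:
    """Describe an audit run: which inputs (by content hash), parameters, and tool version."""
    return {
        "tool_version": tool_version,
        "params": params,
        # Hash file contents so any change to an input log is detectable.
        "inputs": {name: hashlib.sha256(data).hexdigest() for name, data in inputs.items()},
    }


def manifest_digest(manifest: Dict[str, Any]) -> str:
    """Hash a canonical JSON encoding so key order does not affect the digest."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two runs over the same logs with the same parameters and tool version produce the same digest, which lets a reviewer confirm that a report was generated from the inputs it claims.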
Who it is for
Teams running real routing decisions.
Built for teams running agent systems, model routers, tool-using LLM workflows, or multi-component AI systems with recurring inference decisions.
Best fit: at least 30 days of structured logs covering routing decisions, plus at least one of cost signals, token signals, or downstream outcome labels. Partial coverage is workable: the audit will tell you exactly which claims your data can support and which it cannot.
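A quick self-check against the best-fit criteria might look like the sketch below. It assumes "30 days of logs" means the span between the earliest and latest logged routing decision. The function name and its boolean inputs are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def meets_best_fit(
    decision_times: List[datetime],
    has_cost_or_token_signals: bool,
    has_outcome_labels: bool,
    min_days: int = 30,
) -> bool:
    """True if the logs span at least min_days and carry at least one supporting signal."""
    if not decision_times:
        return False
    span = max(decision_times) - min(decision_times)
    has_signal = has_cost_or_token_signals or has_outcome_labels
    return span >= timedelta(days=min_days) and has_signal
```

A `False` result does not rule out an audit, since partial coverage is workable. It only indicates that fewer claims are likely to be supportable.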
Request
Request a Diagnostic Review.
Tell us about your system. We’ll respond with scope, data handling, and next steps within two business days.
Privacy: this form is used solely to scope and respond to your request. Submissions are not shared with third parties. Please do not include logs, secrets, credentials, or customer data in any field — we will agree on a transfer method in writing if logs need to change hands.