Agent Evaluation Field Notes

Practical scorecards for tool-using AI agents.

Compact templates, replay checklists, and RAG guardrail smoke tests for builders who need repeatable agent evaluation workflows.

Operational scorecards

Score task completion, tool discipline, evidence quality, safety, and communication.

Trajectory replay

Debug agent runs by replaying decisions, tool calls, evidence, and recovery points.

RAG guardrails

Smoke-test prompt injection, vector poisoning, source grounding, and citation behavior.