Memory Layer
Evaluations

Metrics

Metric             Meaning
Success rate       Whether the task met its expected outcome.
Recall@K           Whether relevant items appear in the top K results.
MRR                How early the first relevant result appears.
nDCG               Whether useful results rank near the top.
Assertion recall   Whether expected factual assertions were recovered.
Token cost         Model context or generation cost used by a run.
Latency            How long retrieval, answer generation, or eval work took.
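
To make the three ranking metrics concrete, here is a minimal sketch using their common textbook definitions. The function names, the string item IDs, and the `gains` mapping are illustrative assumptions, not part of the evaluation suite, which may compute these metrics differently (for example, by treating Recall@K as a binary hit).

```python
import math
from typing import Iterable, Mapping, Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / position of the first relevant result, or 0.0 if none is found."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """MRR: the average reciprocal rank over (ranked, relevant) pairs."""
    scores = [reciprocal_rank(ranked, relevant) for ranked, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


def ndcg_at_k(ranked: Sequence[str], gains: Mapping[str, float], k: int) -> float:
    """DCG of the top k results divided by the DCG of an ideal ordering.

    `gains` maps an item ID to its graded usefulness (hypothetical scale);
    items missing from the mapping count as 0.
    """
    dcg = sum(
        gains.get(item, 0.0) / math.log2(position + 1)
        for position, item in enumerate(ranked[:k], start=1)
    )
    ideal_gains = sorted(gains.values(), reverse=True)[:k]
    ideal_dcg = sum(
        gain / math.log2(position + 1)
        for position, gain in enumerate(ideal_gains, start=1)
    )
    return dcg / ideal_dcg if ideal_dcg > 0 else 0.0


if __name__ == "__main__":
    ranked = ["a", "b", "c", "d"]      # retrieval output, best first
    relevant = {"b", "d"}              # items judged relevant
    print(recall_at_k(ranked, relevant, k=2))
    print(mean_reciprocal_rank([(ranked, relevant)]))
    print(ndcg_at_k(ranked, {"b": 1.0, "d": 1.0}, k=4))
```

In this sketch all three scores fall between 0 and 1, and higher is better. Recall@K ignores ordering inside the top K, while MRR and nDCG reward placing relevant results earlier.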

Interpret carefully

Metric improvement is evidence for a bounded claim about the suite, model, and configuration used. It is not universal proof that every future agent task will improve.

Next

Read Reproducibility.
