Run evaluations

What you will do

Run an eval suite, write artifacts, and compare variants.

Dry run

memory eval run --suite evals/examples/memory-smoke --condition full-memory --profile offline --dry-run

Paired run shape

memory eval run --suite evals/suites/memory-improvement-v1 --condition no-memory --condition full-memory --allow-shell --repeat 5
memory eval compare --baseline 'target/memory-evals/*no-memory*.json' --candidate 'target/memory-evals/*full-memory*.json' --text

Pass --allow-shell only after reviewing the suite's scripts and fixtures. A suite that runs shell commands executes code on your machine, so treat it as executable input, not as a passive data file.
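
One way to make that review concrete is to list every file in the suite that could be executed. The sketch below is a hypothetical helper, not part of the memory CLI. It walks a suite directory and reports files that have an executable permission bit set or that start with a shebang line. The function name find_executable_inputs and the detection heuristic are assumptions. A suite could also embed shell commands inside data files, which this check would not catch, so it narrows the review without replacing it.

```python
import stat
from pathlib import Path


def find_executable_inputs(suite_dir):
    """Return sorted paths under suite_dir that look executable.

    Heuristic (an assumption, not the CLI's own rule): a file counts if
    any executable permission bit is set or if it begins with "#!".
    """
    exec_bits = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    hits = []
    for path in Path(suite_dir).rglob("*"):
        if not path.is_file():
            continue
        has_exec_bit = bool(path.stat().st_mode & exec_bits)
        # Read only the first two bytes to detect a shebang line.
        with path.open("rb") as handle:
            has_shebang = handle.read(2) == b"#!"
        if has_exec_bit or has_shebang:
            hits.append(str(path))
    return sorted(hits)
```

Calling find_executable_inputs("evals/suites/memory-improvement-v1") would return the list of files to read before enabling shell execution for that suite.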

External retrievers

memory eval run --suite evals/suites/memory-improvement-v1 --condition full-memory --retriever-cmd './my-retriever' --allow-shell

Verify

Inspect the JSON artifacts under target/memory-evals/ and compare item-level results before making claims about either variant.
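
A small script can make that inspection repeatable. The sketch below assumes only what the commands on this page imply: artifacts are JSON files under target/memory-evals/ and their filenames contain the condition name. It does not assume any particular artifact schema. It groups files by condition and reports each file's top-level keys, so you can locate where item-level results live. All helper names are hypothetical.

```python
import json
from pathlib import Path

# Condition names taken from the paired-run command on this page.
CONDITIONS = ("no-memory", "full-memory")


def group_by_condition(artifact_dir, conditions=CONDITIONS):
    """Map each condition name to the sorted artifact paths whose filename contains it."""
    groups = {name: [] for name in conditions}
    for path in sorted(Path(artifact_dir).glob("*.json")):
        for name in conditions:
            if name in path.name:
                groups[name].append(path)
    return groups


def top_level_keys(path):
    """Return the sorted top-level keys of a JSON artifact, or [] if it is not an object."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    return sorted(data) if isinstance(data, dict) else []


def report(artifact_dir):
    """Print the artifact count per condition and the top-level keys of each file."""
    for name, paths in group_by_condition(artifact_dir).items():
        print(f"{name}: {len(paths)} artifact(s)")
        for path in paths:
            print(f"  {path.name}: {top_level_keys(path)}")
```

Running report("target/memory-evals") after a paired run gives a quick check that both conditions produced artifacts. If the counts differ between conditions, a run may have failed or the directory may contain stale files from an earlier run, either of which would make the comparison unreliable.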

Next

Read Interpreting results and Limitations.
