
# Overview

The AI Testing and Evaluation framework, part of the core `Umbraco.AI` package, automates the validation of AI outputs. You define tests that run a prompt or an agent, then grade the output against success criteria so you can detect regressions and compare model configurations.

## Installation

The testing framework is included in the core `Umbraco.AI` package. No additional installation is required.

{% hint style="info" %}
To test prompts, you need the Prompt Management add-on installed. To test agents, you need the Agent Runtime add-on installed.
{% endhint %}

![The tests list showing configured tests with their features and run counts](/files/1j9sQ6yFeGYsppaauCAJ)

## Features

* **Graders** - Define success criteria with code-based and model-based graders
* **Variations** - A/B test across different models, profiles, and configurations
* **Baselines** - Set a run as a baseline for regression detection
* **Batch Execution** - Run multiple tests at once or filter by tags
* **Metrics** - Calculate pass\@k (at least one of k runs passes) and pass^k (all k runs pass) to measure the reliability of non-deterministic outputs
* **Transcripts** - Capture full execution traces for debugging
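The pass\@k and pass^k metrics can be illustrated with the combinatorial estimators commonly used in evaluation work. The Python sketch below assumes a test has been run `n` times, of which `c` runs passed all graders. `pass_at_k` and `pass_pow_k` are hypothetical helper names, and Umbraco.AI's own calculation may differ (for example, pass^k is sometimes approximated as `(c/n)**k`).

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """pass@k: probability that at least one of k runs passes,
    estimated from n recorded runs of which c passed (hypothetical helper)."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # 1 - P(all k sampled runs are failures); comb() returns 0 when k > n - c
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_pow_k(n: int, c: int, k: int) -> float:
    """pass^k: probability that all k runs pass,
    estimated from n recorded runs of which c passed (hypothetical helper)."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # P(all k sampled runs are successes)
    return comb(c, k) / comb(n, k)


# Example: 10 runs of the same test, 7 of which passed every grader.
print(round(pass_at_k(10, 7, 3), 4))   # 0.9917
print(round(pass_pow_k(10, 7, 3), 4))  # 0.2917
```

The gap between the two numbers is the point: a test that usually passes can still look very reliable under pass\@k while pass^k shows how often several consecutive runs would all succeed.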

## How It Works

1. **Create a test** targeting a prompt or agent with one or more graders
2. **Run the test** to execute the target and grade the output
3. **Review results** including per-grader scores, metrics, and transcripts
4. **Set a baseline** and compare future runs for regression detection
5. **Add variations** to compare different model configurations side by side
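The workflow above can be sketched as a small conceptual model in Python. This is not the `Umbraco.AI` API: `run_test`, `contains_grader`, `regressions`, and `RunResult` are hypothetical names, and real tests are created and run through the backoffice or the Management API. The sketch shows only a code-based grader; a model-based grader would instead ask an LLM to score the output. Two stand-in functions play the role of variations, and the first run acts as the baseline.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical types for illustration only; not part of Umbraco.AI.
Grader = Callable[[str], float]  # returns a score between 0.0 and 1.0


@dataclass
class RunResult:
    scores: dict[str, float]  # grader name -> score
    passed: bool


def contains_grader(expected: str) -> Grader:
    """Code-based grader: full score if the output contains `expected`."""
    return lambda output: 1.0 if expected.lower() in output.lower() else 0.0


def run_test(target: Callable[[str], str], test_input: str,
             graders: dict[str, Grader], threshold: float = 1.0) -> RunResult:
    output = target(test_input)  # execute the prompt or agent
    scores = {name: grade(output) for name, grade in graders.items()}
    # In this sketch a run passes only if every grader meets the threshold.
    return RunResult(scores, all(s >= threshold for s in scores.values()))


def regressions(baseline: RunResult, current: RunResult) -> list[str]:
    """Names of graders whose score dropped compared with the baseline run."""
    return [name for name, score in current.scores.items()
            if score < baseline.scores.get(name, 0.0)]


# Two stand-in targets representing variations (for example, different models).
def variation_a(text: str) -> str:
    return f"Summary: {text}"


def variation_b(text: str) -> str:
    return text.upper()


graders: dict[str, Grader] = {
    "mentions_summary": contains_grader("summary"),
    "short": lambda out: 1.0 if len(out) <= 40 else 0.0,
}

baseline = run_test(variation_a, "Umbraco release notes", graders)
current = run_test(variation_b, "Umbraco release notes", graders)
print(baseline.passed, current.passed)  # True False
print(regressions(baseline, current))   # ['mentions_summary']
```

Here a regression is any grader whose score falls below its baseline score; the framework's actual pass criteria and regression rules may be defined differently.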

## Quick Start

1. Navigate to the **AI** section > **Tests**
2. Click **Create** and select a prompt or agent to test
3. Add graders to define success criteria
4. Click **Run** to execute the test and review results

See [Getting Started](/ai-in-umbraco/testing-and-evaluation/getting-started.md) for a detailed walkthrough.

## Documentation

| Section                                                                     | Description                             |
| --------------------------------------------------------------------------- | --------------------------------------- |
| [Concepts](/ai-in-umbraco/testing-and-evaluation/concepts.md)               | Tests, graders, variations, and metrics |
| [Getting Started](/ai-in-umbraco/testing-and-evaluation/getting-started.md) | Step-by-step setup guide                |
| [Graders](/ai-in-umbraco/testing-and-evaluation/graders.md)                 | All built-in grader types               |
| [Variations](/ai-in-umbraco/testing-and-evaluation/variations.md)           | A/B testing across configurations       |
| [API Reference](/ai-in-umbraco/testing-and-evaluation/api.md)               | Management API endpoints                |

## Related

* [Prompt Management](/ai-in-umbraco/add-ons/prompt.md) - Prompt templates
* [Agent Runtime](/ai-in-umbraco/add-ons/agent.md) - Agent definitions
* [Guardrails](/ai-in-umbraco/concepts/guardrails.md) - Safety and compliance rules

