# Schema Alignment Guide

Engage 17.2.0 introduces a rewritten analytics data cleanup system and a database schema alignment. This guide walks you through the required post-upgrade steps.

## Analytics data cleanup changes

The analytics data cleanup has been fully rewritten. It now processes all eligible records in a single pass rather than working in fixed-size batches.

### New configuration settings

The cleanup schedule is configured under `Engage:Analytics:DataCleanup` in your `appsettings.json`:

{% code title="appsettings.json" %}

```json
{
  "Engage": {
    "Analytics": {
      "DataCleanup": {
        "Enabled": true,
        "FirstRunTime": null,
        "StartupDelay": "00:05:00",
        "Interval": "1.00:00:00",
        "CommandTimeout": 1200
      }
    }
  }
}
```

{% endcode %}

| Setting          | Description                                                                                                                          | Default                 |
| ---------------- | ------------------------------------------------------------------------------------------------------------------------------------ | ----------------------- |
| `Enabled`        | Whether the data cleanup process runs at all.                                                                                        | `true`                  |
| `FirstRunTime`   | Optional crontab expression to schedule the first cleanup run. When set, the calculated delay must still be at least `StartupDelay`. | `null`                  |
| `StartupDelay`   | Minimum time after application startup before the first cleanup run. Used directly as the delay when `FirstRunTime` is not set.      | `00:05:00` (5 minutes)  |
| `Interval`       | Interval between cleanup runs.                                                                                                       | `1.00:00:00` (24 hours) |
| `CommandTimeout` | Database command timeout in seconds.                                                                                                 | `1200` (20 minutes)     |

{% hint style="info" %}
The previous settings `StartAfterSeconds`, `IntervalInSeconds`, and `NumberOfRows` are deprecated and no longer used.
{% endhint %}

### How the cleanup pipeline works

The cleanup runs as a background job on a recurring schedule. Each run executes three phases in order:

1. **Anonymize** — Replaces personally identifiable information with anonymized values for data that has exceeded the `AnonymizeAnalyticsDataAfterDays` retention period.
2. **Delete analytics data** — Deletes pageviews, control group data, and raw data that has exceeded its respective retention period (`DeleteAnalyticsDataAfterDays`, `DeleteControlGroupDataAfterDays`, `DeleteRawDataAfterDays`).
3. **Delete orphaned data** — Removes records no longer referenced by any analytics data, such as sessions, visitors, and devices without pageviews.

You can extend the cleanup pipeline with [custom data cleanup processors](/umbraco-engage/developers/analytics/extending-analytics/custom-data-cleanup-processors.md).

{% hint style="warning" %}
Until the schema alignment below is completed, only the anonymization and visitor control group/raw data cleanup phases will run. Full cleanup of pageviews, sessions, and visitors requires the schema alignment to be complete.
{% endhint %}

## Database schema alignment

The schema alignment brings existing installations in line with a clean install by adding missing foreign keys (with `ON DELETE CASCADE`), indexes, and constraints. This is a **manual post-upgrade step** that requires running SQL scripts during a maintenance window.

### Why this is needed

Older installations may have accumulated orphaned records over time. The new foreign key constraints with `ON DELETE CASCADE` require that all existing data satisfy these relationships. Running the scripts ensures your database is consistent before the constraints are added.

### Post-upgrade steps

After upgrading to 17.2.0, follow these steps during a **maintenance window**:

{% stepper %}
{% step %}
**Determine a safe DeleteAnalyticsDataAfterDays value**

Run the `GetDeleteAnalyticsDataAfterDays.sql` script against your database. This analyzes your data volume and recommends a safe initial value for the `DeleteAnalyticsDataAfterDays` setting.

Update your `appsettings.json` with the recommended value before proceeding. This ensures the first cleanup run does not attempt to delete an excessive number of records at once.

{% hint style="info" %}
This step is optional but recommended. It prevents the first cleanup run from processing a large backlog of data, which could cause long-running queries and lock contention.
{% endhint %}
{% endstep %}

{% step %}
**Ensure data consistency**

Run the `EnsureDataConsistency.sql` script against your database. This script cleans up orphaned records across all Engage tables. It removes or nullifies rows that reference non-existent parent records, and then re-validates all existing foreign key constraints.

This step is **required** before running the schema alignment script. Without it, the `CompleteAlignSchema.sql` script will fail if orphaned data violates the new constraints.
{% endstep %}

{% step %}
**Complete the schema alignment**

Run the `CompleteAlignSchema.sql` script against your database. This re-creates all foreign keys with `ON DELETE CASCADE` and re-enables any disabled or untrusted constraints. It also adds any missing indexes to align your schema with a clean install.

{% hint style="warning" %}
The script contains 6 numbered batches separated by `GO` statements (besides validation and completion steps). Ensure each batch has completed successfully by inspecting the returned/printed messages.
{% endhint %}
{% endstep %}

{% step %}
**Deduplicate page variants (optional)**

Run the `DeduplicatePageVariants.sql` script against your database. This consolidates duplicate page variant rows. A previous bug caused new rows to be inserted per pageview instead of reusing existing data.

{% hint style="info" %}
This step is optional. The update on the pageviews table can take a while on large installations, so plan accordingly.
{% endhint %}
{% endstep %}
{% endstepper %}

### Monitoring

Two new Engage health checks are available in the Umbraco Health Check dashboard:

* **Database Schema Status** — Verifies the `Umbraco.Engage+DatabaseSchemaStatus` key is set to `Complete`.
* **Constraint Integrity** — Validates that all expected foreign keys and indexes are present.

### Downloading the scripts

{% file src="/files/08L5ER5jsSmfYDhKo0i9" %}
Recommends a safe initial `DeleteAnalyticsDataAfterDays` value based on your data.
{% endfile %}

{% file src="/files/49vVwoDjkQP80C83rhFx" %}
Run during a maintenance window **before** `CompleteAlignSchema.sql` to clean up orphaned data and verify all foreign key constraints.
{% endfile %}

{% file src="/files/GU06ed3L7BYGLXOrekIA" %}
Run during a maintenance window **after** `EnsureDataConsistency.sql` to add missing foreign keys, indexes, and constraints.
{% endfile %}

{% file src="/files/gPknpTIEJ1GCo48GRH4U" %}
Consolidates duplicate page variant rows and reassigns pageviews. Run during a maintenance window as the update on the pageviews table can take a while on large installations.
{% endfile %}


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.umbraco.com/umbraco-engage/upgrading/schema-alignment-guide.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
