Data Parsing

On this page you can find information about data parsing and how the data is stored in a normalized and efficient way.


Last updated 6 months ago


Now that the data is persisted in the database, it is time for the next step.

Getting the data

A background process runs constantly on the web server to check whether there are unprocessed pageviews in memory or unprocessed records in the table umbracoEngageAnalyticsRawClientSideData.

Unprocessed records in the table umbracoEngageAnalyticsRawClientSideData can be identified by a NULL value in the column processingStarted.

If the background process finds unprocessed pageviews in memory or one of these unprocessed records, it fetches the rows of data and starts processing them. Once it has finished processing, it updates the record in the table by setting values in the columns processingFinished and processingMachine.
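
The fetch-and-mark cycle described above can be sketched as follows. This is a minimal illustration in Python against an in-memory SQLite table, not Umbraco Engage's actual implementation. Only the table name and the columns processingStarted, processingFinished and processingMachine come from this page; the id and payload columns, the function names and the machine parameter are assumptions made for the example.

```python
import socket
import sqlite3
from datetime import datetime, timezone

TABLE = "umbracoEngageAnalyticsRawClientSideData"


def create_raw_table(conn):
    # Hypothetical minimal schema: "id" and "payload" are assumptions,
    # the three processing columns are the ones named on this page.
    conn.execute(
        f"CREATE TABLE {TABLE} ("
        "id INTEGER PRIMARY KEY, payload TEXT NOT NULL, "
        "processingStarted TEXT, processingFinished TEXT, processingMachine TEXT)"
    )


def utc_now():
    return datetime.now(timezone.utc).isoformat()


def process_pending(conn, batch_size, handle, machine=None):
    """Process up to batch_size records whose processingStarted is NULL."""
    machine = machine or socket.gethostname()
    rows = conn.execute(
        f"SELECT id, payload FROM {TABLE} "
        "WHERE processingStarted IS NULL ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    for row_id, payload in rows:
        # Mark the record as started so it is no longer seen as unprocessed.
        conn.execute(
            f"UPDATE {TABLE} SET processingStarted = ? WHERE id = ?",
            (utc_now(), row_id),
        )
        handle(payload)
        # Record when processing finished and which machine did the work.
        conn.execute(
            f"UPDATE {TABLE} SET processingFinished = ?, processingMachine = ? "
            "WHERE id = ?",
            (utc_now(), machine, row_id),
        )
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_raw_table(conn)
    conn.execute(f"INSERT INTO {TABLE} (payload) VALUES (?)", ("pageview",))
    print(process_pending(conn, batch_size=10, handle=print))
```

Calling process_pending repeatedly on a timer mimics the background process: each call picks up only the records that no earlier call has started.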

Parsing

Once the data is fetched, Umbraco Engage performs several actions:

Normalize the data

All data is stored in a normalized way in the tables with the prefix umbracoEngageAnalytics.

For example, each browser is stored only once in the table umbracoEngageAnalyticsBrowser and each browser version is stored only once in the table umbracoEngageAnalyticsBrowserVersion.

The session is now related to the primary key ID of the browser version instead of storing the full-text string. This way, data can be queried effortlessly and is stored more efficiently (only an integer per browser instead of a text string).

This happens for all data:

  • Browser and browser version

  • Operating system

  • Visitor type
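
The normalization idea can be illustrated with a small "get or create" lookup. The sketch below uses Python with an in-memory SQLite database. The table and column names (browser, browser_version, analytics_session) are simplified stand-ins chosen for this example and are not the real Umbraco Engage schema.

```python
import sqlite3


def create_schema(conn):
    # Simplified, hypothetical tables standing in for the
    # umbracoEngageAnalytics* tables mentioned on this page.
    conn.executescript(
        """
        CREATE TABLE browser (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE browser_version (
            id INTEGER PRIMARY KEY,
            browser_id INTEGER NOT NULL REFERENCES browser(id),
            version TEXT NOT NULL,
            UNIQUE (browser_id, version)
        );
        CREATE TABLE analytics_session (
            id INTEGER PRIMARY KEY,
            browser_version_id INTEGER NOT NULL REFERENCES browser_version(id)
        );
        """
    )


def get_or_create(conn, table, values):
    """Return the id of the matching row, inserting it first if it is missing.

    Table and column names come from the code below, never from user input.
    """
    columns = list(values)
    params = tuple(values.values())
    condition = " AND ".join(f"{column} = ?" for column in columns)
    row = conn.execute(
        f"SELECT id FROM {table} WHERE {condition}", params
    ).fetchone()
    if row is not None:
        return row[0]
    placeholders = ", ".join("?" for _ in columns)
    cursor = conn.execute(
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})",
        params,
    )
    return cursor.lastrowid


def store_session(conn, browser, version):
    """Store a session that references the browser version by integer id."""
    browser_id = get_or_create(conn, "browser", {"name": browser})
    version_id = get_or_create(
        conn, "browser_version", {"browser_id": browser_id, "version": version}
    )
    cursor = conn.execute(
        "INSERT INTO analytics_session (browser_version_id) VALUES (?)",
        (version_id,),
    )
    return cursor.lastrowid
```

Storing many sessions for the same browser version adds one integer per session, while the browser name and version strings are each written only once.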

Relate data to Umbraco nodes

When the data was stored in the raw database tables, only the URL was stored. In the parsing step, Umbraco Engage tries to identify which Umbraco node and which culture were served on this URL. This is an important step to be able to report at a later point what happened on which page within the Umbraco backoffice.

Goals

Within Umbraco Engage you can set up goals via a specific page that is reached or an event that has been triggered. When parsing data, Umbraco Engage checks whether one of the goals is reached with this record.

Configuration options

How frequently the data is processed can be set in the configuration file. Two parameters can be set:

  • The IntervalInRecords setting specifies how many unprocessed records should be fetched per parsing process.

  • The IntervalInSeconds setting specifies how often the background process is triggered and how often the parsing happens.

A higher IntervalInSeconds value means the parsing takes place less frequently. A higher IntervalInRecords value means more records are fetched each time the parsing runs.

It is possible to specify which web server should execute the processing step. The processing step is the heaviest in the data flow process. Most likely it will not have any noticeable impact, but for optimization reasons you can specify which server is responsible for processing the raw data. This can be one web server, multiple web servers, or even a dedicated web server that does not serve the website itself. This can be set with the setting IsProcessingServer.

If using Umbraco in a load-balanced configuration, ensure the front-end servers have the configuration setting IsProcessingServer set to false. Only the backend (Umbraco backoffice) server should have this setting enabled.

Cleaning up the data

There is probably little or no reason to store this data forever. That is why there are two settings to clean up this data.

  • The first setting is AnonymizeDataAfterDays. After the set number of days, the data will be anonymized. This means the data will still be shown in aggregate reports like pageviews, used browsers, number of visitors, etcetera, but it can no longer be related to an individual visitor.

  • The second setting is DeleteDataAfterDays. With this setting the data will be deleted after a set number of days. The reason is that it does not make sense to store your data for all eternity.
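
To reason about the IntervalInRecords, IntervalInSeconds, AnonymizeDataAfterDays and DeleteDataAfterDays settings described on this page, the following minimal Python sketch may help. It is a model of the described behaviour, not Umbraco Engage code. It assumes that each run handles at most IntervalInRecords records, that runs start every IntervalInSeconds seconds, and that a record reaching exactly the configured age already qualifies for anonymization or deletion. The function names and the numbers used are illustrative only and are not product defaults.

```python
def max_records_per_hour(interval_in_records, interval_in_seconds):
    """Upper bound on records parsed per hour under the stated assumptions."""
    if interval_in_records <= 0 or interval_in_seconds <= 0:
        raise ValueError("both settings must be positive")
    # 3600 seconds per hour, one batch of interval_in_records per run.
    return (3600 * interval_in_records) // interval_in_seconds


def retention_action(age_in_days, anonymize_after_days, delete_after_days):
    """Decide what happens to a record of the given age.

    Deletion takes precedence over anonymization when both thresholds
    have been passed (an assumption of this sketch).
    """
    if age_in_days >= delete_after_days:
        return "delete"
    if age_in_days >= anonymize_after_days:
        return "anonymize"
    return "keep"


if __name__ == "__main__":
    # Illustrative values only.
    print(max_records_per_hour(interval_in_records=100, interval_in_seconds=30))
    for age in (10, 45, 400):
        print(age, retention_action(age, anonymize_after_days=30, delete_after_days=365))
```

If the number of raw records arriving per hour regularly exceeds the bound from max_records_per_hour, the backlog of unprocessed records would grow under these assumptions, which is a signal to raise IntervalInRecords or lower IntervalInSeconds.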