Now that the data is collected, stored, and parsed, it's finally time to browse through the reports in the Umbraco backoffice.
The data is visualized in two parts in Umbraco:
The Umbraco Engage section gives an overview of all data that is recorded. The data is visualized for the entire installation and all pages and visitors. These reports give a top-level overview.
For more detailed reports on an individual page (Umbraco node), you can go to the Analytics Content app.
All the data for the reports is generated nightly at 4:00 AM (configurable). During this process, all the relational data is turned into a star schema for quick reporting in the different Umbraco Engage sections.
On this page you can find information about data parsing and how the data is stored in a normalized and efficient way.
Now that the data is persisted in the database it is time for the next step.
There is a background process constantly running on the webserver to check whether there are unprocessed pageviews in memory or records in the table umbracoEngageAnalyticsRawClientSideData.
The records in the table umbracoEngageAnalyticsRawClientSideData can be identified because the column processingStarted is NULL.
If the background process finds unprocessed pageviews in memory or unprocessed records in the table, it fetches the rows of data and starts processing them. Once it has finished processing, it updates each record in the table by setting values in the columns processingFinished and processingMachine.
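The selection and bookkeeping described above can be sketched as follows. This is a minimal illustration, not the actual Umbraco Engage implementation: the in-memory `rawRows` array stands in for the database table, and `findUnprocessed`, `processBatch`, and the moment at which `processingStarted` is written are assumptions made for the example.

```javascript
// Hypothetical stand-in for rows of umbracoEngageAnalyticsRawClientSideData.
const rawRows = [
  { id: 1, processingStarted: null, processingFinished: null, processingMachine: null },
  { id: 2, processingStarted: "2024-01-01T00:00:00Z", processingFinished: "2024-01-01T00:00:01Z", processingMachine: "web-00" },
  { id: 3, processingStarted: null, processingFinished: null, processingMachine: null },
];

// Unprocessed rows are the ones where processingStarted is still NULL.
function findUnprocessed(rows, batchSize) {
  return rows.filter((row) => row.processingStarted === null).slice(0, batchSize);
}

// Process one batch and record when and where each row was handled.
function processBatch(rows, batchSize, machineName) {
  const batch = findUnprocessed(rows, batchSize);
  for (const row of batch) {
    row.processingStarted = new Date().toISOString();
    // The actual parsing work (normalizing, resolving nodes, goals) would happen here.
    row.processingFinished = new Date().toISOString();
    row.processingMachine = machineName;
  }
  return batch.map((row) => row.id);
}

const processedIds = processBatch(rawRows, 10, "web-01");
console.log(processedIds);
```

Running the batch a second time returns an empty list, because every row now has a value in `processingStarted`.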
When the data is fetched Umbraco Engage will perform some different actions:
All data is stored in a normalized way in the tables with the prefix: umbracoEngageAnalytics.
For example, each browser is only stored once in the table umbracoEngageAnalyticsBrowser and each browser version is stored once in the table umbracoEngageAnalyticsBrowserVersion.
The session is now related to the primary key ID of the browser version instead of storing the full-text string. This way, data can be queried effortlessly and is stored more efficiently (only an integer per browser instead of a text string).
This happens for all data:
Browser and browser version
Operating system
Visitor type
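The idea of storing each value once and referring to it by an integer ID can be illustrated with a small sketch. The `LookupTable` class and the sample browser strings are hypothetical and only mimic what a lookup table such as umbracoEngageAnalyticsBrowserVersion does conceptually.

```javascript
// Hypothetical lookup table: each distinct value is stored once and gets an integer ID.
class LookupTable {
  constructor() {
    this.idsByValue = new Map();
    this.nextId = 1;
  }

  // Return the existing ID for a value, or create a new one on first sight.
  getOrCreateId(value) {
    if (!this.idsByValue.has(value)) {
      this.idsByValue.set(value, this.nextId);
      this.nextId += 1;
    }
    return this.idsByValue.get(value);
  }
}

const browserVersions = new LookupTable();

// Each session keeps only the integer ID instead of the full text string.
const sessions = ["Chrome 120", "Firefox 121", "Chrome 120"].map((browserVersion) => ({
  browserVersionId: browserVersions.getOrCreateId(browserVersion),
}));

console.log(sessions);
```

Three sessions end up referencing only two stored browser versions, which is what keeps the normalized tables compact and quick to query.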
When the data was stored in the raw database tables only the URL was stored. In the parsing step, we try to identify which Umbraco node and which culture is served on this URL. This is an important step to report at a later point what happened on which page within the Umbraco backoffice.
Within Umbraco Engage you can set up goals via a specific page that is reached or an event that has been triggered. When parsing data Umbraco Engage checks whether one of the goals is reached with this record.
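As a rough sketch of what such a goal check could look like, the example below matches a parsed record against a list of goals. The goal and record shapes (`type`, `nodeId`, `category`, `action`, `kind`) are assumptions made for illustration and do not reflect the actual Umbraco Engage data model.

```javascript
// Hypothetical goal definitions: one page goal and one event goal.
const goals = [
  { id: 1, type: "page", nodeId: 1234 },
  { id: 2, type: "event", category: "Newsletter", action: "Subscribe" },
];

// Return the IDs of all goals reached by a single parsed record.
function matchGoals(record, goalList) {
  return goalList
    .filter((goal) => {
      if (goal.type === "page") {
        return record.kind === "pageview" && record.nodeId === goal.nodeId;
      }
      if (goal.type === "event") {
        return (
          record.kind === "event" &&
          record.category === goal.category &&
          record.action === goal.action
        );
      }
      return false;
    })
    .map((goal) => goal.id);
}

console.log(matchGoals({ kind: "pageview", nodeId: 1234 }, goals));
console.log(matchGoals({ kind: "event", category: "Newsletter", action: "Subscribe" }, goals));
```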
How frequently the data is processed can be set in the configuration file. Two parameters can be set:
The IntervalInRecords setting specifies how many unprocessed records should be fetched per parsing process.
The IntervalInSeconds setting specifies how often the background process is triggered and how often the parsing happens.
The higher you set these values, the less frequently the parsing takes place.
It is possible to specify which web server should execute the processing step. The processing step is the heaviest in the data flow process. Most likely it will not have any impact, but for optimization reasons, you can specify which server is responsible for processing the raw data. This can be one web server, multiple web servers, or even a dedicated web server that does not serve the website itself. This can be set with the setting IsProcessingServer.
If using Umbraco in a load-balanced configuration, ensure the front-end servers have the configuration setting for IsProcessingServer set to false. Also, make sure that only the backend (Umbraco backoffice) server has this setting enabled.
There is probably no or little reason to store this data forever. That is why we have two settings to clean up this data.
The first setting is 'AnonymizeDataAfterDays'. After the set number of days, the data will be anonymized. This means the data will still be shown in aggregate reports like pageviews, used browsers, number of visitors, and so on, but it cannot be related to an individual visitor anymore.
The second setting is 'DeleteDataAfterDays'. With this setting the data will be deleted after a set number of days. The reason is that it does not make sense to store your data for all eternity.
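How the two retention settings interact can be sketched as a small decision function. The function name `retentionAction`, the sample values of 90 and 365 days, and the choice to treat the boundary day as already expired are assumptions for this example, not documented defaults.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Decide what should happen to a record based on its age and the two settings.
function retentionAction(recordedAt, now, settings) {
  const ageInDays = (now.getTime() - recordedAt.getTime()) / MS_PER_DAY;
  if (ageInDays >= settings.deleteDataAfterDays) {
    return "delete";
  }
  if (ageInDays >= settings.anonymizeDataAfterDays) {
    return "anonymize";
  }
  return "keep";
}

// Hypothetical values for AnonymizeDataAfterDays and DeleteDataAfterDays.
const settings = { anonymizeDataAfterDays: 90, deleteDataAfterDays: 365 };
const now = new Date("2024-06-01T00:00:00Z");

console.log(retentionAction(new Date("2024-05-25T00:00:00Z"), now, settings));
console.log(retentionAction(new Date("2024-01-01T00:00:00Z"), now, settings));
console.log(retentionAction(new Date("2023-01-01T00:00:00Z"), now, settings));
```

A record that is one week old is kept, one that is about five months old is anonymized, and one older than a year is deleted.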
The Umbraco Engage package is all about data, data, and more data. To make the most out of this data in the most efficient way, the data goes through four different stages.
Data collection: This is where the visitor data is collected and stored for a moment in the memory of the server.
Data storage: This is where the data from memory goes to the database.
Data processing: The data is processed at a later moment to make it more efficient and normalized.
Data reporting: Finally, the data is reported within Umbraco Engage.
The concept of this dataflow is the most important concept to grasp when using Umbraco Engage.
This is the first phase of the data flow. In this stage, the data is collected from the user and stored temporarily in memory.
Umbraco Engage works via server-side collection, meaning that all initial visitor data is collected on the server and not sent via JavaScript, for example. When a visitor visits your website, Umbraco Engage checks whether the visitor already has an Umbraco Engage cookie. If not, it creates one and sends it back to the visitor.
When making a request, the visitor sends all kinds of data to the server:
Which browser the visitor is using
Which URL is requested
If there was any referring page (where did the visitor come from)
At what time the page is requested
Which IP Address is used
Which operating system is used
Which type of device is used
Which cookies are sent
This data is all collected and, for efficiency, stored for a while in the web server memory. The idea is that storing this data in memory is faster than directly writing it to the database. It is more efficient to store multiple database records at once than to store the database records one at a time.
In the next phase, the data in memory will be stored in the database.
The beauty of server-side collection is that it always works and you're not relying on JavaScript for example. Also, there is no way for clients to block this behavior because this is "how the internet works".
Only page requests are collected in Umbraco Engage. The request needs to be a GET request returning a 200 OK. Requests to images (.png, .jpg), .css, and .js files are not tracked. All requests to the /Umbraco/ folder are also ignored by default.
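The filtering rules above can be expressed as a single predicate. This is a sketch under assumptions: the `shouldTrack` function, the request shape (`method`, `statusCode`, `path` without query string), and the case-insensitive matching are illustrative and not taken from the Umbraco Engage source.

```javascript
// Extensions named in the text as not being tracked.
const IGNORED_EXTENSIONS = [".png", ".jpg", ".css", ".js"];

// Decide whether a request counts as a trackable page request.
function shouldTrack(request) {
  // Only GET requests that return 200 OK are collected.
  if (request.method !== "GET" || request.statusCode !== 200) {
    return false;
  }
  const path = request.path.toLowerCase();
  // Requests to the Umbraco folder are ignored.
  if (path === "/umbraco" || path.startsWith("/umbraco/")) {
    return false;
  }
  // Requests to static assets are ignored.
  return !IGNORED_EXTENSIONS.some((extension) => path.endsWith(extension));
}

console.log(shouldTrack({ method: "GET", statusCode: 200, path: "/products/shoes" }));
console.log(shouldTrack({ method: "GET", statusCode: 200, path: "/media/logo.png" }));
```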
There are different configuration options to adjust the collecting process.
You can limit the amount of data records stored in memory. If you are limited in memory you can adjust these settings to fit your needs.
The IP Address is anonymized by default. There is an option to change this.
You can turn off server-side tracking. This can be useful if not every page request reaches your website. This could be the case if you're using CloudFlare for example.
The amount of data that you can collect on the server is limited. Visitors have all kinds of interactions when your website loads. They can scroll, click on the website, watch videos, and click on other pages (inside and outside of your website).
These kinds of interactions need to be collected on the client side. To support this, we have created a JavaScript file that collects a lot of data, and extending it with your own events is possible.
If you install the package you will find this JavaScript file in the folder /Assets/Umbraco.Engage/scripts/.
This JavaScript collects the following data for you:
The maximum scroll depth as a percentage of the whole page and in absolute pixels.
The links you have clicked and the moment you clicked them.
The time you have been engaged on the page.
We track the time that you are actively using the page. We see whether you are scrolling, moving your cursor, or typing. As long as you are doing that we track the time.
As soon as you do not do anything of the above we stop the timer until you start doing something again.
Also if you have opened the page in a tab but you are using another website at the moment, that time will not count. We stop measuring time as soon as you have not done anything for 5 seconds.
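The engagement timing described above can be approximated with a small timer that only counts time while activity keeps arriving. This is a simplified sketch, not the code of the shipped script: the `EngagementTimer` class is hypothetical, timestamps are passed in explicitly so the logic can run outside a browser, and counting the time up to the 5-second cutoff after the last activity is an assumption.

```javascript
const IDLE_TIMEOUT_MS = 5000;

class EngagementTimer {
  constructor(idleTimeoutMs = IDLE_TIMEOUT_MS) {
    this.idleTimeoutMs = idleTimeoutMs;
    this.engagedMs = 0;
    this.lastActivityAt = null;
  }

  // Call on every scroll, cursor move, or key press, passing the current time in ms.
  recordActivity(timestampMs) {
    if (this.lastActivityAt !== null) {
      const gap = timestampMs - this.lastActivityAt;
      // Short gaps count fully; long gaps only count until the idle cutoff.
      this.engagedMs += Math.min(gap, this.idleTimeoutMs);
    }
    this.lastActivityAt = timestampMs;
  }

  // Total engaged time as seen at the given moment.
  totalEngagedMs(nowMs) {
    if (this.lastActivityAt === null) {
      return 0;
    }
    const sinceLastActivity = nowMs - this.lastActivityAt;
    return this.engagedMs + Math.min(sinceLastActivity, this.idleTimeoutMs);
  }
}

const timer = new EngagementTimer();
// Activity at 0 s, 2 s, and 4 s, then a long pause until 30 s and 31 s.
[0, 2000, 4000, 30000, 31000].forEach((t) => timer.recordActivity(t));
console.log(timer.totalEngagedMs(31000));
```

In a browser, `recordActivity` would be wired to events such as `scroll`, `mousemove`, and `keydown`, with `Date.now()` as the timestamp.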
You need to load the file at the end of your page to enable these events.
Client-side events are collected and sent to the server and stored in memory when visitors exit the page or close the tab/browser.
Looking at your website source code you will see a line of code automatically inserted by Umbraco Engage. It most likely looks something like this:
This snippet of code loads the umbracoEngage.analytics.js file and ensures that the exact page visit is automatically linked to the submitted client-side events.
It is also possible to push your own events to Umbraco Engage. It works 80% the same as Google Analytics Event Measurement. Read more about custom events in the Create your own events article.
There is a chance that you've already implemented all kinds of events via Google Analytics with their syntax:
ga('send','event',[eventCategory],[eventAction],[eventLabel],[eventValue],[fieldsObject]);
If that is the case you can include a bridging library we created. This bridging library ensures that all custom events sent to Google Analytics are also sent to Umbraco Engage. These events will now be sent to both systems.
The only thing you will need to do is include the script \Assets\umbracoEngage\Scripts\umbracoEngage.analytics.ga-bridge.js somewhere on your page:
Information about Data Storage and how to work with and troubleshoot it in Umbraco Engage.
When the data is collected, it is temporarily stored in memory. At some point, a threshold is reached and all data is stored in the database.
Two thresholds can be set and reached which will trigger the storage of data. If one of these two is reached the data will be stored in the database.
The first threshold is the 'FlushRateInRecords'.
When this number of records is in memory the data will be stored in the database. An example could be if you set it to 100, the data will be permanently stored after 100 page visits.
The second threshold is the FlushIntervalInSeconds.
After this number of seconds, the data will be sent to the database. If you set it to 30 seconds, for example, every 30 seconds the data will be sent to the database. No matter how many records there are in memory.
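The interplay of the two thresholds can be sketched with a small buffer that flushes when either one is reached. The `PageviewBuffer` class, its `writeBatch` callback, and the explicit timestamps are assumptions for illustration. Only the setting names FlushRateInRecords and FlushIntervalInSeconds come from the text above.

```javascript
class PageviewBuffer {
  constructor({ flushRateInRecords, flushIntervalInSeconds, startMs, writeBatch }) {
    this.flushRateInRecords = flushRateInRecords;
    this.flushIntervalMs = flushIntervalInSeconds * 1000;
    this.lastFlushAtMs = startMs;
    this.writeBatch = writeBatch; // Hypothetical callback that stores one batch in the database.
    this.records = [];
  }

  // Keep the record in memory and flush if a threshold is reached.
  add(record, nowMs) {
    this.records.push(record);
    this.flushIfNeeded(nowMs);
  }

  // Flush when either the record count or the elapsed time threshold is reached.
  flushIfNeeded(nowMs) {
    const tooManyRecords = this.records.length >= this.flushRateInRecords;
    const tooMuchTime = nowMs - this.lastFlushAtMs >= this.flushIntervalMs;
    if ((tooManyRecords || tooMuchTime) && this.records.length > 0) {
      this.writeBatch(this.records);
      this.records = [];
      this.lastFlushAtMs = nowMs;
    }
  }
}

const writtenBatches = [];
const buffer = new PageviewBuffer({
  flushRateInRecords: 3,
  flushIntervalInSeconds: 30,
  startMs: 0,
  writeBatch: (batch) => writtenBatches.push(batch),
});

buffer.add({ url: "/" }, 1000);
buffer.add({ url: "/about" }, 2000);
buffer.add({ url: "/contact" }, 3000); // Third record reaches the record threshold.
buffer.add({ url: "/blog" }, 4000);
buffer.flushIfNeeded(40000); // More than 30 seconds since the last flush.
console.log(writtenBatches.map((batch) => batch.length));
```

In a running application a timer would call `flushIfNeeded` periodically, so that a quiet site still gets its few records written after the interval has passed.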
Both settings can be set in the configuration file of Umbraco Engage.
The higher the value set for these thresholds, the more memory Umbraco Engage uses on your web server(s) and the fewer database calls are made. Please be aware the memory impact is low because there is not a lot of complex data stored.
The lower the value you set, the less memory Umbraco Engage uses on your web server(s), and the more database calls are made.
The data will be stored as quickly as possible to minimize the needed resources. For this reason, the data collected from client-side events will be stored in so-called raw tables in a non-normalized form. This data will be processed in the data processing step of the data flow.
The data collected from client-side events is stored in the table umbracoEngageAnalyticsRawClientSideData.
When the data is stored in these tables the columns processingStarted, processingFinished, processingMachine, and processingFailed are empty. They will be filled in the parsing step.