5 things that can affect the credibility of your analytical data

When discussing analytics, I've observed that website owners often place absolute faith in the accuracy of the data collected by services such as Google Analytics (GA). They trust it so completely that, when they find a discrepancy between their analytics and another source (e.g. e-commerce data recorded in GA vs. actual sales data), they'll spend an inordinate amount of time trying to work out where the difference came from.

However, this can be a frustrating exercise with no guarantee of success. So, rather than treating analytics as the be-all and end-all of data, we should think of it as something akin to an exit poll in an election: it may well be accurate and trustworthy, but it doesn't necessarily represent the entire electorate. In other words, the data we collect may not cover our entire audience; sometimes it may reflect 99% of our site visitors, but when the full range of factors is in play, it could be as low as 15%.

Thankfully, such a low figure is rare, and we can generally trust the data that we collect. Even so, we should be aware of the factors that could negatively impact our data, so that we don't rush into changes, based on faulty information, that don't appeal to our target audience. Below, you'll find five areas to bear in mind when working with analytics, and the effects they may have on your data.

Ad and tracking blocking

Probably the most widespread, frustrating and uncontrollable cause of incomplete data is ad-blocking software. Depending on your users' country and demographics, ad-blocker usage can reach 15%, and in extreme cases more than half of your site's visitors may be using one: https://backlinko.com/ad-blockers-users

Many ad blockers block not only ads but also analytics services such as Google Analytics, since these can be used indirectly for ad targeting.

Younger demographics in particular are often more likely to have such plugins installed, so depending on your target audience, the share of visitors using an ad blocker could be much higher. In the worst case, the majority of your users could be blocking analytics entirely.

EFFECT: no analytical data at all for a potentially large percentage of visitors.

Internal traffic

As a rule, companies implementing analytics take care to exclude internal traffic from the collected data, but if we set up analytics ourselves, our own visits and those of our colleagues may account for a significant share of the collected data (especially on pages that receive few visitors).

For this reason, make sure that your own devices are excluded from data collection, and don't forget to cover all of them: phones and tablets as well as your computer.

With this implemented, any tests that we run on our site (e.g. running through the purchasing process) will not skew the data collected from ordinary users.
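One simple way to exclude your own devices is to store a flag in each browser you use and check it before analytics is loaded. The sketch below is an illustration under stated assumptions, not a feature of any particular analytics service: the flag name `exclude_from_analytics` is hypothetical, and the minimal `KeyValueStore` interface is one that `window.localStorage` happens to satisfy. An in-memory store stands in for it so the example runs outside a browser.

```typescript
// Minimal key/value storage interface; in a browser, window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical flag name: any key works as long as it is used consistently.
const EXCLUDE_KEY = "exclude_from_analytics";

// Run once in each browser on each of your own devices (e.g. from a hidden admin page).
function markDeviceAsInternal(store: KeyValueStore): void {
  store.setItem(EXCLUDE_KEY, "1");
}

// Check this before loading or initialising the analytics script.
function shouldTrack(store: KeyValueStore): boolean {
  return store.getItem(EXCLUDE_KEY) !== "1";
}

// In-memory stand-in for localStorage so the sketch runs outside a browser.
class MemoryStore implements KeyValueStore {
  private readonly data = new Map<string, string>();

  getItem(key: string): string | null {
    return this.data.get(key) ?? null;
  }

  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}

const phone = new MemoryStore();
console.log(shouldTrack(phone)); // true: not yet marked as internal
markDeviceAsInternal(phone);
console.log(shouldTrack(phone)); // false: analytics should be skipped
```

Because the flag lives in browser storage, it has to be set separately in every browser on every device, and clearing site data removes it; this is why covering phones and tablets as well as your computer matters.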

If you work with external developers or a site management service, consider excluding their traffic too, as it can generate large numbers of page views and site interactions.
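Excluding traffic at the network level usually means filtering by IP address. Analytics services commonly offer IP-based filters in their settings, and that is generally the first thing to reach for; the sketch below only illustrates the underlying check, for example if events pass through your own server first. It handles IPv4 only, and the ranges in `INTERNAL_RANGES` are placeholders taken from the blocks reserved for documentation (RFC 5737), to be replaced with your office's or agency's actual addresses.

```typescript
// Convert a dotted IPv4 address to an unsigned 32-bit number; null if malformed.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let result = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    result = result * 256 + octet; // multiply instead of << to avoid signed 32-bit overflow
  }
  return result;
}

// True if `ip` falls inside the CIDR block, e.g. "203.0.113.0/24".
// A range without a "/prefix" is treated as a single address (/32).
function inCidr(ip: string, cidr: string): boolean {
  const slash = cidr.indexOf("/");
  const base = slash === -1 ? cidr : cidr.slice(0, slash);
  const prefixText = slash === -1 ? "32" : cidr.slice(slash + 1);
  if (!/^\d{1,2}$/.test(prefixText)) return false;
  const prefix = Number(prefixText);
  const ipInt = ipv4ToInt(ip);
  const baseInt = ipv4ToInt(base);
  if (prefix > 32 || ipInt === null || baseInt === null) return false;
  // Two addresses share a /prefix block if they agree on the top `prefix` bits.
  const blockSize = 2 ** (32 - prefix);
  return Math.floor(ipInt / blockSize) === Math.floor(baseInt / blockSize);
}

// Placeholder ranges (RFC 5737 documentation blocks): replace with your own.
const INTERNAL_RANGES = ["203.0.113.0/24", "198.51.100.17/32"];

function isInternalTraffic(ip: string): boolean {
  return INTERNAL_RANGES.some((range) => inCidr(ip, range));
}

console.log(isInternalTraffic("203.0.113.42"));  // true
console.log(isInternalTraffic("198.51.100.18")); // false
```

IP filtering only helps when the addresses are stable; colleagues working from home or on mobile data will slip through, which is why excluding individual devices is still worthwhile.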

EFFECT: collecting redundant data that often does not reflect typical user behaviour.

Other forms of network traffic masking

The popularity of VPNs, privacy-focused browsers and private/incognito modes, coupled with the heavily restrictive tracking policies being introduced on Apple's operating systems, also has a huge impact on the quality of the analytical data we collect.

These solutions can not only hide the user's real location but also effectively prevent the user from being recognised on later visits, by restricting how cookies are handled (e.g. shortening their validity period).

EFFECT: false location data, and no way of linking separate sessions to the same user or of building user profiles.

Incorrect implementation of data collection

Many of the issues we encounter are not ones that we can mitigate in any real sense; they ultimately come down to our users' preferences and habits. However, one important issue that I often observe is an overly simple approach to data collection. It's important to properly consider the ways in which users will interact with our site, so that the data gives a clearer picture of their behaviour. Some data doesn't require a particularly technical approach: page views of individual subpages, for example, are collected reliably and generally automatically. But when we want to dive deeper and analyse user interactions on our site, the way we collect the data can have a significant impact on its accuracy.

Here's a simple example: if we want to measure how many people have used the contact form, it isn't enough to attach an event to clicks on the "Send message" button. If a validation error occurs, our data will record a false positive (or several, if the user keeps clicking the button to see whether the message will finally send). Instead, we should fire the event at the moment we receive confirmation that the message has been sent. We should also measure the number of validation errors and errors that occur while sending the form; this tells us whether the form is poorly structured or whether there are technical problems with sending the data.
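A minimal sketch of the contact-form example above, assuming a hypothetical `track` function that forwards events to your analytics service and a hypothetical `send` function that submits the form and reports the outcome. The event names and the `SubmitResult` shape are illustrative assumptions, not a convention of any particular tool. Nothing is recorded on the click itself: the success event fires only once the submission is confirmed, and validation and sending errors get events of their own.

```typescript
// Hypothetical hook into your analytics library.
type TrackFn = (eventName: string, params?: Record<string, string | number>) => void;

// Possible outcomes of submitting the form (assumed shape).
type SubmitResult =
  | { ok: true }
  | { ok: false; kind: "validation"; fields: string[] }
  | { ok: false; kind: "server"; status: number };

// Called from the "Send message" handler; no event is sent for the click itself.
async function submitContactForm(
  send: () => Promise<SubmitResult>,
  track: TrackFn,
): Promise<SubmitResult> {
  let result: SubmitResult;
  try {
    result = await send();
  } catch {
    // The request never completed (e.g. a network failure).
    track("contact_form_error", { reason: "network" });
    return { ok: false, kind: "server", status: 0 };
  }

  if (result.ok) {
    // Only a confirmed submission counts as a sent message.
    track("contact_form_sent");
  } else if (result.kind === "validation") {
    // Frequent validation errors may point to a poorly structured form.
    track("contact_form_validation_error", { fields: result.fields.join(",") });
  } else {
    // Server-side failures point to technical problems.
    track("contact_form_error", { reason: "server", status: result.status });
  }
  return result;
}

// Usage: a user first fails validation, then submits successfully.
const events: string[] = [];
const logEvent: TrackFn = (name) => {
  events.push(name);
};

(async () => {
  await submitContactForm(
    async (): Promise<SubmitResult> => ({ ok: false, kind: "validation", fields: ["email"] }),
    logEvent,
  );
  await submitContactForm(async (): Promise<SubmitResult> => ({ ok: true }), logEvent);
  console.log(events); // one validation error, then one sent message
})();
```

Repeated clicks that fail validation now show up as validation errors rather than inflating the count of sent messages.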

EFFECT: incorrect conclusions may be drawn due to incomplete or inaccurate data.

Website errors and poor network connection

Finally, a less obvious point that many people still seem to forget: not everyone in the world has access to a fast and reliable internet connection. We have to be aware that some information (especially data about user interactions with the website) may simply never reach the analytics service because of poor connection quality.

The problem can also be on our end: a server that responds too slowly will disrupt the collection process because, for example, the tracking scripts won't load in time.

Another potential problem is scripting errors on the page: our analytics implementation may be correct, but if some other script causes an error, the analytics code may never load in the first place, resulting in missing data. While we can't do anything to improve our users' connection speed, we can limit the number of website errors by thoroughly testing the data collection process, including sending test events to the analytics service to weed out any issues.
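If analytics is initialised in the same bundle as the rest of the site's code, an exception thrown earlier in that code can stop everything after it, including the analytics setup. (Classic scripts loaded as separate `<script>` elements largely fail independently, so this mainly matters for bundled code.) Below is a sketch of one defensive pattern, assuming the page's features are started by a list of hypothetical initialiser functions: each runs inside its own `try`/`catch`, so one broken feature cannot prevent analytics from loading, and failures are reported rather than silently swallowed.

```typescript
// One feature of the page (analytics, image carousel, chat widget, ...).
interface Initializer {
  name: string;
  run: () => void;
}

// Runs every initialiser even if some of them throw; returns the names of those that failed.
function runIsolated(
  initializers: Initializer[],
  onError: (name: string, error: unknown) => void,
): string[] {
  const failed: string[] = [];
  for (const init of initializers) {
    try {
      init.run();
    } catch (error) {
      failed.push(init.name);
      onError(init.name, error);
    }
  }
  return failed;
}

// Usage: a broken carousel no longer prevents analytics from starting.
const started: string[] = [];
const failures = runIsolated(
  [
    { name: "carousel", run: () => { throw new Error("carousel markup not found"); } },
    { name: "analytics", run: () => { started.push("analytics"); } },
  ],
  (name, error) => console.error(`${name} failed to initialise:`, error),
);
console.log(failures, started); // carousel failed, analytics still started
```

The `onError` callback is also a natural place to report errors to a logging service, which makes the kind of testing described above easier.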

EFFECT: incomplete data that may lead us to incorrect conclusions; the same outcome as a poorly implemented data collection setup.

In Summary

Given how many issues can affect the quality of our analytical data, it should be treated as one indicator to be weighed alongside other data sources, rather than as the one and only, 100% reliable source of knowledge about our users and their behaviour on the website.

Given the growing demand for online privacy, I strongly suspect that over time it will become necessary to carry out user testing on a much larger scale, as the collected data may not be able to answer all of a website owner's questions.