Top 10 ways to ruin your Google Analytics data and how to avoid them

A lot of businesses rely on Google Analytics to assess the performance of their online efforts with regard to online sales, marketing, support, or just providing users with information about a brand or a product. Any measurements and conclusions based on them, however, are only as good as the accuracy and reliability of the data they rely on. The quality of Google Analytics data is therefore critically important, especially for businesses that are entirely web-based, such as e-commerce sites.

Here I present the top 10 ways one can destroy the quality of Google Analytics data. The list is based on the hundreds of Google Analytics audits I’ve done over the past 10 years in my capacity as a Google Analytics certified individual (since 2009) and as a Google Certified Regional Trainer on AdWords & Analytics. For each issue, you’ll see advice on how to detect it and what you can do to avoid finding yourself in a situation where you need to explain to your superiors why the data you have gathered over the past several years is not really usable…

1.) Missing Google Analytics tracking code

While it may seem obvious, a missing tracking code is not an uncommon occurrence, and it goes undetected more easily than it seems at first glance. The simplest case is when you have no tracking code anywhere on the site. Nothing at all. A quick look at any report will reveal the ugly truth and it is clear what needs to be done.

If you manage multiple websites, having a Custom Alert that sends you an e-mail if the daily traffic of a site is lower than 5-10% of the usual can help detect such disasters quickly. Note that it isn’t a good idea to simply set the alert to fire when you have zero visits, since your tracking code may end up in places like Google Translate or the Google Cache and still execute from there.
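The alert condition boils down to a simple comparison. The sketch below is only an illustration of that arithmetic, not a Google Analytics feature; the function name and the default threshold of 10% are assumptions chosen for the example.

```javascript
// Hypothetical helper: returns true when today's sessions fall below a
// fraction of the usual daily volume (10% by default, per the 5-10% rule).
function trafficLooksDead(todaySessions, usualDailySessions, thresholdFraction = 0.1) {
  if (usualDailySessions <= 0) {
    return false; // no baseline to compare against
  }
  // Comparing against a fraction, not zero, still catches the case where
  // a few stray hits arrive from cached or translated copies of the site.
  return todaySessions < usualDailySessions * thresholdFraction;
}
```

A site that usually gets 1,000 sessions a day would trigger the check at 30 sessions, even though the count is not zero.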

It is trickier to detect if the code is missing on just some of your pages, like some of your landing pages, pages hosted on third-party systems, blog pages, etc. There are three ways to detect if the Google Analytics code is missing from just some of your pages: within GA itself, by using a crawler service, or by manual inspection.

In Google Analytics you should look for an unusually high percentage of self-referrals (sessions where the referrer is your own domain) and make note of the values of the “Full Referrer” and “Landing Page” dimensions for these sessions. It is hard to describe all scenarios where the data can be suspect, but I’m fairly sure you’ll be able to pick up unusual activity if you have done analytics on the site for a while.

As a second option, you can use a crawler service like Screaming Frog to check whether the Google Analytics code is present on each page of your site. As far as I know, this only works if the code is hardcoded and will not work if you implemented it via a tag management solution. Even in that case, you can use the crawler to make sure the tag management container is present on all pages and then double-check your trigger rules to confirm the Google Analytics code is fired on all pages it needs to be present on.
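The idea behind such a crawler check can be sketched with a few string matches. This is not how any particular crawler works; it is a minimal illustration that assumes you already have each page’s raw HTML as a string. The function names and the shape of the `pages` array are hypothetical. It only sees what is in the HTML source, so tags injected by a tag manager are not detected, only the container itself.

```javascript
// Look for the well-known script URLs of each tracking library in raw HTML.
function detectTrackingSnippets(html) {
  return {
    classicGa: /google-analytics\.com\/ga\.js/.test(html),
    universalAnalytics: /google-analytics\.com\/analytics\.js/.test(html),
    tagManager: /googletagmanager\.com\/gtm\.js/.test(html),
  };
}

// pages: array of { url, html } objects (hypothetical input format).
// Returns the URLs where none of the three snippets was found.
function pagesMissingTracking(pages) {
  return pages
    .filter((page) => {
      const found = detectTrackingSnippets(page.html);
      return !found.classicGa && !found.universalAnalytics && !found.tagManager;
    })
    .map((page) => page.url);
}
```

A page that reports both `classicGa` and `universalAnalytics` as true is also worth a closer look, since mixing library versions is a problem in its own right.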

With manual inspection, you go through a set of representative pages and manually check for the presence and correct functioning of the Google Analytics tracking code. You’d want to use either the Google Tag Assistant plugin (its recording functionality is especially useful), or the Google Analytics Debugger plugin. Manual inspection of your browser’s Network activity also works, but is usually more cumbersome. It is recommended to manually inspect at least a select set of pages on your site even if a crawler shows that the code is present on all pages of the site. The fact that a piece of code is present doesn’t always mean it is working as intended! The above-mentioned tools can help you ascertain if it is.

2.) Double-Tracking

While having “too little” tracking, as in #1 above, is bad, having “too much” tracking is not good either. There have been multiple cases where I’ve seen the same GA tracker ID (property ID) register more than one pageview per page load. The reasons vary: old code left forgotten in a template, having both hard-coded tracking code and a GTM implementation, having both older and newer versions of the Google Analytics tracking code (see more on that in #3 below)… In some cases it could be bad copy/pasting from tutorials or help examples, carrying over a ga('send', 'pageview') line and placing it where it doesn’t belong. Sometimes it’s just sloppy web dev work.

Detecting that you have double-tracking is easy if it happens on all or almost all pages of the site, as you will see something like this:

[Screenshot: Google Analytics report showing the effect of double-tracking]

Since the pageview tracking code is firing twice on each page, you get a near-zero bounce rate, which is, of course, practically impossible. Double-tracking also results in:

  • tracking 2x more Pageviews for each page on your site
  • your Pages/Session metric being 2x what it should be
  • multiple metrics & reports that rely on pageviews being affected, e.g. Navigation Summary, Behavior Flow, Goal Flow…

This issue has consequences nearly as dire as issue #1, but it gets even worse when only some of your pages have double-counting going on. In such a case, the issue may not be obvious at all! The overall metrics would look fine and you may only detect the problem upon closer inspection of certain reports, thus the data pollution can linger on for a while without being detected.

To avoid the issue, it is best that you use Google Tag Manager or similar software to deliver your Google Analytics code and make sure there are no leftover hardcoded instances of such code. This can be a challenge when managing a site that is composed of multiple systems (e.g. main site, e-commerce software, blog software, support software).

The best way to detect individual pages or groups of pages that have double-tracking issues is to examine your landing pages report in depth to check for landing pages with suspiciously low bounce rates (you can use our Google Analytics Health Status Checker to take care of that). Once you identify such pages, you’d want to use either the Google Tag Assistant plugin (its recording functionality is especially useful), or the Google Analytics Debugger, similarly to how we used them in point #1 above, but this time you’d be looking for two or more pageviews firing when you load a given page.
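If you prefer to work from the browser’s Network tab, the same check can be scripted. The sketch below assumes you have copied the request URLs recorded during a single page load into an array, and that the hits were sent as GET requests to a Google Analytics `/collect` endpoint with the hit type (`t`) and property ID (`tid`) in the query string. Hits sent with their payload in a POST body would not be caught. The function names are hypothetical.

```javascript
// Count pageview hits per property ID among the request URLs of one page load.
function countPageviewsByProperty(requestUrls) {
  const counts = {};
  for (const raw of requestUrls) {
    let url;
    try {
      url = new URL(raw);
    } catch (err) {
      continue; // skip anything that is not an absolute URL
    }
    if (!url.hostname.endsWith('google-analytics.com')) continue;
    if (!url.pathname.endsWith('/collect')) continue;
    if (url.searchParams.get('t') !== 'pageview') continue;
    const propertyId = url.searchParams.get('tid') || '(unknown)';
    counts[propertyId] = (counts[propertyId] || 0) + 1;
  }
  return counts;
}

// Property IDs that registered more than one pageview for the page load.
function findDoubleTracked(requestUrls) {
  const counts = countPageviewsByProperty(requestUrls);
  return Object.keys(counts).filter((propertyId) => counts[propertyId] > 1);
}
```

An empty result from `findDoubleTracked` for a page load means no property received more than one pageview among the hits the script could see.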

Please note that this and every other issue on the list is made that much worse by the fact that there is no way to alter Google Analytics data once it has been gathered, or to fill in “data gaps” retroactively.

3.) Different versions of the Google Analytics code

As with #1 and #2 above, this is mostly an issue with more complex systems where replacing a classic ga.js version of the code with the Universal Analytics (analytics.js) library can be a challenging task. Still, it is not uncommon to see it happen on otherwise simple sites that use an external form-management plugin, or a Google Analytics plugin with some special functionality, like tracking events or e-commerce transactions.

Regardless of the particular case, having Classic Analytics code on some pages of the site and Universal Analytics code on others, or having one version for page tracking and another version for event tracking, e-commerce tracking, etc. will lead to sessions breaking up. This is due to the different cookies used by these code libraries. The result is strange referrals, strange landing and exit pages, a total mess with traffic source attribution… In short: unusable statistics.

The best way to prevent this is solid web development and having a centralized tag management solution. There is no simple rule for detecting such issues, but the significant mess caused by having different versions of the code is usually easy to spot at multiple levels. If you are not yet using it, you should strongly consider migrating to Universal Analytics.

4.) Tracking visits on other sites

Many Google Analytics users don’t realize that anyone can send data to their Google Analytics tracker. Like, literally anyone. All that is required is that the tracking code is placed somewhere where it will execute, or that the Measurement Protocol is used to send hits with your tracker ID attached to them. In benign cases, when a Google Analytics property tracks visits from sites other than your own, it’s because your code was placed on the wrong site by mistake, or because a service has indexed your site and is serving it under their own hostname, GA tracker and all. Good examples of the latter case are services like Google Translate and Google Cache, which will render your page for end users under their hostname, not yours.

In order to see which domains your tracking code appears on, navigate to the “Audience > Technology > Network” report and switch the primary dimension to “Hostname” (the default is “Service Provider”).

[Screenshot: the Hostname report of a site that doesn’t follow best practices]

You should see your own domain and any subdomains you might be using. Seeing translate.googleusercontent.com means your site registered visits that happened on Google Translate, while seeing webcache.googleusercontent.com means users viewed the Google cache version of your site. What you don’t want to see are hostnames corresponding to your development/staging environments, like dev.something.com, stage.something.com, or localhost. You also don’t want to see “(not set)” there, as it is usually associated with Google Analytics spam bots, although it can also be a sign of legitimate but poorly configured Measurement Protocol requests. If you see hostnames you are not familiar with, take precautions before visiting them: they are likely to belong to Google Analytics spammers, so opening the URLs might be unsafe.

While the impact of such issues will vary depending on the percentage of traffic coming from undesirable hostnames, as well as how different that traffic is from your regular traffic, it is a best practice to keep your main view free of visits from any hostname other than the one you want tracked. This is best achieved by employing an Include filter where you white-list the domains from which you want to see data.
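Filter patterns for an Include filter on the Hostname field are regular expressions, so a white-list can be built from a plain list of hostnames. The sketch below is one way to do that; the hostnames are placeholders and the function names are hypothetical. Anchoring the pattern with `^` and `$` and escaping the dots keeps look-alike hostnames from slipping through.

```javascript
// Escape characters that have a special meaning in regular expressions.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build an anchored pattern that matches only the listed hostnames.
function buildHostnameIncludePattern(hostnames) {
  return '^(' + hostnames.map(escapeRegex).join('|') + ')$';
}

// Example with placeholder hostnames:
const pattern = buildHostnameIncludePattern(['www.example.com', 'shop.example.com']);
const matcher = new RegExp(pattern);
console.log(matcher.test('www.example.com'));                 // matches a listed host
console.log(matcher.test('translate.googleusercontent.com')); // does not match
```

Testing the pattern against the hostnames already present in your Hostname report before saving the filter helps avoid excluding legitimate traffic by accident.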

5.) Incorrect or missing campaign tagging (utm_ URL parameters)

Here we get to something specific to running advertising campaigns and/or newsletters, which most websites nowadays do. When you run any kind of campaign, you want to use specially crafted URLs that contain campaign tracking parameters. The only exception is for AdWords campaigns, where the available automatic linking is the recommended option as it imports more information, including cost information.

The concept behind campaign URL tagging is simple: you add URL parameters that indicate to Google Analytics what the source, medium and campaign name for this click are, and whatever you enter as parameter values will then appear in your traffic source and campaign reports. The optional field to specify a content ID can be used effectively to distinguish between different creatives or ad copy. Building a tagged URL is easy: you can use Google’s Campaign URL Builder or our Advanced URL Builder, which offers a few additional perks. A tagged URL will have something like this attached to it: utm_medium=cpc&utm_source=facebook&utm_campaign=April_Promo
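For illustration, here is a minimal sketch of what a URL builder does, using the standard `URL` API. The function name is hypothetical and the landing page in the usage comment is a placeholder. Requiring source, medium and campaign name is a choice made for this example, following the description above.

```javascript
// Append campaign parameters to a landing page URL.
// utm_content is optional and can distinguish creatives or ad copy.
function buildCampaignUrl(landingPage, { source, medium, campaign, content }) {
  if (!source || !medium || !campaign) {
    throw new Error('source, medium and campaign are all needed in this sketch');
  }
  const url = new URL(landingPage);
  // searchParams.set() handles URL encoding and keeps any existing
  // query string parameters on the landing page intact.
  url.searchParams.set('utm_source', source);
  url.searchParams.set('utm_medium', medium);
  url.searchParams.set('utm_campaign', campaign);
  if (content) {
    url.searchParams.set('utm_content', content);
  }
  return url.toString();
}

// Usage with the example values from the text:
// buildCampaignUrl('https://www.example.com/promo',
//   { source: 'facebook', medium: 'cpc', campaign: 'April_Promo' });
```

Generating links through one function like this also keeps parameter values consistent (for example, always lowercase sources), which avoids the same campaign showing up under several spellings in reports.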

Having incorrect tagging results in the inability to distinguish between paid and organic traffic from the same source (e.g. Facebook Organic reach and Facebook boosted posts engagements, Twitter organic versus Twitter sponsored content, etc.). Even if you don’t have organic traffic coming from the place you are advertising on, you should still use URL tagging to distinguish between traffic from different campaigns and creatives/copy. If you don’t want to end up in a situation where you won’t have a clue whether a particular advertising campaign was effective or not, make sure you have properly tagged URLs!

Detecting if you have proper campaign tagging is as simple as visiting the “Acquisition > Campaigns > All Campaigns” report and checking to see if the campaigns you expect to see there are in fact present.

6.) Not tracking goal conversions / e-commerce transactions

It sounds a bit ridiculous to need to explain this, but I will do it, since I’ve seen more than one Google Analytics account with zero goals and even online stores with no e-commerce tracking. If you fail to set up goals, you’ll be at a significant disadvantage when trying to separate users that are good for your business from those that aren’t, you’ll have a poor ability to identify bottlenecks in important processes and you won’t be able to use many Google Analytics reports. If you have an e-commerce store and you have not yet implemented e-commerce tracking, you are missing the ability to calculate proper revenue per user and revenue per session metrics, you won’t be able to segment users by the products they buy, you won’t be able to evaluate the time or the number of interactions it takes to complete a purchase, etc.

Another thing to consider is that you want to have a funnel set up for most goals in your view. It could be as simple as a “Form > Form Submitted” funnel, or it could be a 10-step conversion path. Having funnels in place means you can examine user behavior in-depth using the Funnel Visualization report and the much better (but also more complex) Goal Flow report.

[Screenshot: example of a simple Goal Flow report]

Checking whether you have an issue is as simple as visiting the E-commerce report section and the Goal reports section. To evaluate if you have proper funnels, check the Funnel Visualization report for each goal. Make sure to check for duplicate steps (e.g. having the Goal Destination URL as the last step of the funnel) or strange entry and exit pages that may indicate an improperly configured funnel. Be suspicious of multiple steps with zero drop-off rate, as it can be due to a step of the funnel matching subsequent steps. (N.B. The match type of the funnel steps is the same as the Goal Destination match type!)

If you have a large online store and have the capacity to do deep analysis of your user behavior, consider implementing Enhanced E-commerce tracking. It offers some really nice perks, but be warned – implementing it can take your dev team several weeks, unless you are using a CMS for which there is a ready-made enhanced e-commerce plugin.

7.) Bad event tracking implementation

Events are a regular part of many Google Analytics setups, as they help digital marketers and UX specialists track user behavior that doesn’t lead to a page load and that also doesn’t warrant having a virtual pageview in place. Think of clicks on phone numbers, expanding a small tooltip, or tracking a form error that a user saw.

There are two major issues that I observe with event tracking implementations. By far the most prevalent one is that events which don’t require user engagement, for example scroll tracking, page engagement timers, or automatically displayed overlays, fire with the default value of the nonInteraction flag, which is false. Such events alter session-level metrics like bounce rate and average session duration, making them highly unreliable or even meaningless. Similar to double-tracking, the bounce rate on pages where such events fire will be near-zero, while the session duration may be significantly extended without any active participation from the user since loading the page.

Detecting the issue is similar to #2 above, but the focus is mostly on bounce rate, and it relies more on using Tag Assistant and the GA Debugger to manually inspect the Google Analytics hits that fire. The fix is easy: just set the nonInteraction flag to “1” or “true” for these events.
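With analytics.js, the flag is passed as a field on the event hit. The sketch below wraps that in a small helper; the helper name is hypothetical, and the `ga` function is passed in as an argument only so that the example does not depend on a global being present. The category, action and label values in the usage comment are placeholders.

```javascript
// Send an event that should not count as user interaction, so it does not
// affect bounce rate or session duration.
function sendPassiveEvent(ga, category, action, label) {
  ga('send', {
    hitType: 'event',
    eventCategory: category,
    eventAction: action,
    eventLabel: label,
    nonInteraction: true,
  });
}

// In the browser, with analytics.js loaded:
// sendPassiveEvent(window.ga, 'Scroll depth', '50%', location.pathname);
```

If events are delivered through a tag manager instead, the equivalent non-interaction setting lives in the tag’s configuration and not in code.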

Another, though rarer, issue is that sometimes an event tracking code fires before the pageview tracking code has a chance to fire. It often happens when a custom dimension value or some other value is pushed automatically to Google Analytics via an event (for whatever reason) and that event fires before the pageview. It may not even do so consistently and only happen for a percentage of users, but even so it can be a significant issue, as you will see the following in your Landing Page report:

[Screenshot: Landing Page report showing “(not set)” as a landing page]

Yes, landing page “not set”! This is a very unpleasant situation, as the landing page often carries a lot of information about the user and is frequently used in advanced segment definitions. The only way to resolve this is to try and postpone the firing of the event, or to try and make sure the pageview code has fired before the event code is executed. Alternatively, evaluate whether you need to push this information as an event – in some cases it may be possible to achieve the same effect by altering the pageview tracking code.
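One way to guarantee the ordering in hand-written code is to hold events back until the pageview has been sent. The sketch below is a hypothetical wrapper, not part of analytics.js: it queues events that arrive early and flushes them right after the pageview. The `ga` function is passed in as an argument so the example stays self-contained.

```javascript
// Hypothetical wrapper that makes sure the pageview hit is sent first.
function createOrderedTracker(ga) {
  let pageviewSent = false;
  const pendingEvents = [];

  return {
    // Send the pageview, then release any events that were waiting for it.
    pageview(fields = {}) {
      ga('send', Object.assign({ hitType: 'pageview' }, fields));
      pageviewSent = true;
      while (pendingEvents.length > 0) {
        ga('send', pendingEvents.shift());
      }
    },
    // Send the event immediately if the pageview is out, otherwise queue it.
    event(fields) {
      const hit = Object.assign({ hitType: 'event' }, fields);
      if (pageviewSent) {
        ga('send', hit);
      } else {
        pendingEvents.push(hit);
      }
    },
  };
}
```

In a tag manager setup the same effect is usually achieved through tag sequencing or trigger choice, so treat this as an illustration of the principle only.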

8.) Not using virtual pageviews

If you are not familiar with the concept of virtual pageviews, make sure to check out our guide to virtual pageviews in Google Analytics before you read on. Nowadays Single-Page Application (SPA) websites are becoming more and more popular and AJAX-driven functionalities are everywhere, while dynamically changing the visibility of significant parts of the page has been around forever. All of these are cases where virtual pageviews need to be employed in order to have a good understanding of user behavior on the site. Surely, you want to be able to track when a user logs in via an AJAX-driven form, or when they click a tab that presents them with a significant quantity of information without actually triggering a page load.

With Single-Page Applications the issue is even more significant, as practically all pageviews on such an app need to be implemented as virtual pageviews, otherwise there will simply be no pageviews registered. Please note that in the age of pushState and replaceState a change in the URL doesn’t mean what it used to mean. The URL can change without a page load occurring. In order to check if a page load has occurred one needs to rely on inspecting the network activity of the browser, or on the tools I recommended in #1 of this article.

Detecting this is not trivial and most of the time requires manual inspection of the website, noting where there are significant changes to the page that are either AJAX-based or JavaScript based and making sure all interesting interactions of this kind trigger a virtual pageview (or event). With SPAs, it is absolutely necessary to consider the need to fire virtual pageviews before you even start developing the site and account for it in your development schedule.
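With analytics.js, a virtual pageview is typically sent by updating the tracker’s `page` field and then sending a pageview hit. The sketch below wraps those two calls in a hypothetical helper; the `ga` function is passed in as an argument so the example is self-contained, and the path and title in the usage comment are placeholders.

```javascript
// Record a virtual pageview for content that changed without a page load.
function trackVirtualPageview(ga, path, title) {
  // Setting the field on the tracker means later hits (e.g. events)
  // are attributed to the new virtual page as well.
  ga('set', 'page', path);
  if (title) {
    ga('set', 'title', title);
  }
  ga('send', 'pageview');
}

// In an SPA, this would be called from the router after each route change:
// trackVirtualPageview(window.ga, '/checkout/step-2', 'Checkout: Shipping');
```

Calling the helper from a single place, such as the router’s navigation hook, makes it less likely that a route change is either missed or tracked twice.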

9.) Missing Referral Exclusions

This was something that most people didn’t need to worry about with Classic Analytics (ga.js), but with Universal Analytics (analytics.js) things have changed: if a user comes back through another source, they will start a new session, even if the previous one hasn’t yet expired. So things like this are possible:

[Screenshot: conversions attributed to PayPal as the referrer]

In other words, if you don’t have proper payment gateway referrer exclusions, you will have issues attributing your goal conversions or e-commerce transactions to the original source. The payment gateway (PayPal in the example above) will obscure the source for some or all of your payments, depending on the number of orders that go through it. This makes it hard to trace back the original source that brought in the conversion. You would need to resort to Multi-Channel Funnels and Attribution Modelling all the time, and while they are great tools for sure, they have some significant limitations when it comes to applying advanced segments, secondary dimensions and others.

Detecting if you are suffering from this issue is usually fairly easy to do. Check your source/medium report and see if a portion of your transactions are attributed to your payment gateway. If they are – you likely have an issue. In some cases a minor amount of such sessions is expected, but most of the time you’d expect this number to be near 0% of your transactions.
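The check can be expressed as a simple share calculation over an exported source/medium report. The sketch below assumes rows with `source` and `transactions` fields; that row format, the function name and the gateway domain in the usage comment are all assumptions made for the example.

```javascript
// Share of transactions (0 to 1) attributed to payment gateway referrals.
// rows: [{ source: 'paypal.com', transactions: 12 }, ...]
// gatewayDomains: domains to treat as gateways, e.g. ['paypal.com']
function gatewayTransactionShare(rows, gatewayDomains) {
  let total = 0;
  let fromGateways = 0;
  for (const row of rows) {
    total += row.transactions;
    const source = row.source.toLowerCase();
    // Match the domain itself and any of its subdomains.
    const isGateway = gatewayDomains.some(
      (domain) => source === domain || source.endsWith('.' + domain)
    );
    if (isGateway) {
      fromGateways += row.transactions;
    }
  }
  return total === 0 ? 0 : fromGateways / total;
}

// Usage: gatewayTransactionShare(reportRows, ['paypal.com']);
```

A result well above zero suggests that the gateway domain is missing from the referral exclusion list.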

Payment gateways aside, if your GA property was set up before 2015, it won’t have your domain name added as a referral exclusion, whereas properties created after Universal Analytics have it added by default. If yours is an older property, make sure to manually add your domain name to the referral exclusion list.

10.) Referrer Spam / Bots

Referrer spam is the pollution of your Google Analytics stats by misguided or malicious people trying to use the Google Analytics reporting interface as their ad placement. We’ve already written in detail about referrer spam and how to fix it and language spam and how to fix it, and there is an infographic on the topic if you prefer a more visual exposition. Finally, there is the Auto-Spam Filters tool, which automatically protects tens or, if need be, hundreds of Google Analytics views against all kinds of spam traffic with the click of a mouse. The topic is pretty much covered from every possible direction, so just refer to our guides and tools if you need help understanding it. The bottom line is that spam will pollute your Google Analytics data, making any conclusions you draw from it less accurate, and it is best to be proactive about it, especially if your site doesn’t get many thousands of visits per day (less-trafficked sites are hit harder, as referrer spam usually represents a higher percentage of the total visits, thus skewing the statistics more heavily).

[Screenshot: example of referrer spam, combined with language spam and page title spam]

Bots with other purposes are a curious issue. Google has an option called “Exclude traffic from known bots”, which made no change or only a tiny difference when I estimated its effect in the past. The fact is that, generally, Google Analytics won’t pick up on most bot traffic, as most bots will not execute the Google Analytics code. However, over the years I’ve seen many sites, including e-commerce sites, having their Google Analytics data heavily polluted by bots with unidentified purpose and source. Usually the visits would come from a diverse set of IPs, suggesting some kind of bot network. I was able to identify such traffic by unusual behavior patterns (low engagement rate, no goal conversions or e-commerce transactions) and by some peculiarities in their technical information – browser version, device, screen, Flash version and so on – usually a combination of the above.

The effects that I’ve seen have been fairly devastating in that they heavily polluted a lot of data. However, unlike many of the other Top 10 Issues it is usually possible to retroactively solve it with a clever Custom Segment.

Concluding remarks

While this list could have gone on with serious missed opportunities like not making use of custom dimensions, or not using content groupings, or not gathering as much site speed data as GA allows, I chose to focus on the top 10 issues that can and often do ruin, to a significant extent, the ability of marketers and analysts to use their Google Analytics data. I hope you learnt something new and that you won’t be a victim of the above anytime soon!

If you are looking for a quick way to identify if your Google Analytics data is accurate and if it is suffering from one or more of the above issues, our Google Analytics Health Status tool is a must-have, as it can do in seconds what takes an experienced analyst 30-60 minutes, and it can do it reliably time and again. It is especially useful for digital agencies when on-boarding new clients as well as for regular data quality check-ups on clients’ analytics accounts.

Georgi is an expert internet marketer working passionately in the areas of SEO, SEM and Web Analytics since 2004. He is the founder of Analytics-Toolkit.com and owner of an online marketing agency & consulting company: Web Focus LLC and also a Google Certified Trainer in AdWords & Analytics. His special interest lies in data-driven approaches to testing and optimization in internet advertising.
