1. The number of people complaining about spam referrers / spam events / spam pageviews in webmaster forums and Google Groups has increased exponentially
2. The Google Analytics team noted on their Google Plus page that they know about the issue and that they will issue guidance on how to handle it
The solution they hinted at involves filtering on the side of webmasters/website owners. This is most unsatisfactory, as there should be a more intelligent way to handle this than making hundreds of thousands of GA users, many of whom don’t even know there is such a thing as a “view filter”, manually apply such filters to ALL their views.
Since, however, this is the only way that we as Google Analytics pros, webmasters and site owners can handle this, I decided to put it all in one place and provide a single solution for all ghost/spam traffic issues. If you are familiar with the issue, skip to the solution!
Again, what is the issue in detail?
The issue with referral spam in general is that anybody can send fake Google Analytics hits to your GA account, and do so easily and with great efficiency, without any need to access your server. See more details in the two posts linked at the beginning of this article. This main issue manifests itself in several different ways (so far):
1. Ghost / Spam Referrers in your Referral Traffic report
The most notorious of these are darodar and semalt, with many others following their lead. This is not real traffic, just spam meant to make you visit their crappy sites, which offer all kinds of webmaster-oriented services plus the occasional virus/malicious infection (that’s why you shouldn’t visit them!). This is fake traffic, visible only in Google Analytics; it never actually hits your servers.
2. Ghost / Spam Pageviews
Instead of spamming the referrer report, these spammers inject fake pageviews containing URLs that they want you to visit. Of course, no such URLs exist on your website; they are fake URLs crafted to entice you to visit their sites, mainly adult-related ones for now.
3. Ghost / Spam Events
There have been only one or two of these so far. They inject their spam into the event category, so it appears in your events reports. Of course, no events actually take place on your site; these are entirely fake events, visible only in Google Analytics.
How to FIX 99% of Ghost / Spam Traffic issues
This solution for eliminating fake traffic has two parts. The first filters out about 98% of all referrer spam, ghost pageviews and ghost events. The second is an additional filter for the referrer spam that the first filter does not cover. With these two filters you’ll be able to filter out 99%-100% of all fake traffic hitting your Google Analytics. For how long they will cover those 99%-100% I can’t guarantee, though. What I can guarantee is that this is the best thing you can do for now, and I don’t see how any advice given by the GA team would be better.
NEW: We’ve just launched a fully automated solution to the problem. You can now use our Auto Spam Filters tool to eliminate & protect against referrer spam & other ghost traffic. It’s a set-and-forget, 1-click wonder that works across 100s of properties and views. The filters are frequently updated for continued protection.
Solution: Part 1 – Filter hostnames
This filter INCLUDES ONLY traffic with the hostname field set to a predefined set of values (yours). Yes, it can be spoofed, but the less sophisticated referrer spammers who just leave this field as (not set) or set a random domain name here will be filtered out. This is most of the spam/ghost/fake traffic at the moment, and it will likely continue to be the larger part of GA spam, since investing in mass-scale crawler software and infrastructure is not that cheap.
Here is how to set it up:
1.) Go to your Hostnames report and select a date range of a month or more. This will show you the hostnames you need to consider for inclusion.
From the screenshot above it is obvious that I only need to include www.analytics-toolkit.com for this particular view. In your particular case you might need to include more than one hostname! I would not recommend including translate.googleusercontent.com in most cases, even if that means losing some stats, as more adept spammers will just use this hostname to bypass the filter, without the need for a crawler or a third-party UA-ID to hostname database.
2.) Construct and apply the filter.
You will want to use some basic regex here. For example, if I want to include all traffic to analytics-toolkit.com and to a third-party shopping cart hosted on shopify.com, here is what that would look like:
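As a rough illustration only, the include pattern for those two hosts might look like the `HOSTNAME_PATTERN` string in the Python sketch below. The pattern is an assumption written for this example and not necessarily the exact filter used on the site; the helper name `is_valid_hostname` is hypothetical, and `re.search` is used on the assumption that the filter matches anywhere in the hostname field.

```python
import re

# Assumed include pattern: the bare domain or any subdomain of the two
# hosts named in the text. Dots are escaped so they match literally, and
# the trailing "$" rejects hostnames that merely start with a valid domain.
HOSTNAME_PATTERN = r"(^|\.)(analytics-toolkit\.com|shopify\.com)$"


def is_valid_hostname(hostname: str) -> bool:
    """Return True if a hit with this hostname would pass the include filter."""
    return re.search(HOSTNAME_PATTERN, hostname) is not None


if __name__ == "__main__":
    # Try the pattern against hostnames taken from your own Hostnames report.
    for host in ["www.analytics-toolkit.com", "(not set)", "darodar.com"]:
        print(host, is_valid_hostname(host))
```

Only the pattern string itself would go into the filter's pattern field. Checking it against every hostname in your own report first is a cheap way to avoid excluding legitimate traffic.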
(!) DO NOT APPLY the above filter to your site without modifications!
It is very important to get this filter right, as otherwise it might result in missing statistics. Also, it does require some maintenance: e.g. when moving domain names, when adding new domain names (third-party hosted shopping carts, for example), etc. As always, keep a view with the raw data just in case.
Solution: Part 2 – Filter additional referrer spam
Some of the spammers will not be covered by the filter from Part 1. So we’ll need an additional filter for these guys. Here is the one I’m currently using:
(Updated 17 Aug 2015. Has aggressive seo-related filtering. Do not apply if you receive legitimate traffic from sites with “seo-” or “-seo” in the domain. Should be safe for 99.99% of sites.)
You need to construct this filter as shown in the screenshot below:
Make sure to filter on “Campaign Source”. If you spot a new spam domain that’s going through your filters you need to add a vertical line (“|”) and then the name of the domain. Escape dots by adding a backslash (“\”) before them. As you can see I prefer not to add the whole domain, but just the unique part of it, but this would vary between domains.
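The construction rules above (join the entries with “|”, escape dots with a backslash) can be sketched in Python. This is a minimal sketch, not the full filter: the fragment list is shortened to the two spammers named earlier, `spam-example.com` is a made-up domain used only to show dot escaping, and all function names are hypothetical.

```python
import re
from typing import Iterable


def escape_dots(fragment: str) -> str:
    """Put a backslash before every dot so it matches a literal dot."""
    return fragment.replace(".", r"\.")


def build_exclude_pattern(fragments: Iterable[str]) -> str:
    """Join the escaped fragments with a vertical line ("|")."""
    return "|".join(escape_dots(f) for f in fragments)


def is_spam_source(pattern: str, source: str) -> bool:
    """Return True if the Campaign Source value matches the exclude pattern."""
    return re.search(pattern, source) is not None


if __name__ == "__main__":
    # "spam-example.com" is a hypothetical newly spotted spam domain.
    pattern = build_exclude_pattern(["darodar", "semalt", "spam-example.com"])
    print(pattern)
    print(is_spam_source(pattern, "forum.darodar.com"))
    print(is_spam_source(pattern, "google"))
```

Using only the unique part of a domain (for example `darodar` and not the full hostname) keeps the pattern short and also catches subdomains, at the cost of a slightly higher risk of matching a legitimate source.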
Bonus Solution – Retroactive Filter:
View-level filters are not retroactive: they only process data from the moment you set them up, so they won’t fix your past data. That’s why you’d want to use custom segments or table filters to clean referral spam out of your past data when preparing reports.
Here is what such a custom segment should look like:
The values are based on the examples from Part 1 and Part 2 of the solution, so you can copy the second regex directly. Make sure to (!) modify the regex in the first field to match your own hostnames (!), as described in Part 1 of the solution in this very article.
You set it up once and save it. Then you can apply it to most reports in Google Analytics. Bonus bonus tip: create a shortcut in GA to a report with this segment applied in order to access it more quickly, if you need to do so on a regular basis.
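The logic of such a segment (keep only rows whose hostname matches the include pattern and whose source does not match the spam pattern) can also be sketched in Python, for instance to clean data exported from GA. Everything here is an assumption made for illustration: both patterns are shortened stand-ins, the field names `hostname`, `source` and `sessions` are hypothetical, and the rows are made-up sample data.

```python
import re

# Assumed stand-ins for the two regexes; replace them with your own.
HOSTNAME_INCLUDE = r"(^|\.)analytics-toolkit\.com$"
SOURCE_EXCLUDE = r"darodar|semalt"


def is_clean(row: dict) -> bool:
    """Keep a row only if it passes both segment conditions."""
    hostname_ok = re.search(HOSTNAME_INCLUDE, row["hostname"]) is not None
    source_ok = re.search(SOURCE_EXCLUDE, row["source"]) is None
    return hostname_ok and source_ok


# Made-up sample rows, standing in for an exported report.
rows = [
    {"hostname": "www.analytics-toolkit.com", "source": "google", "sessions": 120},
    {"hostname": "(not set)", "source": "darodar.com", "sessions": 45},
    {"hostname": "www.analytics-toolkit.com", "source": "semalt.semalt.com", "sessions": 30},
]

clean_rows = [row for row in rows if is_clean(row)]

if __name__ == "__main__":
    for row in clean_rows:
        print(row)
```

The second sample row fails the hostname condition and the third fails the source condition, which is why both conditions are needed together.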
If you manage more than a couple of Google Analytics accounts, we strongly recommend the Auto Spam Filters tool mentioned above, which applies and updates these filters automatically across hundreds of properties and views.
How to Fix Ghost Traffic / Spam Traffic in Google Analytics by Georgi Georgiev