Since my last posts on how to eliminate Ghost Referrer Spam in Google Analytics & Google Analytics data integrity attacks, two things have happened:
1. The number of people complaining about spam referrers / spam events / spam pageviews in webmaster forums and Google Groups has increased exponentially
2. The Google Analytics team noted on their Google Plus page that they know about the issue and that they will issue guidance on how to handle it
The solution they hinted at relies on filtering on the side of webmasters/website owners. This is most unsatisfactory, as there should be a more intelligent way to handle this than making hundreds of thousands of GA users, many of whom don’t even know there is such a thing as a “view filter”, manually apply such filters to ALL their views.
Since, however, this is the only way that we as Google Analytics pros, webmasters and site owners can handle this, I decided to put it all in one place and provide a single solution for all ghost/spam traffic issues. If you are familiar with the issue, skip to the solution!
Again, what is the issue in detail?
The issue with referral spam in general is that anybody can send fake Google Analytics hits to your GA account, and do so easily and efficiently, with no need to access your server at all. See more details in the two posts linked at the beginning of this article. This main issue manifests itself in several different ways (so far):
1. Ghost / Spam Referrers in your Referral Traffic report
The most notorious of these are darodar and semalt, with many others following their lead. This is not real traffic, just spam meant to make you visit their crappy sites, which offer all kinds of webmaster-oriented services plus the occasional virus/malicious infection (that’s why you shouldn’t visit them!). This is fake traffic, only visible in Google Analytics. It is not really hitting your servers. An example:
2. Ghost / Spam Pageviews
Instead of spamming the referrer report, these spammers inject fake pageviews containing URLs that they want you to visit. Of course, no such URLs exist on your website. These are fake URLs crafted to entice you to visit their sites, mainly adult-related ones for now. An example:
3. Ghost / Spam Events
There have been only one or two of these so far. They inject their spam in the event category and thus appear in your events reports. Of course, no events actually take place on your site. These are entirely fake events, only visible in Google Analytics. An example:
How to FIX 99% of Ghost / Spam Traffic issues
This solution for eliminating fake traffic has two parts. The first filters out 98% of all referrer spam, ghost pageviews and ghost events. The second is an additional filter for the referrer spam that is not covered by the first filter. With these two filters you’ll be able to filter 99%-100% of all fake traffic hitting your Google Analytics. For how long they will cover those 99%-100% I can’t guarantee, though… What I can guarantee is that this is the best thing you can do for now, and I don’t see how any advice given by the GA team would be better.
NEW: We’ve just launched a fully automated solution to the problem. You can now use our Auto Spam Filters tool to eliminate & protect against referrer spam & other ghost traffic. It’s a set-and-forget, 1-click wonder that works across 100s of properties and views. The filters are frequently updated for continued protection.
Solution: Part 1 – Filter hostnames
This filter INCLUDES ONLY traffic with the hostname field set to a predefined set of values (yours). Yes, it can be spoofed, but the less sophisticated referrer spammers who just leave this field as (not set) or who set a random domain name here will be filtered out. This is most of the spam/ghost/fake traffic at the moment and will likely continue to be the larger part of GA spam, since investing in mass-scale crawler software & infrastructure is not that cheap.
Here is how to set it up:
1.) Go to your Hostnames report and select a date range of a month or more. This will show you the hostnames you need to consider for inclusion.
From the screenshot above it is obvious that I only need to include www.analytics-toolkit.com for this particular view. In your particular case you might need to include more than one hostname! I would not recommend including translate.googleusercontent.com in most cases, even if that means losing some stats, as more adept spammers will just use this hostname to bypass the filter, without the need for a crawler or a third-party UA-ID to hostname database.
2.) Construct and apply the filter.
You would want to use some basic regex here. For example, if I want to include all traffic to analytics-toolkit.com and to a third-party shopping cart hosted on shopify.com, here is what that would look like:
analytics-toolkit\.com|shopify\.com
(!) DO NOT APPLY the above filter to your site without modifications!
It is very important to get this filter right, as otherwise it might result in missing statistics. It also requires some maintenance, e.g. when moving domain names or when adding new domain names (third-party hosted shopping carts, for example). As always, keep a view with the raw data just in case.
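Before applying the include filter to a view, it can help to check offline which hostnames it would keep. Below is a minimal Python sketch of that check. It assumes the filter does a partial, case-insensitive match on the hostname field; the function name and the sample hostnames are made up for illustration, and the pattern is the example one from above, not one to copy as-is.

```python
import re

# Example pattern from this article; replace it with your own hostnames.
HOSTNAME_PATTERN = re.compile(r"analytics-toolkit\.com|shopify\.com", re.IGNORECASE)


def passes_include_filter(hostname):
    """Return True if a hit with this hostname would be kept.

    Assumption: the filter matches anywhere in the hostname (partial match),
    so subdomains such as www. are kept as well.
    """
    return bool(HOSTNAME_PATTERN.search(hostname or ""))


# Hypothetical values, as they might appear in a Hostnames report.
sample_hostnames = [
    "www.analytics-toolkit.com",
    "checkout.shopify.com",
    "(not set)",
    "some-random-domain.example",
    "translate.googleusercontent.com",
]

kept = [h for h in sample_hostnames if passes_include_filter(h)]
dropped = [h for h in sample_hostnames if not passes_include_filter(h)]

print("kept:   ", kept)
print("dropped:", dropped)
```

Running the pattern against your real hostname list first is a cheap way to catch a typo that would otherwise silently drop legitimate traffic from the filtered view.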
Solution: Part 2 – Filter additional referrer spam
Some of the spammers will not be covered by the filter from Part 1. So we’ll need an additional filter for these guys. Here is the one I’m currently using:
(Updated 17 Aug 2015. Has aggressive seo-related filtering. Do not apply if you receive legitimate traffic from sites with “seo-” or “-seo” in the domain. Should be safe for 99.99% of sites.)
darodar\.|semalt\.|buttons-for.*?website|blackhatworth|ilovevitaly|prodvigator|cenokos\.|ranksonic\.|adcash\.|(free|share|social).*?buttons?\.|hulfingtonpost\.|free.*traffic|buy-cheap-online|-seo|seo-|video(s)?-for|amezon
You need to construct this filter as shown in the screenshot below:
Make sure to filter on “Campaign Source”. If you spot a new spam domain that’s going through your filters you need to add a vertical line (“|”) and then the name of the domain. Escape dots by adding a backslash (“\”) before them. As you can see I prefer not to add the whole domain, but just the unique part of it, but this would vary between domains.
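The rule for extending the pattern (add a vertical line, then the domain fragment with its dots escaped) can be sketched in Python. This is only an illustration under the assumption that the filter does a partial, case-insensitive match on the source; the helper names and the domain spam-example.xyz are hypothetical.

```python
import re

# The exclude pattern from this article (as updated 17 Aug 2015).
SPAM_PATTERN = (
    r"darodar\.|semalt\.|buttons-for.*?website|blackhatworth|ilovevitaly|"
    r"prodvigator|cenokos\.|ranksonic\.|adcash\.|(free|share|social).*?buttons?\.|"
    r"hulfingtonpost\.|free.*traffic|buy-cheap-online|-seo|seo-|video(s)?-for|amezon"
)


def is_spam_source(source, pattern=SPAM_PATTERN):
    """Return True if the campaign source matches the exclude pattern."""
    return re.search(pattern, source, re.IGNORECASE) is not None


def add_spam_fragment(pattern, fragment):
    """Append a fragment after a vertical line, escaping only the dots."""
    return pattern + "|" + fragment.replace(".", r"\.")


# Known spam sources are caught, ordinary referrers are not.
for source in ["semalt.semalt.com", "buttons-for-website.com",
               "100dollars-seo.com", "google.com"]:
    print(source, "->", "spam" if is_spam_source(source) else "ok")

# A new (hypothetical) spam domain slips through until it is added.
new_domain = "spam-example.xyz"
updated_pattern = add_spam_fragment(SPAM_PATTERN, "spam-example.")
print(is_spam_source(new_domain), is_spam_source(new_domain, updated_pattern))
```

Testing a legitimate referrer of yours against the updated pattern before saving it is worthwhile, since short fragments such as "-seo" can match more domains than intended.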
Bonus Solution – Retroactive Filter:
Since view-level filters are not retroactive (they only start working on your data from the moment you set them up), they won’t fix your past data. That’s why you’d want to use custom segments or table filters to clean referral spam out of your past data when preparing reports.
Here is what such a custom segment should look like:
The values are based on the examples from part 1 and part 2 of the solution, so you can copy the second regex directly. Make sure to (!) modify the regex in the first field to match your own hostnames (!), as described in part 1 of the solution in this very article.
You set it up once and save it. Then you can apply it to most reports in Google Analytics. Bonus bonus tip: create a shortcut in GA to a report with this segment applied in order to access it more quickly, if you need to do so on a regular basis.
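If you work with exported data rather than inside the GA interface, the same segment logic (keep only your hostnames, then drop spam sources) can be applied to past rows. The sketch below assumes an export with hostname, source and sessions fields; those field names, the sample rows and the function name are invented for illustration, and matching is assumed to be partial and case-insensitive.

```python
import re

# Part 1: your own hostnames (example pattern; modify it for your site).
HOSTNAME_PATTERN = re.compile(r"analytics-toolkit\.com|shopify\.com", re.IGNORECASE)

# Part 2: the spam source pattern from this article.
SPAM_PATTERN = re.compile(
    r"darodar\.|semalt\.|buttons-for.*?website|blackhatworth|ilovevitaly|"
    r"prodvigator|cenokos\.|ranksonic\.|adcash\.|(free|share|social).*?buttons?\.|"
    r"hulfingtonpost\.|free.*traffic|buy-cheap-online|-seo|seo-|video(s)?-for|amezon",
    re.IGNORECASE,
)


def is_clean(row):
    """Keep a row only if the hostname is ours AND the source is not spam."""
    return (HOSTNAME_PATTERN.search(row["hostname"]) is not None
            and SPAM_PATTERN.search(row["source"]) is None)


# Hypothetical exported rows.
rows = [
    {"hostname": "www.analytics-toolkit.com", "source": "google", "sessions": 120},
    {"hostname": "(not set)", "source": "darodar.com", "sessions": 40},
    {"hostname": "www.analytics-toolkit.com", "source": "semalt.semalt.com", "sessions": 15},
    {"hostname": "(not set)", "source": "(direct)", "sessions": 8},
    {"hostname": "checkout.shopify.com", "source": "newsletter", "sessions": 30},
]

clean_rows = [row for row in rows if is_clean(row)]
clean_sessions = sum(row["sessions"] for row in clean_rows)

print(len(clean_rows), "clean rows,", clean_sessions, "sessions")
```

The third sample row shows why both conditions are needed: a spammer who spoofs a valid hostname passes the first check and is only caught by the source pattern.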
If you are managing more than a couple Google Analytics accounts we would strongly recommend our newly launched fully automated solution to the problem. You can now use our Auto Spam Filters tool to eliminate & protect against referrer spam & other ghost traffic. It’s a set-and-forget, 1-click wonder that works across 100s of properties and views. The filters are frequently updated for continued protection.
Thanks for this and especially the last part. I was trying to find a way to keep that and did not think about a custom dashboard shortcut. I hope that is what can be done.
I can’t believe this is even happening to Google. I just started getting all this junk in the last month. As a GA novice I would think that Google would have some database they maintain and an option to just eliminate this automatically with a checkbox, like the bot exclusion. I mean, who would want this corrupting their data? Unless someone wants to have fake numbers to show a client, it’s basically ruining it for me. I don’t have time to manage 100+ properties. If you are correct and I can just use a bookmark to do this once and then add it to my other properties, at least I can continue to use GA.
Kind of a shame. Google owns the fastest computer ever built and should have no problem figuring this out.
Hi Dave,
You’d be happy to know that we’ve just launched a solution that allows you to do just that: push a button and get spam protection on all accounts, properties & views you work on: https://www.analytics-toolkit.com/auto-spam-filters/
Cheers,
Georgi
Thanks for this post, Georgi. It’s such a relief to find someone who can explain what’s going on so in-depth. I’ve read through this carefully and set up according to your instructions.
I’m looking now at the referral traffic for one site with the Advanced Segment in place, and am seeing that it’s not working the way I thought it would. For example, 100dollars-seo is in the Botnet Exclusion Segment as 100dollars-seo\. . The numbers in my referral traffic report are the same for both the “All Sessions” and “Botnet Exclusion Segment.” Then there’s success-seo, which is not in the Advanced Segment, and the number of total sessions for this site is different with and without the Botnet Exclusion Segment turned on.
Can you explain this?
Hi Gaia,
Is the segment Sessions -> “Exclude” -> RegEx -> my regex? If so, then it would be strange for the numbers to be the same between such a segment and all sessions, but I won’t be able to say why unless I have access to the account. The fact that you are seeing differences for other spam which is not covered by my regex hints at sampling issues, or at least that’s the only thing that comes to mind, barring obvious errors such as comparing two different periods, etc.
Georgi