Over the past two weeks we have been witnessing a new wave of Google Analytics spam: language spam. In this post I’ll outline what it is, how it happens and how you can protect your GA accounts from it (to some extent). We first started seeing it on Nov 8, just as voting in the 2016 US presidential election was winding down, and it contained the following message:
Secret.ɢoogle.com You are invited! Enter only with this ticket URL. Copy it. Vote for Trump!
The above text is shown in the dimension usually reserved for language information. Such information is sent automatically to GA by most browsers in the form of short abbreviations, such as “en”, “en-gb”, “en-us”, “es”, “fr”, etc.
It is also combined with referral spam, with multiple domains listed as source/medium, including abc.xyz, brateg.xyz, budilneg.xyz, begalka.xyz, bezlimitko.xyz, bukleteg.xyz, boltalko.xyz, biteg.xyz and others. So it is a two-vector attack that tries to draw the user’s attention both to the fake referrer domains and to the language report, probably because of its prominent placement on the Google Analytics “report homepage”:
(Update Dec 14, 2016) – The spam is now coming in with referrer “twitter.com” and “blackhatworld.com“, page title “Vitaly rules google” and language “Vitaly rules google ☆*:｡゜ﾟ･*ヽ(^ᴗ^)ﾉ*･゜ﾟ｡:*☆ ¯\_(ツ)_/¯(ಠ益ಠ)(ಥ‿ಥ)(ʘ‿ʘ)ლ(ಠ_ಠლ)( ͡° ͜ʖ ͡°)ヽ(ﾟДﾟ)ﾉʕ•̫͡•ʔᶘ ᵒᴥᵒᶅ(=^ ^=)oO”
(Update Dec 7, 2016) – The spam now comes with referrer “motherboard.vice.com”, full referrer and page title “motherboard.vice.com/read/this-pro-trump-russian-is-spamming-google-analytics” and language “o-o-8-o-o.com search shell is much better than google!”. Our spam-blocking solutions, both manual and automatic, continue to work and would have blocked it if you had them deployed.
(Update Dec 2, 2016) – Over time the spam mutated its referral source to legitimate sites such as “addons.mozilla.org”, “webmasters.stackexchange.com”, “thenextweb.com”, and lately “reddit.com”, and it is likely to mutate further. It also used “lifehacĸer . com”, which mimics the legitimate lifehacker.com site but is controlled by the spammer. All our solutions continue to work and block it all.
(Update Nov 28, 2016) – The spam is now also coming in as “Secret.Google.com-Trump” (note that the site is secret.ɢoogle.com, not secret.google.com, these are two completely different domains).
How Big is secret.ɢoogle.com Spam in Google Analytics?
It appears it caught a lot of eyes, as the number of searches for “secret.ɢoogle.com”, “secret.google.com”, and “Secret.ɢoogle.com You are invited! Enter only with this ticket URL. Copy it. Vote for Trump!” quickly surpassed both searches for “google analytics spam” and “google analytics referral spam” combined:
Newer spam such as “o-o-8-o-o.com search shell is much better than google!” and “Vitaly rules google ☆*:｡゜ﾟ･*ヽ(^ᴗ^)ﾉ*･゜ﾟ｡:*☆ ¯\_(ツ)_/¯(ಠ益ಠ)(ಥ‿ಥ)(ʘ‿ʘ)ლ(ಠ_ಠლ)( ͡° ͜ʖ ͡°)ヽ(ﾟДﾟ)ﾉʕ•̫͡•ʔᶘ ᵒᴥᵒᶅ(=^ ^=)oO” also register significant search volumes.
The traffic the spammers artificially generate is also notable since it diverges from the usual referral spam we see: it has a kind of “average” bounce rate, registers over 2.5 pages/session and has a very long avg. session duration of over 20 minutes, as seen in the screenshot below:
This differs from the usual referral spam traffic, which has a very high % of new sessions and consequently a very high bounce rate, with pages/session near 1.00 and average session duration near 0:00:00.
This particular type of language spam only registers pageviews on your homepage, so metrics for internal pages should not be affected.
Other than that, this wave of language spam traffic in GA masquerades as Firefox 49.0 on Linux, with no Flash version and with industry-average screen resolution and browser size.
(Update Nov 28, 2016): The browser is now set to “ɢoogle.com” in newer instances of the same ghost traffic.
Because the message was related to the US elections held on Nov 8, we expected this kind of spam to be short-lived. Boy, were we wrong – it is only growing stronger, as it appears to be catching the attention of many webmasters around the world. Like e-mail spammers, spammers who insert fake traffic into Google Analytics thrive on attention.
How do Spammers Infect GA’s Language Report?
As far as we’ve been able to investigate, this type of spam is no different from other types we’ve seen in the past. There are generally two types of spam bots: bots (computer software) that actually “visit” your site and mimic users browsing it, and bots that don’t visit your site but send “hits” straight to the Google Analytics servers. The second process is explained in detail in our “Guide to Referrer Spam in Google Analytics” from a while ago, or you can check our more recent info-graphic on the topic. If you are wondering why Google allows this and why it’s so hard to arrive at a long-term solution, you can read our “All Your Analytics Are Belong To Us” article where we go into detail about that.
If you are thinking that there is an easy way out of this – there isn’t, even though language spam is a bit easier to deal with than the classical referral spam.
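To make the second kind of bot more concrete, here is a minimal sketch of what a hit sent directly to the collection servers looks like from the receiving side. It assumes the Universal Analytics Measurement Protocol field names (`tid` for the property, `dp` for the page path, `dr` for the referrer, `ul` for the user language). The sample payload, property ID and the `describe_hit` helper are all made up for illustration. The point is that the language is just another free-text field in the request, so nothing forces it to be a real language code.

```python
from urllib.parse import parse_qs

# Hypothetical hit payload: the property ID, client ID and values are invented.
SAMPLE_HIT = (
    "v=1&t=pageview&tid=UA-XXXXX-Y&cid=555"
    "&dp=%2F&dr=http%3A%2F%2Fexample.xyz%2F"
    "&ul=Vote%20for%20Example%21"
)


def describe_hit(payload: str) -> dict:
    """Pull out the fields that language and referral spam typically abuse."""
    fields = parse_qs(payload)  # also decodes the percent-encoding

    def first(key: str) -> str:
        # parse_qs returns a list per key; take the first value or "".
        return fields.get(key, [""])[0]

    return {
        "property": first("tid"),
        "page": first("dp"),
        "referrer": first("dr"),
        "language": first("ul"),
    }


hit = describe_hit(SAMPLE_HIT)
print(hit)
```

Note that the page path here is the homepage and the referrer is an arbitrary domain, which matches the pattern described earlier: none of it requires an actual visit to your site.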
How to Remove Google Analytics Language Spam?
First of all, once data is recorded by GA you can’t change or edit it, so there is no way for you to permanently erase this traffic from your reports (this sucks, I know, but that’s how it is). That leaves two things you can do: block spam from coming into your reports, and filter out the spam while reporting by using advanced segments. The first is a permanent change to your Google Analytics views and only works from the moment you apply it onward. The second is a flexible, retroactive solution, but you need to apply the advanced segment to your reports each time.
When it comes to blocking future spam, there are two ways to approach it. If you have just a couple of websites, then the manual solution outlined below will likely be enough to keep your Google Analytics stats free of language spam for a good period of time. It won’t help much with referral spam, though.
If, however, you manage multiple sites with many different views for yourself or for your clients, then a more scalable solution is appropriate. The Auto Spam Filters anti-spam tool we offer is just such an option, as it offers you a fully automated set-and-forget solution that blocks referrer spam, language spam, events spam, etc. from tens, hundreds or even thousands of Google Analytics views. Our anti Google Analytics spam tool works much like anti-virus software: it will help you block most spam quickly and at scale. You can sign up for a 14-day free trial of the tool for 3 Google Analytics properties (no credit card required).
If you are going to use an automated tool to block language spam like “secret.ɢoogle.com” language spam, then skip straight to Part 2 of the solution below!
Part 1 – Block Spam with a View-level Filter
Setting up a view-level filter is fairly simple, but it should be noted that this is a permanent change going forward, so do be careful when using it, especially if you have little prior experience with view filters. The filter I propose will filter out any traffic (hits) where the language dimension contains 15 or more symbols. Since most legitimate language settings sent by browsers are 5-6 symbols long, and traffic with even 8-9 symbols in this field is rare, it should only filter out language spam.
In addition to that, there are symbols which are invalid for use in the language field, but which can be used to construct a domain name (or what looks like it, such as “secret google com”, “secret,google,com”, “secret!google!com”), so we can exclude those as well.
The resulting regular expression we’ll use looks like this:
You need to construct the “Exclude Language Spam” filter as shown in the screenshot below:
Make sure to filter on the “Language Settings” dimension. You need “Edit” access at the “Account” level in Google Analytics in order to set up new filters, so make sure you have that, or you won’t even see the setup.
You can use the “Verify Filter” option to see how it would affect data from the last few days.
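If you want to sanity-check the filter logic outside of Google Analytics before applying it, the rule described in this part can be sketched in a few lines of Python. The pattern below is an assumption reconstructed from the description (15 or more symbols, or any whitespace, dot, comma or exclamation mark); it is not necessarily the exact expression used in the filter itself, and `is_language_spam` is a hypothetical helper name.

```python
import re

# Assumed pattern based on the description in this part, not the exact filter:
#   .{15,}    -> 15 or more symbols in the language field
#   [\s.,!]   -> symbols that never appear in real language codes
LANGUAGE_SPAM = re.compile(r".{15,}|[\s.,!]")


def is_language_spam(language: str) -> bool:
    """Return True if the language value looks like spam under the assumed rule."""
    # search() matches anywhere in the value, like a partial-match filter would.
    return LANGUAGE_SPAM.search(language) is not None


samples = [
    "en-us",
    "fr",
    "zh-hans-cn",
    "secret google com",
    "o-o-8-o-o.com search shell is much better than google!",
]
for value in samples:
    print(f"{value!r}: {'spam' if is_language_spam(value) else 'ok'}")
```

Short codes such as “en-us” pass through, while anything long or containing domain-like punctuation is flagged. Test against language values from your own reports before relying on a rule like this.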
Part 2 – Filter Out Historical Spam Via an Advanced Segment
View-level filters are not retroactive: they only start working on your data from the moment you set them up, so they won’t help with your historical data. For that, it’s best to use custom segments to clean language spam out of your past data when preparing reports.
Here is a custom segment to filter out secret.ɢoogle.com spam as well as any future Google Analytics language spam:
You set it up once and save it. Then you can apply it to most reports in Google Analytics.
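If you work with exported report data rather than inside the GA interface, the same idea of retroactive filtering can be sketched in Python. This is only an illustration: the rows are invented, and the pattern is an assumption based on the rule described in Part 1 (15 or more symbols, or symbols invalid in language codes), not the definition of the segment itself.

```python
import re

# Assumed spam rule, following the Part 1 description; not the segment's exact definition.
SPAM = re.compile(r".{15,}|[\s.,!]")

# Invented sample of exported (language, sessions) rows.
rows = [
    ("en-us", 120),
    ("fr", 30),
    ("Secret.ɢoogle.com You are invited!", 45),
    ("en-gb", 18),
]


def clean_sessions(rows):
    """Keep only the rows whose language value does not look like spam."""
    return [(lang, sessions) for lang, sessions in rows if not SPAM.search(lang)]


kept = clean_sessions(rows)
total_all = sum(sessions for _, sessions in rows)
total_clean = sum(sessions for _, sessions in kept)
print(f"All sessions: {total_all}, without language spam: {total_clean}")
```

As with a segment, nothing is deleted from the underlying data; the spam rows are simply left out of the totals you report.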
Bonus tip: create a shortcut in GA to a report with this segment applied in order to access it more quickly, if you need to do so on a regular basis.
Whether this will be both the first AND the last type of language spam in GA that we’ll see is a bit too early to say, given that we’re in the midst of the current wave. However, by implementing the advice above or by using our automated tool you will be protected from most language spam to come, not just from “secret.google.com…” spam.
New post, continuing the topic: Future-Proofing Against Google Analytics Spam
Language Spam - The Latest Google Analytics Spam by Georgi Georgiev