Over the past 2 weeks we have been witnessing a new wave of Google Analytics spam: language spam. In this post I’ll outline what it is, how it happens and how you can protect your GA accounts from it (to some extent). We first started seeing it on Nov 8, just as voting in the 2016 US presidential election was winding down, and it contained the following message:
Secret.ɢoogle.com You are invited! Enter only with this ticket URL. Copy it. Vote for Trump!
The above text is shown in the dimension usually reserved for language information. Such information is sent automatically to GA by most browsers in the form of short abbreviations, such as “en”, “en-gb”, “en-us”, “es”, “fr”, etc.
It is also combined with referral spam, with multiple domains listed as source/medium, including abc.xyz, brateg.xyz, budilneg.xyz, begalka.xyz, bezlimitko.xyz, bukleteg.xyz, boltalko.xyz, biteg.xyz and others. So it is a two-vector attack trying to draw the user’s attention both to the fake referrer domains and to the language report, probably because of its prominent placement on the Google Analytics “report homepage”:
(Update Dec 14, 2016) – The spam is now coming in with referrer “twitter.com” and “blackhatworld.com“, page title “Vitaly rules google” and language “Vitaly rules google ☆*:。゜゚・*ヽ(^ᴗ^)ノ*・゜゚。:*☆ ¯\_(ツ)_/¯(ಠ益ಠ)(ಥ‿ಥ)(ʘ‿ʘ)ლ(ಠ_ಠლ)( ͡° ͜ʖ ͡°)ヽ(゚Д゚)ノʕ•̫͡•ʔᶘ ᵒᴥᵒᶅ(=^ ^=)oO”
(Update Dec 7, 2016) – The spam is now with referrer “motherboard.vice.com“, full referrer and page title “motherboard.vice.com/read/this-pro-trump-russian-is-spamming-google-analytics” and language “o-o-8-o-o.com search shell is much better than google!“. Our spam blocking solutions – both manual and automatic continue to work and would have blocked it if you had them deployed.
(Update Dec 2, 2016) – Over time the spam mutated its referral source to legitimate sites such as “addons.mozilla.org“, “webmasters.stackexchange.com“, “thenextweb.com“, and lately “reddit.com” and is likely to mutate further. It also used “lifehacĸer . com” which mimics the legitimate lifehacker.com site, but is controlled by the spammer. All our solutions continue to work and block it all.
(Update Nov 28, 2016) – The spam is now also coming in as “Secret.Google.com-Trump” (note that the site is secret.ɢoogle.com, not secret.google.com, these are two completely different domains).
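To see why secret.ɢoogle.com and secret.google.com are different domains, it helps to look at the ASCII form that DNS actually uses for internationalized names. The snippet below is a minimal sketch, not a full IDNA implementation: it skips the normalization steps a real resolver applies, and the helper names `to_ascii_label` and `to_ascii_domain` are made up for this illustration.

```python
import unicodedata


def to_ascii_label(label: str) -> str:
    """Return the ASCII form of one domain label (simplified, no nameprep)."""
    if label.isascii():
        return label.lower()
    # Non-ASCII labels are stored in DNS as "xn--" + Punycode.
    return "xn--" + label.encode("punycode").decode("ascii")


def to_ascii_domain(domain: str) -> str:
    """Convert every label of a dotted domain name to its ASCII form."""
    return ".".join(to_ascii_label(part) for part in domain.split("."))


real = "google.com"
fake = "\u0262oogle.com"  # first letter is U+0262, not the ASCII letter "g"

print(unicodedata.name(fake[0]))  # LATIN LETTER SMALL CAPITAL G
print(to_ascii_domain(real))      # google.com
print(to_ascii_domain(fake))      # xn--oogle-wmc.com
```

The two names look nearly identical on screen, but they resolve through completely separate registrations.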
How Big is secret.ɢoogle.com Spam in Google Analytics?
It appears it caught a lot of eyes, as the number of searches for “secret.ɢoogle.com”, “secret.google.com”, and “Secret.ɢoogle.com You are invited! Enter only with this ticket URL. Copy it. Vote for Trump!” quickly surpassed both searches for “google analytics spam” and “google analytics referral spam” combined:
Newer spam such as “o-o-8-o-o.com search shell is much better than google!” and “Vitaly rules google ☆*:。゜゚・*ヽ(^ᴗ^)ノ*・゜゚。:*☆ ¯\_(ツ)_/¯(ಠ益ಠ)(ಥ‿ಥ)(ʘ‿ʘ)ლ(ಠ_ಠლ)( ͡° ͜ʖ ͡°)ヽ(゚Д゚)ノʕ•̫͡•ʔᶘ ᵒᴥᵒᶅ(=^ ^=)oO” also register significant search volumes.
The traffic the spammers artificially generate is also notable since it diverges from the usual referral spam we see – it has a kind of “average” bounce rate, registers over 2.5 pages/session and also a very long avg. session duration – over 20 minutes, as seen in the screenshot below:
The above is different from the usual referral spam traffic, which has a very high % of new sessions and consequently a very high bounce rate, with pages/session and average session duration near 1.00 and 0:00:00 respectively.
This particular type of language spam only registers pageviews on your homepage, so metrics for internal pages should not be affected.
Other than that, this wave of language spam traffic in GA masquerades as Firefox 49.0 on Linux, with no Flash version reported and with an industry-average screen resolution and browser size.
(Update Nov 28, 2016): The browser is now set to “ɢoogle.com” in newer instances of the same ghost traffic.
Since the message was related to the US elections happening on Nov 8, we expected that this kind of spam would be short-lived. Boy, were we wrong – it is only growing stronger as it appears to be catching the attention of many webmasters around the world. Like e-mail spammers, spammers who insert fake traffic in Google Analytics thrive on attention.
How do Spammers Infect GA’s Language Report?
As far as we’ve investigated, this type of spam is no different from other types we’ve seen in the past. There are generally two types – bots (computer software) that actually “visit” your site and mimic users browsing it, and bots that don’t visit your site, but send “hits” straight to the Google Analytics servers. The second process is explained in detail in our “Guide to Referrer Spam in Google Analytics” from a while ago or you can check our more recent infographic on the topic. If you are wondering why Google allows this and why it’s so hard to arrive at a long-term solution, you can read our “All Your Analytics Are Belong To Us” article where we go into detail about that.
If you are thinking that there is an easy way out of this – there isn’t, even though language spam is a bit easier to deal with than the classical referral spam.
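To make the second type (hits sent straight to the GA servers) more concrete, here is a rough sketch of what such a hit looks like when decoded. It assumes the hit follows the publicly documented Universal Analytics Measurement Protocol, where `t` is the hit type, `dp` the page path, `dr` the referrer and `ul` the user language. The payload, tracking ID and client ID below are made up for illustration, and `describe_hit` is a hypothetical helper.

```python
from urllib.parse import parse_qs

# Made-up example payload; "UA-XXXXX-Y" and "555" are placeholders.
payload = (
    "v=1&t=pageview&tid=UA-XXXXX-Y&cid=555"
    "&dp=%2F&dr=http%3A%2F%2Fexample.xyz%2F"
    "&ul=Vote%20for%20Example%21"
)


def describe_hit(raw: str) -> dict:
    """Decode a hit payload and pull out the fields that show up in reports."""
    fields = {key: values[0] for key, values in parse_qs(raw).items()}
    return {
        "hit_type": fields.get("t"),
        "page": fields.get("dp"),
        "referrer": fields.get("dr"),
        "language": fields.get("ul"),
    }


for name, value in describe_hit(payload).items():
    print(f"{name}: {value}")
```

The point of the sketch is that every field is plain text supplied by the sender, so the language value can be any string rather than a real browser setting.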
How to Remove Google Analytics Language Spam?
First of all – once recorded by GA, you can’t change or edit the data, so there is no way for you to permanently erase this traffic from your reports (this sucks, I know, but that’s how it is). That leaves two things you can do: block spam from coming into your reports, and filter out the spam while reporting using advanced segments. The first is a permanent change to your Google Analytics views and only works from the moment you apply it onward. The second is a flexible, retroactive solution, but you need to apply the advanced segment to your reports each time.
When it comes to blocking future spam, there are two ways to approach it. If you have just a couple of websites, then the manual solution outlined below will likely be enough to keep your Google Analytics stats free of language spam for a good period of time. It won’t help much with referral spam, though.
If, however, you manage multiple sites with many different views for yourself or for your clients, then a more scalable solution is appropriate. The Auto Spam Filters anti-spam tool we offer is just such an option as it offers you a fully automated set-and-forget solution that blocks referrer spam, language spam, events spam, etc. from tens, hundreds or even thousands of Google Analytics views. Our anti Google Analytics spam tool works much like anti-virus software – it will help you block most spam quickly and at scale. You can sign up for a 14-day free trial of the tool for 3 Google Analytics properties (no credit card required).
If you are going to use an automated tool to block language spam like “secret.ɢoogle.com” language spam, then skip straight to Part 2 of the solution below!
Part 1 – Block Spam with a View-level Filter
Setting up a view-level filter is fairly simple, but it should be noted that this is a permanent change going forward, so do be careful when using it, especially if you have little prior experience with view filters. The filter I propose will filter out any traffic (hits) where the language dimension contains 15 or more symbols. Since most legitimate language settings sent by browsers are 5-6 symbols and rarely is there traffic with 8-9 symbols in this field, it should only filter out language spam.
In addition to that, there are symbols which are invalid for use in the language field, but which can be used to construct a domain name (or what looks like it, such as “secret google com”, “secret,google,com”, “secret!google!com”), so we can exclude those as well.
The resulting regular expression we’ll use looks like this:
.{15,}|\s[^\s]*\s|\.|,|\!|\/
You need to construct the “Exclude Language Spam” filter as shown in the screenshot below:
Make sure to filter to the “Language Settings” dimension. You need “Edit” access at the “Account” level in Google Analytics in order to set up new filters, so make sure you have that, or you won’t even see the setup.
You can use the “Verify Filter” option to see how it would affect data from the last few days.
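Before saving the filter, you may also want to sanity-check the regular expression offline. The sketch below runs the pattern from this post against a few sample language values. It assumes the filter matches anywhere in the field (a partial match) and uses Python’s `re` module, which may differ from Google Analytics’ own regex engine in edge cases. The helper name `is_language_spam` and the sample list are made up for this illustration.

```python
import re

# The exclude pattern from this post, searched anywhere in the value.
LANGUAGE_SPAM = re.compile(r".{15,}|\s[^\s]*\s|\.|,|\!|\/")


def is_language_spam(value: str) -> bool:
    """Return True if the language value would be excluded by the filter."""
    return LANGUAGE_SPAM.search(value) is not None


samples = [
    "en-us",               # ordinary browser language code
    "es-es_tradnl",        # 12 characters, still under the limit
    "ca-ES-valencia",      # 14 characters, still under the limit
    "secret google com",   # spaces used to fake a domain
    "secret,google,com",   # commas used to fake a domain
    "Secret.\u0262oogle.com You are invited!",
]

for value in samples:
    verdict = "EXCLUDE" if is_language_spam(value) else "keep"
    print(f"{verdict:8} {value}")
```

The first three values are kept and the last three are excluded, which is the behavior the filter is meant to have.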
Part 2 – Filter Out Historical Spam Via an Advanced Segment
View-level filters are not retroactive – they only start working on your data from the moment you set them up onwards, so they won’t help with your historical data. For that, it’s best to use custom segments to clean language spam out of your past data when preparing reports.
Here is a custom segment to filter out secret.ɢoogle.com spam as well as any future Google Analytics language spam:
You set it up once and save it. Then you can apply it to most reports in Google Analytics.
Bonus tip: create a shortcut in GA to a report with this segment applied in order to access it more quickly, if you need to do so on a regular basis.
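If you export report data and prefer to clean it outside of Google Analytics, the same idea as the segment can be applied to the exported rows. The following is a rough sketch under stated assumptions: the export is a CSV with columns named `Language` and `Sessions` (hypothetical names, adjust to your own export), the sample data is made up, and the pattern is the one used in the Part 1 filter.

```python
import csv
import io
import re

# Same pattern as the view filter in Part 1.
LANGUAGE_SPAM = re.compile(r".{15,}|\s[^\s]*\s|\.|,|\!|\/")

# Made-up sample export; real exports will have more columns and rows.
SAMPLE_EXPORT = """Language,Sessions
en-us,120
es-es_tradnl,7
Vitaly rules google,33
"""


def clean_sessions(csv_text: str) -> tuple:
    """Return (sessions kept, sessions dropped as language spam)."""
    kept, dropped = 0, 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        sessions = int(row["Sessions"])
        if LANGUAGE_SPAM.search(row["Language"]):
            dropped += sessions
        else:
            kept += sessions
    return kept, dropped


kept, dropped = clean_sessions(SAMPLE_EXPORT)
print(f"kept {kept} sessions, dropped {dropped} spam sessions")
```

Like the segment, this leaves the stored data untouched and only changes what you report on.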
Whether this is going to be both the first AND the last type of language spam in GA that we’ll see is a bit too early to say, given that we’re in the midst of the current wave. However, by implementing the advice above or by using our automated tool you will be protected from most language spam to come, not just from “secret.google.com…” spam.
New post, continuing the topic: Future-Proofing Against Google Analytics Spam
Thanks, this was the most useful post I could find, your approach by limiting the number of characters is the most elegant.
Georgi,
Awesome post man. Great research. Love the use of regex expressions to overcome it. I’m starting up a new site myself and saw this recently and was left with a serious WTF face.
Appreciate it.
-Alex
Nice tip; easy and smooth
Thx
I’m getting one that says it’s reddit, turns out it’s the same type of spam. This one is coming into my social referral and the landing page is Google(dot)org. I don’t know how to filter this one out.
I’d like to see a solution to the reddit issue too. Have same on my site. Seems alot of this is just starting, or I’ve never noticed
So glad I found this, was looking at my GA thinking what the bloody hell is that crap? All fixed now. Cheers!
Terrific post mate, helped a lot. Thanks!
Giorgi,
really appreciate this. Very helpful.
GREAT !!! TNKS
Thanks for taking the time to put this together.
This was amazing information! I’m not a web person or a digital person and yet I find myself in charge of a website and am teaching myself how setup and run Google Analytics. Once I got GA setup I immediately knew something was wrong with the numbers and I went ‘a-googling’ to find help. Your blog was clear, concise and the fix was so easy. Thank you so much!!
Thank you for this post! Really appreciate it!!!
Thank you for this post
Thanks!
Easy read and a simple solution to apply.
Screw all that, this is something Google needs to fix. Not my problem. If a billion dollar company can’t handle this I’ll just switch to another analytics provider.
Great post. Easy to follow and step-by-step implementation. Thanks!
Great help Georgi and easy for a total non-techie like me to follow and implement, thanks.
Thank you!
I have an SEO ping site (easypings.com) and have started to receive over 500 of these spam sessions per day. I wonder if alternatives to analytics are able to filter these server side.
As far as I know no *real* alternative to GA exists, and also as far as I’m aware most analytics services are at their core based on the same technological approach, making them equally vulnerable, e.g. search for “piwik referer spam”. Note that they are targeted less since they have fewer users.
You might want to look at my “All Your Analytics Are Belong To Us” post from almost 2 years ago to better understand the issue.
There are multiple black-listing solutions which will have a significant issue with this new type of spam, as I describe in my next post.
So which bot is this? The one that actually visited all of my sites or is this the one that sent hits to the GA servers?
As far as we’ve seen this would be ghost spam – no hits on your server, sending hits directly to GA.
Would be useful if Google just blocks them. One would think they noticed. Just saying…
They most certainly know. It’s just hard to impossible for them to block such spam without false positives. See my reply to “shubeydo” above – the link to “All Your Analytics Are Belong To Us” for details.
Thank you, so much. It worked perfectly and even if it hadn’t it was helpful to learn that I wasn’t alone in suddenly getting this kind of attack.
Thank you very, very much.
Very useful.. In my case the visits are from Russia , So I used filters with country Russia and Language term Trump. Am I doing correct?
Not generally recommended. First of all, many sites have legitimate users from Russia, so you’ll be filtering out actual visits, not just spam.
Second, the spammer can always spoof the geographical region to US, UK or whatever he pleases.
Third, filtering by language containing “Trump” will only protect you from this particular language spam, but will not protect you from future language spam the way the filter I recommend, and that we use in our Auto Spam Filters tool, does.
Thank you Georgi.
By the way can you explain me this part of the regex please |\s[^s]*\s| ?
I understand the beginning and the end ( numbers and special characters) but what about the s ?
In this particular case it’s matching a space, followed by any number of non-spaces followed by a space. It would match “secret google com” for example, but will not match valid language codes like en_us .
I think you need another backslash just after the caret. Otherwise, it’s matching a space, followed by any number of characters other than “s”, followed by a space, right? (In other words, don’t you have to escape the second s?) Thanks for the post.
You are correct, I’ve updated the regex. Thanks!
Thank you so much! The most useful guide!
Hi Georgi,
We are receiving this as well but I have a more in-depth question… Based on your screen grabs (assuming this is filtered for just language), for the “referral spam”, you have a low enough amount of sessions, but your bounce rate is 44.89% which isn’t terrible, pages per session is 2.61, and average session duration is over 20 minutes. This to me does not seem like bot traffic. In my particular instance for this language, my bounce rate is less than 1%, almost 4 pages per session, and over 4 minutes for average session duration, and I’m getting form fill conversions off some of these visits.
This cannot be simply chalked up as bot spam, this looks like legitimate traffic from referral sites that could be infected with some script to alter the language when the Google Analytics code is fired. So if this is indeed legit traffic, why would you want to filter it out?
Hi Jonathon,
We look at much more than the source/medium field and the engagement metrics when evaluating traffic as bot or ghost traffic. In this case we’re 99.999% certain it’s fake traffic and not legitimate traffic with a spoofed language settings http field.
Faking bounce rate and other metrics is easy for bots and ghost traffic generators. As I note in the post it’s been rare so far to see such engagement metrics, but it is still fake traffic. Faking form submissions is possible, but unlikely for this kind of traffic. I don’t think we’ve seen form submissions from such traffic so feel free to share a screenshot or two that demonstrate it by using our contact form.
Cheers,
Georgi
This worked like a charm!
Thanks!
Thank you so much for this post, a lot of my clients have this issue so it’s been very helpful.
I’m looking to set up the view filter but this regular expression filters out some relevant language as well. Specifically for one of my clients it filters out “es-es_tradnl” (traditional Spanish I guess, looking at where it’s coming from). So I’m a bit hesitant to use it, especially for those who get Spanish traffic as well. I did a quick search and found there is even a “ca-ES-valencia”, which is even longer. Also some other languages may have slight longer ones. So I’d propose to use a higher number (20 or something) or just not use the number.. would that still work? e.g. if we leave just this part: \s[^s]*\s|\.|,|\!|\/
Thanks Rianne, very interesting! Seems like that’s what’s called a “culture code” and I’m not sure why browsers would put it in the language field, but it is definitely an issue, even if it’s just limited to users in Spain. We’ve updated the article to a 15-character limit.
Using just the last part of the regex as you write will also work in this case, but might fail in more evasive variants of future language spam.
Thanks for this post Georgi!
Hi there, this seems very helpful! I did step one, which I think I did successfully, but I cannot find the “verify filter” you suggest. And for the second part you recommended, there is no “language” option under my demographics. Only age and gender. Basically I cannot find anything that matches the screenshot for part two. Am I missing something? If you could direct me to the right place, I would be so grateful. Thanks!
This doesn’t sound right, language is a default dimension present in every Google Analytics setup, you should see it. You can also try to change the custom segment by selecting Conditions and then searching for “language” in the filter drop down.
Really? Google.com is not the same as google.com? Since when did a DOMAIN cared for anything other than 7 bit ascii (meaning no difference between upper and lower case)? Now you’re going to tell me that J.ri*****@whatever.com is not the same as j.ri*****@WHATEVER.CoM … Come on, please be factually correct.
I am factually correct as they are in fact different. Feel free to check the whois for “google.com” as well as for “ɢoogle.com” (which is in fact XN--OOGLE-WMC.COM). You can search for IDN domain names if you want to bring your knowledge of the DNS system up to date.
If you want to, you can always also just visit them – the first one is the official Google homepage and the other one – well, it’s whatever the spammer wants it to be at that point in time.
Awesome! Thanks for your help 🙂
Quick question- I noticed in the filter you created you used “15” as the cutoff for exclusion and in the advanced segment it looks like you used “12”. Is that accurate? Why the discrepancies?
We’ve updated the segment screenshot to reflect the updated regex.
So lovely to receive a useful advice from a fellow Bulgarian! Thanks Georgi, I’m glad I came upon your article for some help. A just created a new filter and hopefully, I’ll receive better stats for the future.
You rock. Thank you. I’m compiling a list of spam filter tips like this, and you’ll be on it for sure.
Glad to be of help, Randy!
Hi Georgi,
I just want to add my voice to chorus of thanks for posting this.
I really appreciate the effort and detailed explanation behind the fix.
Chris
Thanks Chris, it’s appreciated!
The filter worked PERFECTLY on one of my sites. However, the second site I added this filter to, when I clicked to Verify the filter I received the following prompt: “This filter would not have changed your data. Either the filter configuration is incorrect, or the set of sampled data is too small.”
Double-checked (and checked yet again) and every field is filled in as you instructed. Even copied and pasted the sample data again to make sure I didn’t miss a character … and I STILL get the prompt.
Just so you know, I am NOT a newbie … and as a side note: I am disappointed with Google allowing this and still “forcing” webmasters to use their analytics or “suffer the consequences”.
Hi Trish,
There are two reasons why you might be getting this when verifying the filter:
1.) You already have a filter that blocks this kind of spam, or the site in question simply didn’t receive spam of the type that the filter blocks
2.) You are running the verification after the filter was saved (it only works properly before the filter is saved)
Either way, I don’t think you need to worry about that if your configuration is correct and you followed the steps precisely: just have the filter in place and enjoy language-spam free GA reports.
Hi Georgi,
Thanks for the tutorial. However, if I try to apply the filter as described and click on ‘verify this filter’ I get ‘This filter would not have changed your data. Either the filter configuration is incorrect, or the set of sampled data is too small.’ Going to the reporting section it didn’t seem to have affected anything.
What am I doing wrong?
Hi Art,
I can’t say precisely, since I don’t have access to the particular Google Analytics property. Refer to my reply to Trish above for possible reasons for seeing this message.
Thank you so much for the explanation. This has made me crazy! I want it off my list. I’m not sure I understand the explanation, but I’ll have a look. Thanks again.
Thank you so much! It was a bit of a challenge to follow the instructions since I haven’t done much advanced filtering before, and haven’t used segments at all in the past, but the results are exactly what I hoped for.
Wow! I had WAY too much spam – from Reddit, Twitter, motherboard.vice.com – you name it! I followed both Part I and Part II and it did EXACTLY what I needed (I had to google how to apply the segment but found that quite easily). THANK YOU!!
Great article. Surely Google will take issue with this soon! It seems to affect all client sites and is a pain.
Google has known about referrer spam for years and has so far done little to nothing. Thus, I’m not keeping my hopes high, especially after the latest wave of language spam, which should have been trivial to block GA-wise.
Hi Georgi,
I did the first part. At the second part “Filter Out Historical Spam Via an Advanced Segment” I clicked “Add Segment” in the “Reporting” tab & (at first) it worked like a charm & it all looked tidy again.
But then I logged out (to do the same on my other website) & when I logged back in, all the spam rows where all back again.
So, I added the segment again (following your steps). First they’re gone (& it’s all tidy again), but they’re right back as soon as I log out & log back in. I did that 3 times … now I’m writing to you because that can’t be right.
Anything I’m missing here?
Thank so much for your post.
Hi Kerstin,
Yes, you appear to be missing some basic understanding of how segments work in Google Analytics. You need to apply the segment each time you log in to Google Analytics (if you log out for whatever reason) or when you navigate away. Segments are not applied for you automatically as this doesn’t align with their general purpose. I’d advise you search for “Advanced Segments Google Analytics” to gain a better understanding of the matter.
Georgi
Interesting article….Now i got great idea to block this spammer from my website. Thanks for sharing it with us!!!
MANY THANKS it was exactly what I needed…
Sharing it was a very good idea, thanks again !
Thanks this is a huge help. Just out of curiosity what does that expression mean?
Fantastically useful, worked flawlessly, very easy to follow. Thank you very much.
Thank you for this information! It worked perfectly. The language spam was really creepin me out.
Hi,
Have you or anyone else had this problem of seeing 0 clicks, cost and CPC in the Adwords / Campaigns reports in Google Analytics when using the language filter segment? I know we should not actually apply this segment in Adwords, because, as far as we know, the Adwords reports are not affected by the spam referrals, but just out of curiosity, what could be the reason for “tying” these Adwords specific metrics to the Language dimension and not display them ?
Yes, this would happen with *any* advanced segment you apply to AdWords reports.
Thank you for this! I’m sick and tired of seeing a pro-Trump message (ew) every time I view my Google Analytics dashboard.