Web marketing analysts are greatly concerned about (Google) Analytics spam. But one discussion seems to be avoided at all costs, since it could actually ruin the whole business of web analytics:
How can it be done?
If you read some of the better articles on the topic, you will notice there are several ways to do analytics spamming:
- Bad bots (programs running on spammers' computers)
- Botnets (networks of bots running on infected computers across the globe)
- Really bad spam bots (damn)
- Smart spam bots (the ones we hope for)
Religious mantra
Most of the analytics community believes we are attacked by Bad and Smart spam bots, the ones that can be filtered out because they are not real humans. And if you filter them out, you are left with only the humanoid (real visitor) data.
Scientific issue
The thing is, we cannot be certain about that. When we analyzed the data from the past few months, the spam data looked much like real human data:
- Pageviews are in the normal range
- Exits are in the normal range
- Time on page is in the normal range
- Bounce rate may be 100% or 0%
Well damn. Seems like real users. Are they real users?
We cannot know. If spammers are using a botnet of infected computers, there are at least 2 ways they could exploit real users' clicks with an (evil) browser plugin:
1. Change the referrer in the first request to my website. You can try this with this (safe) plugin: [https://chrome.google.com/webs... ]. So if a user has a spammer's browser plugin that does this, that is all it takes. And getting inexperienced users to install such a browser plugin from porn pages is quite a simple task.
2. Actually redirect the first visit to my website through their server, where they have an automatic redirect (the user does not see the intermediate site because the request is redirected within milliseconds). This can also be done by a browser plugin that just rewrites links to point to the spam site's redirect link.
The bounce rate may differentiate it. A bounce rate of exactly 100% or 0% is not really expected from real users. Yet we cannot be sure about the reason for this bounce rate. It may just be random requests from real users that are redirected once, after which the browser plugin does nothing further. This would create a high bounce rate but would still distort the analysis of real data.
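As a rough sketch of the bounce-rate check described above, the Python below groups sessions by referrer and flags any referrer whose bounce rate is exactly 0% or 100%. The `(referrer, pageviews)` session layout, the `min_sessions` threshold, and the sample hostnames are all assumptions made for illustration and do not come from a real Analytics export. A flag only marks a referrer as worth a closer look. It does not prove the traffic is fake.

```python
from collections import defaultdict

def flag_extreme_bounce(sessions, min_sessions=5):
    """Return {referrer: bounce_rate} for referrers at exactly 0.0 or 1.0.

    sessions: iterable of (referrer, pageviews) pairs (hypothetical layout).
    A session with a single pageview is counted as a bounce.
    """
    totals = defaultdict(int)
    bounces = defaultdict(int)
    for referrer, pageviews in sessions:
        totals[referrer] += 1
        if pageviews <= 1:
            bounces[referrer] += 1

    flagged = {}
    for referrer, total in totals.items():
        if total < min_sessions:
            continue  # too few sessions to judge (threshold is an assumption)
        rate = bounces[referrer] / total
        if rate in (0.0, 1.0):
            flagged[referrer] = rate
    return flagged

# Made-up sample data: one referrer that always bounces, one mixed.
sample = (
    [("spam-referrer.example", 1)] * 6
    + [("search.example", 1)] * 3
    + [("search.example", 4)] * 3
)
print(flag_extreme_bounce(sample))
```

With this made-up sample, only `spam-referrer.example` is flagged, because `search.example` bounces in half of its sessions.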
Hypotheses
That's why we have developed some hypotheses that require testing:
- Analytics data contains fake clicks of real users on a botnet, with a spam referer in the request header
- Analytics data contains real clicks of real users on a botnet, with a spam referer in the request header
What if true?
If both hypotheses end up being correct, we must develop methods to exclude fake clicks of real users. If such methods cannot be developed (and I currently cannot imagine how they could be), our analytics data may be useless from now on. Or at least until all spammers quit their jobs.
Because we cannot know which segment of our data is wrong, any analysis may be wrong. Real final goals (such as buying a product) may be recorded with a fake referer. Therefore you will not know which part of your advertising money was spent properly.
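To make the attribution problem concrete, here is a minimal sketch that computes which share of recorded conversions carries a referrer we already suspect. The referrer names and counts are invented for illustration, and `unattributable_share` is a hypothetical helper, not part of any analytics tool. The result is only a lower bound, since the scenario above is precisely that we may not know which referrers to suspect.

```python
def unattributable_share(conversions, suspect_referrers):
    """Share of conversions whose recorded referrer is on the suspect list.

    conversions: dict mapping referrer -> number of completed goals.
    suspect_referrers: set of referrers we do not trust (an assumption).
    """
    total = sum(conversions.values())
    if total == 0:
        return 0.0
    suspect = sum(
        count for referrer, count in conversions.items()
        if referrer in suspect_referrers
    )
    return suspect / total

# Made-up numbers for illustration only.
conversions = {
    "newsletter.example": 30,
    "spam-referrer.example": 10,
    "search.example": 60,
}
share = unattributable_share(conversions, {"spam-referrer.example"})
print(f"{share:.0%} of conversions have an unreliable source")
```

In this invented example, 10 of 100 purchases cannot be credited to any real channel, so any budget decision based on the referrer report is off by at least that much.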
Further down the road, real users' clicks may contain other wrong data (browser, dimensions, ...) that will impact our analytics.
What if not true?
If both hypotheses end up being false, we can segment out the real views. Luckily. For now.
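In that case, segmenting could be as simple as splitting hits by referrer hostname. The sketch below assumes each hit is a dict with a `referrer` URL and that we maintain our own set of known spam hostnames. Both the hit layout and the hostnames are hypothetical.

```python
from urllib.parse import urlparse

def split_by_referrer(hits, spam_hosts):
    """Split hits into (real, spam) lists based on the referrer hostname.

    hits: list of dicts with a "referrer" URL string (hypothetical layout).
    spam_hosts: set of hostnames we consider spam (maintained by hand).
    """
    real, spam = [], []
    for hit in hits:
        # Direct visits have an empty referrer, so hostname is None.
        host = urlparse(hit["referrer"]).hostname or ""
        if host in spam_hosts:
            spam.append(hit)
        else:
            real.append(hit)
    return real, spam

# Made-up hits for illustration only.
hits = [
    {"page": "/", "referrer": "https://search.example/results"},
    {"page": "/", "referrer": "http://spam-referrer.example/offer"},
    {"page": "/pricing", "referrer": ""},
]
real, spam = split_by_referrer(hits, {"spam-referrer.example"})
print(len(real), "real hits,", len(spam), "spam hits")
```

This only works as long as spam can be recognized by its referrer alone, which is exactly what the hypotheses above put in question.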