Trackback spam and web stats: Using Google Analytics to figure it out.
I get so tired of looking at my web statistics in Awstats and then going over to Google Analytics to see the actual traffic to my blog. Why is there such a big difference? One of the reasons is trackback spam, and it affects a lot of WordPress sites. This post isn't going to show you how to eliminate trackback spam, but it will show you how to live with it, and how to figure out your actual traffic (or at least eliminate the trackback.php visits).
If you don't know, Awstats is a web statistics program that gets its data from the Apache access log. That means it counts every request to your site: trackback spammers, normal visitors, bots, everything. So when I look at my Awstats statistics I see really inflated numbers (ones that I wish were real) that don't reflect the actual traffic I have. I want the program to show real people who are engaged with the content, not Joe Trackbackspammer (who is most likely a bot) posting trackbacks about how to find some kind of drug, or good time, or casino, etc.
Google Analytics, on the other hand, is a JavaScript-based web stats tool. It's free from the folks at Google. It only tracks pages that actually have the Analytics JavaScript on them, and only when the visitor's browser runs it. In my case the culprit, trackback.php, does not have the code, so Analytics doesn't count it. I can instantly see a better view of actual traffic to my site. Using Analytics in this way also leaves out legitimate trackbacks, but I get so few (like .01%) that it doesn't bother me.
One thing that does bother me about both stats programs is that you can't tie who is visiting to what they visited (unless I'm missing something and haven't found it yet). However, this is really easy to do by looking in the raw log file. You'll need to obtain the raw log file from your host and look at it with a text editor. I suggest something like VIM, since these tend to be huge files (believe me, it kills Dreamweaver to open these kinds of files). Start in your Awstats report to see which IP addresses you want to look up in the log file. Look for the ones that have high and equal numbers of Pages and Hits. This usually means that the computer behind the suspect IP address is hitting a single page over and over again, without loading the images and stylesheets a real browser would. Once you have an IP address, use the search tools in VIM to find it in the log. If there is a whole slew of trackback.php visits from one IP on one or several posts, then they are spamming you.
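If you'd rather not page through the log by hand, the same lookup can be sketched in a few lines of Python. This is only a rough sketch under some assumptions: it assumes the log is in Apache's common or combined format, it treats any request path containing "trackback" as a trackback hit, and the function names and the `access.log` file name are made up for illustration.

```python
import re
from collections import Counter

# Start of a common/combined log format line:
# client IP, identity, user, [timestamp], "METHOD path ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(\S+) (\S+)')


def trackback_hits_by_ip(lines, needle="trackback"):
    """Count requests per client IP whose path contains `needle`.

    `needle` is an assumption; adjust it to match how trackback
    requests actually appear in your own log.
    """
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not look like access-log entries
        ip, _method, path = match.groups()
        if needle in path:
            counts[ip] += 1
    return counts


def report(log_path, top=10):
    """Print the `top` IP addresses with the most trackback requests."""
    # Reading line by line keeps memory use low even for huge logs.
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        for ip, hits in trackback_hits_by_ip(handle).most_common(top):
            print(f"{hits:6d}  {ip}")
```

Calling `report("access.log")` would list the heaviest trackback posters first, which gives you a short list of IP addresses to check against the Awstats numbers before you decide to block anyone.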
So how do you block them? Well, the easiest and most drastic thing to do is to block their IP address from viewing your site. This is how I've gotten rid of some annoying spammers. The problem is that some of them use random IP addresses, so you can't block them using this method. Also, you may inadvertently block a lot of legitimate users if that IP address belongs to a router or gateway that a whole network shares. If those problems didn't make you flinch, then simply add this code to your .htaccess file and say sayonara to those spammers:
# Deny IP addresses
# POST is included because trackbacks arrive as POST requests
<Limit GET POST>
order allow,deny
deny from 000.000.000.000
allow from all
</Limit>
Replace 000.000.000.000 with your suspect IP address, of course. To block more than one address, add another "deny from" line for each.
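If the list of suspects grows, typing those lines by hand gets old. Here is a small Python sketch that builds the block from a list of addresses. The function name `deny_block` is made up for illustration, and it assumes you want the block to cover both GET and POST (trackbacks arrive as POST requests) using the same order/deny/allow directives as above. It also refuses anything that isn't a valid IP address, so a typo doesn't end up in your .htaccess file.

```python
import ipaddress


def deny_block(addresses):
    """Return an .htaccess block with one 'deny from' line per address.

    Hypothetical helper: it only builds the text, it does not write
    to any file.
    """
    lines = ["# Deny IP addresses", "<Limit GET POST>", "order allow,deny"]
    for raw in addresses:
        # ip_address() raises ValueError on a malformed address.
        ip = ipaddress.ip_address(raw.strip())
        lines.append(f"deny from {ip}")
    lines += ["allow from all", "</Limit>"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Addresses from the reserved documentation ranges, used as placeholders.
    print(deny_block(["203.0.113.7", "198.51.100.4"]))
```

Paste the printed block into your .htaccess file in place of the one above, and keep a backup of the old file in case something goes wrong.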
You should also turn on the Akismet plugin, which comes with WordPress. All you have to do is activate it in the Plugins section of the Admin interface and enter the API key it asks for. This will catch almost all of your spam, including comment spam, by placing it in a spam folder of sorts. The spam is deleted automatically after a period of time, and it has worked like a charm for me so far. The only downside to this method is that the spammers still reach WordPress and waste server resources, whereas the .htaccess method above turns them away before WordPress even loads.