All your forms are belong to bots.
by schwim on Aug.05, 2007, under Spam
If you own a site with a contact form, forum, or any other means of communication, and the site has been around for longer than a month, you’ve probably witnessed the effects of a bot. Whether it’s bogus referrers in your access logs, Viagra spam in your forum, or sales pitches sent through your contact form, it’s a pain for the webmaster. The chances are slim that you’ll ever rid yourself of it completely, but I’m going to give you the three fastest ways to cut the number of bots knocking on your door. These will NOT get rid of the problem; they will simply lessen the load. The lower you want the number of bots to be, the harder you have to work. This is intended for quick returns and to get you started on your way to a cleaner, less cluttered website.
1) It’s obvious, but the number one way to keep bots from filling up your forum or contact forms with offers for sexual enhancement or fat-reducing pills is to NEVER have an unprotected form. If it’s a forum, require registration. If it’s a contact form, have a CAPTCHA. DO NOT have a form with no protection! Once it’s found, it will be abused until the number of bogus messages makes it useless.
2) Stop the dumbest of the dumb before they have a chance to strike. One popular way to do this is to block one of their favorite tools: libwww-perl, or LWP. This is a Perl module that lets a script contact your site with no human interaction, so it can post comments or forum topics (or register) without anyone actually having to type anything. Now, before you get too excited, we’re not blocking everyone who uses libwww. We’re blocking the people using libwww who are too stupid to spoof their user-agent string. Unfortunately, the numbers are lower than we’d like, but remember that we’re going for quick and dirty.
On an Apache server, you simply use .htaccess to block the agent. .htaccess is a file that sits in the root directory of your site and tells the server what to do on each request. Think of it as a set of rules that the server adheres to each time someone visits. In our case, we’re going to tell the server to block any request whose user-agent begins with the libwww signature:
In .htaccess, add:
    SetEnvIfNoCase User-Agent "^libwww-perl" BadBots
    order deny,allow
    deny from env=BadBots
Like I said, this will only block the most moronic of the morons.
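If your host runs Apache 2.4 or newer, the order/deny directives only work through the mod_access_compat compatibility module. Here is a rough sketch of the same block in the newer syntax. It assumes mod_setenvif and mod_authz_core are loaded and that your host allows these directives in .htaccess. It isn't part of my original setup, so test it before relying on it:

```apache
# Flag any request whose user-agent starts with libwww-perl
SetEnvIfNoCase User-Agent "^libwww-perl" BadBots

# Allow everyone except requests carrying the BadBots flag
<RequireAll>
    Require all granted
    Require not env BadBots
</RequireAll>
```

Use one style or the other for a given rule rather than mixing the old and new access directives.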
3) Blocking by “keyword”: If you monitor your referrers, you may find thousands of sites referring to yours with URLs like: http://viagraisthecoolest.info/cialis/tramadol/letmesellyou/illegaldrugs.htm . You’ll find that this page is simply a sales page. Other than wasted server resources, the damage isn’t that great, unless you share your referrers with the visitors of your site. Many CMSes have blocks that display the latest X referrers, and this is what the bots are counting on. Here’s how you block the most common of the referrers:
Again, in .htaccess:
    SetEnvIfNoCase Referer "viagra" BadReferrer
    SetEnvIfNoCase Referer "poker" BadReferrer
    SetEnvIfNoCase Referer "medicine" BadReferrer
    SetEnvIfNoCase Referer "pills" BadReferrer
    SetEnvIfNoCase Referer "diet" BadReferrer
    SetEnvIfNoCase Referer "mortgage" BadReferrer
    SetEnvIfNoCase Referer "casino" BadReferrer
    SetEnvIfNoCase Referer "insurance" BadReferrer
    SetEnvIfNoCase Referer "loan" BadReferrer
    SetEnvIfNoCase Referer "buy" BadReferrer
    SetEnvIfNoCase Referer "xanax" BadReferrer
    SetEnvIfNoCase Referer "meridia" BadReferrer
    SetEnvIfNoCase Referer "incest" BadReferrer
    SetEnvIfNoCase Referer "lesbian" BadReferrer
    SetEnvIfNoCase Referer "adult" BadReferrer
    SetEnvIfNoCase Referer "hentai" BadReferrer
    SetEnvIfNoCase Referer "tramadol" BadReferrer
    SetEnvIfNoCase Referer "phentermine" BadReferrer
    SetEnvIfNoCase Referer "gambling" BadReferrer
    SetEnvIfNoCase Referer "texas-" BadReferrer
    SetEnvIfNoCase Referer "holdem" BadReferrer
    SetEnvIfNoCase Referer "pharmacy" BadReferrer
    SetEnvIfNoCase Referer "ultram" BadReferrer
    order deny,allow
    deny from env=BadReferrer
If the server finds one of these words anywhere in the referrer URL, it will refuse to serve the page to the bot and return a 403 error page instead.
Something to consider: a legitimate user may accidentally get caught in your attempt to block the bots. What to do? Well, what I do is create a custom 403 (Forbidden) page with an apology and a link to the page they were trying to reach. This lets them still get to their desired page while keeping the bots from filling up your referrers quite so badly.
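Here's a minimal sketch of wiring up such a page, using the same older allow/deny syntax as the rules in this post. The file name /403.html is just a placeholder for illustration. The second part is there because the deny rule would otherwise apply to the error page as well, and the visitor would get the server's stock 403 instead of yours:

```apache
# Serve a custom page for 403 responses (the path is a hypothetical example)
ErrorDocument 403 /403.html

# Make sure the error page itself is never denied
<Files "403.html">
    Order allow,deny
    Allow from all
</Files>
```

Building the link back to the requested page is up to the page itself and isn't shown here.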
Another thing to note is that you can customize this list as your needs change. This list will get you started, however, and will cause a noticeable drop in bogus hits.
There are many more things you can do, but this will give you the most bang for the least effort. Google is your friend for finding other ways to reclaim your website.
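As a sketch of what customizing might look like: several keywords can be folded into one pattern, and a later line can clear the flag for referrers you trust. The domain example.com is a stand-in for your own, and the three keywords are just samples from the list:

```apache
# One pattern covering several keywords (matched case-insensitively)
SetEnvIfNoCase Referer "(viagra|casino|tramadol)" BadReferrer

# Clear the flag again when the referrer is your own site
# (example.com is a placeholder, swap in your real domain)
SetEnvIfNoCase Referer "^https?://(www\.)?example\.com/" !BadReferrer

order deny,allow
deny from env=BadReferrer
```

The lines are processed in order, so the line that clears the flag has to come after the ones that set it.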
[EDIT]New article discussing further measures to combat bots can be found here[/EDIT]
4 Comments for this entry
1 Trackback or Pingback for this entry
E-Dribble » Blog Archive » All your forms are belong to bots, part deux, the sequel. Again.
April 1st, 2009 on 9:31 pm
[...] bots would be of interest in this day and age? After watching the search results pile up for a blog post that was intended to be a light touch upon a subject, I started feeling bad. Everyone kept showing up here looking for a magic bullet and I gave them [...]
May 29th, 2009 on 3:50 pm
Great post indeed!
what i was lookin for!
Thanks,
Weibel
May 30th, 2009 on 10:12 am
Awesome info! I was honestly just thinking about something similar to this the other day so, it was almost “weird” when I ran across this. You would be surprised how many people simply have no idea when it comes to this kind of stuff. Anyway, thanks for getting this info out there and I’m sure I’m not the only one who appreciates you taking the time to post this for the masses.
July 8th, 2010 on 10:03 pm
There is some irony in the fact that your number 2 comment (from ‘Mcclurkin’) is an automated spam post. Do a search on Google for a snippet of the comment text and you’ll see that this message is posted all over the place!
July 8th, 2010 on 10:24 pm
Actually not ironic at all. You’ll notice his links were stripped and I at one time had a smart assed comment in response to it, which I later removed since I realized that I’m the only one I amuse with it.