There are several ways to block bad agents and unwanted or misbehaving spiders in Apache. Unfortunately, most of them use mod_rewrite, and mod_rewrite rules are not inherited by the other vhosts in your Apache config by default.
With this method, you block spiders by matching their identifying User-Agent string. When applied to your global Apache config, it applies to all of your vhosts as well.
Insert these lines into your global Apache config (usually httpd.conf). I placed them right above the “<Directory />” statement.
SetEnvIfNoCase User-Agent "^Baidu" bad_bot
SetEnvIfNoCase User-Agent "^Baiduspider" bad_bot
SetEnvIfNoCase User-Agent "^twiceler" bad_bot
Allow from all
Deny from env=bad_bot
You can substitute or add whatever User-Agent strings you want by editing or adding “SetEnvIfNoCase” lines in the block above.
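For reference, here is how the complete block might look once assembled. Treat it as a sketch rather than a drop-in config. It assumes Apache 2.2-style access control (or Apache 2.4 with mod_access_compat loaded). The “<Location />” wrapper and the “Order Allow,Deny” line are assumptions added here, not part of the lines above: Allow and Deny are only accepted inside a container such as <Directory> or <Location>, and under the default “Order Deny,Allow” an “Allow from all” would take precedence over the Deny.

```apache
# Flag unwanted crawlers by User-Agent (case-insensitive regex match).
SetEnvIfNoCase User-Agent "^Baidu" bad_bot
SetEnvIfNoCase User-Agent "^Baiduspider" bad_bot
SetEnvIfNoCase User-Agent "^twiceler" bad_bot

# Assumption: wrap the access rules in <Location /> so they cover every URL.
<Location />
    # Evaluate Allow first, then Deny, so a matching Deny wins.
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
</Location>

# Assumed Apache 2.4 equivalent using mod_authz_core (untested here):
# <Location />
#     <RequireAll>
#         Require all granted
#         Require not env bad_bot
#     </RequireAll>
# </Location>
```

After checking the syntax with “apachectl configtest” and reloading Apache, a request such as “curl -I -A twiceler http://localhost/” should come back as 403 Forbidden, while a normal browser User-Agent should still get through. Access rules set inside an individual vhost may still override this, so it is worth testing each site.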
As always with my advice, your actual results may vary.