Your Webmaster Resources

A Close to perfect .htaccess ban list - Part 3 (Apache Web Server forum at WebmasterWorld)

The onclick false action is a safety net for visual readers so they don't accidentally trigger the banning redirect. Because there is no text for the link, it is invisible on the displayed web page.

The link is redirected by .htaccess to my banning script, which we will call ban.pl for this example. The code, with the URLs cleansed, appears below:

RedirectMatch example\.html /cgi-bin/ban.pl

Now, whenever a scooper-bot, or html-only downloader visits and scrapes for links, they follow the link to example.html and get a 302 redirect, according to my web log, but they do not hit the Perl script! When I tested this in Wannabrowser I was sent to the Perl script and banned, as designed. Here is my latest log of this mis-event:

"GET /example.html HTTP/1.0" 302 219 "-" "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT)"

The ban script is definitely larger than 219 bytes! This leech also took many more HTML-only pages before leaving my server. Thus, he was not self-banned, and never triggered the script to which he was supposed to be redirected.

I'd appreciate any help in getting this right.
TIA, Wiz

jdMorgan




msg:1506472
 10:02 pm on Dec 27, 2003 (gmt 0)
Wiz,

The problem is that bad-bots don't follow 302 external redirects! 301 and 302 redirects require the browser (or user-agent) to reissue the request using the URL supplied in the 30x server response. Thus, the user-agent has to actively cooperate in order to fetch the destination file specified by the 30x response.

Whatchawanna do is to force-feed it a completely-server-internal file substitution, not a redirect:

RewriteRule ^example\.html$ /cgi-bin/ban.pl [L]

This instructs the server to immediately substitute the ban.pl file whenever example.html is requested.
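
As a rough sketch of where that rule might sit in a per-directory .htaccess file, assuming mod_rewrite is loaded and the host allows rewrite directives in .htaccess (example.html and /cgi-bin/ban.pl are the placeholder names used in this thread):

```apache
# Turn on the rewrite engine for this directory.
RewriteEngine On

# Internal substitution: the server itself hands the request to ban.pl,
# so the client never receives a 30x response it could ignore.
RewriteRule ^example\.html$ /cgi-bin/ban.pl [L]
```

Because the rule has no [R] flag and the target is a local path, nothing is sent back for the client to follow, which is the difference from the RedirectMatch version.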

If you don't have mod_rewrite capability, about the best you can do is to set up a unix symlink called example.html and point it to ban.pl.

Jim
Wizcrafts




msg:1506473
 10:09 pm on Dec 27, 2003 (gmt 0)
Thanks for the explanation Jim, and Happy Holidays. I will do the Rewrite Rule.

Wiz
decdim




msg:1506474
 8:29 pm on Jan 15, 2004 (gmt 0)
Some new "visitors" from my "Last 300 Visitors Page"

Some of these may be repeats...please excuse...

>>Triple edit<<
---

### What's up with the referer?! :)
Referer: file:///C:/leads17.html
Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT; SEARCHALOT.COM IE5)

Agent: Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt; DTS Agent)

### MSIE 7.01?
Agent: Mozilla/4.0 (compatible; MSIE 7.01; Windows 98)

### To look like "apple"?!
Agent: appie 1.1 (www.walhello.com)

Agent: Microsoft URL Control - 6.00.8862

Agent: Program Shareware 1.0.3

Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0) Fetch API Request
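
For readers wondering how agents like these might end up in a ban list, here is a minimal sketch, assuming mod_rewrite is available. Which agents deserve blocking is a judgment call the post does not make; the three patterns below are picked from the list above purely for illustration.

```apache
RewriteEngine On

# Each condition tests the User-Agent header; [OR] chains them so that
# any single match triggers the rule. Spaces in a pattern are escaped.
RewriteCond %{HTTP_USER_AGENT} ^Microsoft\ URL\ Control [OR]
RewriteCond %{HTTP_USER_AGENT} ^Program\ Shareware [OR]
RewriteCond %{HTTP_USER_AGENT} DTS\ Agent
# "-" means no substitution; [F] answers with 403 Forbidden.
RewriteRule .* - [F]
```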
BohrMe




msg:1506475
 8:05 pm on Jan 20, 2004 (gmt 0)
Along these same lines I have copied the trap.cgi bad bot trap found on this board to formmail.pl/cgi in an effort to auto-ban those trying to exploit formmail, which I do not use.

The problem with this is that every person who has tried to access formmail directly has their HTTP_USER_AGENT set to "-" BUT I have the following in my .htaccess:

RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule ^.*$ noID.php [L]

This causes noID.php to be executed instead of my trap.

Does anyone know of a conditional statement that I can use in my .htaccess to test the URL first and, if it matches a case-insensitive formmail, not test for a "-" HTTP_USER_AGENT?

Would it be as simple as placing another RewriteCond condition before this one to check the URL?
jdMorgan




msg:1506476
 9:51 pm on Jan 20, 2004 (gmt 0)
BohrMe,

> Would it be as simple as placing another RewriteCond condition before this one to check the URL?

Yes.

RewriteCond %{REQUEST_URI} !form.?mail [NC]

inserted ahead of your existing RewriteCond would stop the usual formmail requests from being redirected by that Rule.
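
Put together, the two conditions and the rule might look like the sketch below. The variable names are assumptions: %{HTTP_USER_AGENT} follows from BohrMe's description of the problem, and %{REQUEST_URI} is one common way to test the requested URL path.

```apache
# Skip this rule for formmail requests so they can reach the trap script.
RewriteCond %{REQUEST_URI} !form.?mail [NC]
# Match a blank or "-" User-Agent.
RewriteCond %{HTTP_USER_AGENT} ^-?$
# Both conditions must hold (implicit AND) before noID.php is substituted.
RewriteRule ^.*$ noID.php [L]
```

Consecutive RewriteCond lines without [OR] are ANDed, so a formmail request fails the first condition and falls through to whatever rule handles the trap.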

Jim
BohrMe




msg:1506477
 10:05 pm on Jan 20, 2004 (gmt 0)
That did the trick! Thank you much!

johnlim




msg:1506478
 7:32 am on Feb 6, 2004 (gmt 0)
This is a wonderful thread on the perfect .htaccess. Could somebody sum it up?

1) What is the best .htaccess, and what should be put inside it?

2) What part should be put into httpd.conf, and what should remain in .htaccess?

Thanks.
jdMorgan




msg:1506479