
google bot .htaccess


chiggs2


Our website has been hit by a denial-of-service attack, still ongoing, from a spoofed Google bot. We managed to stop the attack bringing the server down by blocking it, but now Google won't spider the website and we've been dumped from the search results. What I'm looking for is a way to stop the spoofed Google bot but still allow the real Googlebot access to the site to spider it. The fake bot is only attacking the homepage. I'm guessing this can be done in the .htaccess file, but as yet our hosting company haven't been able to come up with a solution. Hoping someone has some info, as this is obviously disastrous for our business, especially over the Christmas period, as we rely on Google for our traffic.


http://httpd.apache.org/docs/2.2/howto/access.html

 

Entering:

Deny from 123.456.789.ABC

into the .htaccess file should tell the web server to issue a 403 Forbidden response for that address.
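For context, a fuller Apache 2.2-style block in .htaccess might look something like this (the addresses below are placeholders, not real attacker IPs):

# Apache 2.2 / mod_authz_host: allow everyone except the listed addresses
Order Allow,Deny
Allow from all
Deny from 192.0.2.10
Deny from 198.51.100.0/24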

 

"I'm guessing this can be done in the .httaccess file but as yet our hosting company haven't been able to come up with the solution."

 

If this isn't working, then maybe your web server is not Apache, or the Apache mod_authz_host module is not available. Almost certainly, though, it's the bottom-of-the-barrel tech support that is on duty at these hours.

 

But the best approach is to have the firewall, not your web server, deal with this. In your hosting control panel, do you have a firewall utility?
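If you do have shell access to the VPS, something along these lines would drop the traffic at the firewall before Apache ever sees it (a sketch assuming iptables; the addresses are placeholders):

# Drop packets from known attacking addresses before they reach the web server
iptables -A INPUT -s 192.0.2.10 -j DROP
iptables -A INPUT -s 198.51.100.0/24 -j DROP

# Review the resulting INPUT chain
iptables -L INPUT -n --line-numbers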


Hi

How have you identified this as a spoofed Googlebot, and what specific action did you take to block it?

Is this your own dedicated or VPS server, and what support contract (managed or something else?) do you have with your hosting company?

What protection do you have on the server: a software- or hardware-based firewall, and do you have a Web Application Firewall of any sort?

There isn't enough information here to offer much in the way of possible solutions, but if you have managed support then your hosting company should be able to stop this.

Thanks

Ian


Thanks for the replies.

 

The fake Googlebots were identified from the following log entries:

 

123.185.5.39 - - [06/Dec/2013:12:15:02 +0000] "GET / HTTP/1.0" 500 - "http://www.ourwebsite.com" "Mozilla/5.0 (compatible; Googlebot/2.0; +http://www.google.com/bot.html)"
115.58.71.181 - - [06/Dec/2013:12:15:02 +0000] "GET / HTTP/1.1" 500 - "http://www.ourwebsite.com" "Mozilla/5.0 (compatible; Googlebot/2.0; +http://www.google.com/bot.html)"
42.184.5.150 - - [06/Dec/2013:12:15:02 +0000] "GET / HTTP/1.1" 500 - "http://www.ourwebsite.com" "Mozilla/5.0 (compatible; Googlebot/2.0; +http://www.google.com/bot.html)"
222.142.191.1 - - [06/Dec/2013:12:15:02 +0000] "GET / HTTP/1.1" 500 - "http://www.ourwebsite.com" "Mozilla/5.0 (compatible; Googlebot/2.0; +http://www.google.com/bot.html)"
221.130.29.184 - - [06/Dec/2013:12:15:02 +0000] "GET / HTTP/1.0" 500 - "http://www.ourwebsite.com" "Mozilla/5.0 (compatible; Googlebot/2.0; +http://www.google.com/bot.html)"
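A genuine Googlebot can be verified with a reverse DNS lookup followed by a matching forward lookup. A quick sketch from a shell (66.249.66.1 is Google's own documented example address; exact output will vary):

# Reverse lookup: a real Googlebot resolves to a *.googlebot.com (or *.google.com) name
host 66.249.66.1
# expected: ... domain name pointer crawl-66-249-66-1.googlebot.com.

# Forward lookup on that name should return the same address
host crawl-66-249-66-1.googlebot.com
# expected: crawl-66-249-66-1.googlebot.com has address 66.249.66.1

# A spoofed address from the log above returns no googlebot.com record
host 123.185.5.39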

 

 

Our current .htaccess file is as follows:

 

RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} ^ourwebsite.com$
RewriteRule ^(.*)$ http://www.ourwebsite.com/$1 [R=301,L]
#RewriteCond %{HTTP_HOST} ^ourwebsite.com$
#RewriteRule ^/?$ "http://www.ourwebsite.com/" [R=301,L]

#SEO Rules
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !=/favicon.ico
RewriteRule ^(.*).html?$ index.php?seo_path=$1 [L,QSA]

#Fake Googlebot
RewriteCond %{HTTP_USER_AGENT} Googlebot
RewriteCond %{REMOTE_ADDR} !^66.249. [OR]
RewriteCond %{REMOTE_ADDR} !^216.229.
RewriteRule .* - [F]

<Files 403.shtml>
order allow,deny
allow from all
</Files>

 

This is blocking the spoofed bots but also appears to be blocking the real Googlebot. We have a VPS server and the support we are getting is pretty quick, but the problem of our site not being spidered persists. The hosting company have also identified this problem:

 

[Sun Dec 08 07:48:45.292558 2013] [core:error] [pid 22888] [client 121.231.234.151:55566] AH00124: Request exceeded the limit of 10 internal redirects due to probable configuration error. Use 'LimitInternalRecursion' to increase the limit if necessary. Use 'LogLevel debug' to get a backtrace., referer: www.ourwebsite.com

 

 

and suggest that it may not be helping the situation. I'm out of my depth with this and have no idea what could be causing it.

 

The fake bots are blocked; I just need to get Google spidering the site again.
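One thing that may be worth checking in the rule above: RewriteCond lines are ANDed together by default, and the [OR] flag between the two negated REMOTE_ADDR conditions changes that. Since no single address can start with both 66.249. and 216.229., "not one OR not the other" is true for every visitor, so every request with a Googlebot user agent gets the [F], including the real crawler. With the [OR] dropped so the conditions AND together, the block would read something like this (the prefixes are only examples of Google ranges; the reverse DNS check is the more reliable test):

#Fake Googlebot - forbid Googlebot user agents that do NOT come from a Google address range
RewriteCond %{HTTP_USER_AGENT} Googlebot
RewriteCond %{REMOTE_ADDR} !^66\.249\.
RewriteCond %{REMOTE_ADDR} !^216\.239\.
RewriteRule .* - [F]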

Link to comment
Share on other sites

UPDATE

 

Modified the Googlebot code in the .htaccess file to the following, with none of the SEO rewrite code in place.

 

#Fake Googlebot
RewriteCond %{HTTP_USER_AGENT} Googlebot
RewriteCond %{REMOTE_ADDR} !^64.233. [OR]
RewriteCond %{REMOTE_ADDR} !^66.102. [OR]
RewriteCond %{REMOTE_ADDR} !^66.249. [OR]
RewriteCond %{REMOTE_ADDR} !^72.14. [OR]
RewriteCond %{REMOTE_ADDR} !^74.125. [OR]
RewriteCond %{REMOTE_ADDR} !^209.85. [OR]
RewriteCond %{REMOTE_ADDR} !^216.239.
RewriteRule .* - [F]

 

Tested this on a new install and Google spiders the site OK, but it doesn't spider the site with the original problem. The host thinks there is a problem in the CubeCart code:

 

"If this is working on another site it would suggest something is not right with the script on account as this is the only other thing that could be causing a problem. The code is not blocking genuine google requests which means the backend script with the redirect issue is causing the fault."

 

When I test the sitemap in Webmaster Tools I get:

 

General HTTP error: HTTP 403 error (Forbidden)
HTTP Error: 403
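A quick way to see what the rules are actually returning is to request the homepage with and without a forged Googlebot user agent (ourwebsite.com is the placeholder domain used throughout this thread; a non-Google source address is assumed):

# Plain request: should come back 200, or the 301 to the www host
curl -I http://www.ourwebsite.com/

# Forged Googlebot user agent from a non-Google address: the blocking rule should return 403
curl -I -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" http://www.ourwebsite.com/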

When I added this to MY .htaccess code, my sitemap gave the same error message in Webmaster Tools. I tested it on a 5.2.5 modified Kurouto site.

 

BUT I tested it again after removing your code, and I STILL get the warning message. Will investigate further.

 

Well, scratch that - must have retested too soon, as I tried again and mine worked just fine without your extra anti-spoof-bot code.


We can stop the attack, no problem, but by blocking the spoofed Googlebot we are blocking the real one, so the website is now suffering a ranking drop; in fact it's more of an exclusion at the moment. If anyone knows how to block the spoofed Googlebots and allow the real one through to spider the site again, please let me know.

