Dirty Butter Posted June 1, 2015

Baiduspider is in the array of bots to ignore when Customers online is set to customers only. But we still get a lot of 0.02 (seconds?) hits from this spider. Any suggestions for eliminating this bot?
bsmither Posted June 1, 2015

It is 0.02 minutes. We can tweak the query in statistics.index.inc.php, near line 310:

From:
    $filter = '(S.session_last > S.session_start) AND ';
To:
    $filter = '(S.session_last - S.session_start > 3) AND ';

This shows only someone who has been online for more than 3 seconds (0.05 minutes).

Or, as was probably mentioned earlier, make the array of bot signatures 'public' and see if a signature is in the session's useragent string. In the User class, line 34:

From:
    protected $_bot_sigs = array(
To:
    public $_bot_sigs = array(

Then, in statistics.index.inc.php, line 321:

From:
    foreach ($results as $user) {
To:
    $user_bot_sigs = User::getInstance()->_bot_sigs;
    foreach ($results as $user) {
        if (!empty($filter)) { // Scanning for bots
            foreach ($user_bot_sigs as $signature) {
                if (stripos($user['useragent'], $signature) !== false) {
                    continue 2; // Skip this session record
                }
            }
        }
Dirty Butter Posted June 5, 2015

Thank you. The method shown here uses some code I had commented out from the previous help you had given about bots on another post. After doing a little research, I find there are a HUGE number of variations of the word baidu for different Chinese bots. It looks like it would be simpler in the long run to take the previous fix out and use just the time limit method explained here. I'll be back.
Dirty Butter Posted June 5, 2015

I took it back to stock statistics.index.inc.php, and no results at all show with your suggested change:

    $filter = '(S.session_last - S.session_start > 3) AND ';

I experimented with a few variations. Using 0.05 and 0.9 shows the bot entries, and using 1 again shows no results at all.
Dirty Butter Posted June 10, 2015

Any other ideas on this? I'm still getting huge numbers of 0.02 and 0.03 visits I'd like to filter out.
bsmither Posted June 10, 2015

"I find there are a HUGE number of variations of the word baidu for different Chinese bots."

Please give us a dozen examples.

This test:

    if (stripos($user['useragent'], $signature) !== false)

is looking for the sequence of characters 'baidu' anywhere in the useragent string. So, regardless of the variations, if 'baidu' is in the string, that session record will be knocked out.
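To illustrate the point about stripos(), here is a minimal standalone sketch. The helper name matchesBotSignature, the signature list, and the sample useragent strings are all made up for illustration and are not part of CubeCart; the real signature list lives in the User class.

```php
<?php
// Hypothetical helper (not CubeCart code): returns true when any signature
// occurs anywhere in the useragent string, ignoring letter case.
function matchesBotSignature(string $useragent, array $signatures): bool
{
    foreach ($signatures as $signature) {
        // stripos() returns 0 for a match at the very start of the string,
        // so the result must be compared against false with !==.
        if (stripos($useragent, $signature) !== false) {
            return true;
        }
    }
    return false;
}

// Illustrative signature list, assumed for this sketch only.
$signatures = array('baidu');

// Sample useragent strings, assumed for this sketch only.
$samples = array(
    'Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)',
    'Baiduspider-image',
    'BAIDUSPIDER-VIDEO',
    'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 Chrome/43.0 Safari/537.36',
);

foreach ($samples as $ua) {
    echo (matchesBotSignature($ua, $signatures) ? 'skip ' : 'keep ') . $ua . PHP_EOL;
}
```

The first three samples are skipped because 'baidu' appears somewhere in each of them, whatever the case or suffix. The last one is kept because it does not contain that sequence of characters, so a client that does not name itself in the useragent cannot be caught this way.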
bsmither Posted June 10, 2015

It seems there is a Chrome replacement called "Baidu Browser". It used to be called "Spark" and does not contain the character sequence 'baidu' in the user-agent string.
Dirty Butter Posted June 10, 2015

I thought I had bookmarked the site I found that gave lists of all spiders. Still looking for it. Will be back.