Guest EaglePilot Posted March 22, 2006

There are certain pages that I don't want a spider to crawl, such as Tell a Friend, Login, etc. If I don't want these links followed, what do I add to my spiders.txt file to make it so? Thanks, Greg
markscarts Posted March 22, 2006

I think the SEF 5.1 you are running already does that, doesn't it?
Guest walmarc Posted March 22, 2006

What you need is a robots.txt file in your root directory, e.g. /public_html/robots.txt (see the example, and search Google for robots.txt for the syntax). Regards
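For anyone searching later, a minimal robots.txt follows this general shape (the paths here are only illustrative, not taken from any particular store):

```
# Rules for all crawlers (the * matches any User-agent)
User-agent: *
Disallow: /admin/
Disallow: /cgi-bin/

# Block one specific crawler from the whole site
User-agent: BadBot
Disallow: /
```

Each User-agent block is matched by crawler name, and each Disallow value is a simple path prefix; an empty Disallow: line means "allow everything" for that block.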
markscarts Posted March 22, 2006 Share Posted March 22, 2006 :yes: My confusion is that he is running SEF 5.1 - and that mod is packaged with robots.txt - and it kills sessions, etc. out-of-the-box. ;) Quote Link to comment Share on other sites More sharing options...
Guest walmarc Posted March 22, 2006 Share Posted March 22, 2006 :yes: My confusion is that he is running SEF 5.1 - and that mod is packaged with robots.txt - and it kills sessions, etc. out-of-the-box. Oh I see - Sorry I misunderstood :) Quote Link to comment Share on other sites More sharing options...
Guest EaglePilot Posted March 23, 2006

Thanks for your replies. I do have the SEF mod, but I never looked at the spiders.txt file until recently, and I expected it to look different. I didn't see anything relating to killing sessions, or anything like that. This is the entire file (sorry in advance for making this page so long):

abacho abcdatos abcsearch acoon adsarobot aesop ah-ha alkalinebot almaden altavista antibot anzwerscrawl aol search appie arachnoidea araneo architext ariadne arianna ask jeeves aspseek asterias astraspider atomz augurfind backrub baiduspider bannana_bot bbot bdcindexer blindekuh boitho boito borg-bot bsdseek christcrawler computer_and_automation_research_institute_crawler coolbot cosmos crawler crawler@fast crawlerboy cruiser cusco cyveillance deepindex denmex dittospyder docomo dogpile dtsearch elfinbot entire web esismartspider exalead excite ezresult fast fast-webcrawler fdse felix fido findwhat finnish firefly firstgov fluffy freecrawl frooglebot galaxy gaisbot geckobot gencrawler geobot gigabot girafa goclick goliat google googlebot griffon gromit grub-client gulliver gulper henrythemiragorobot hometown hotbot htdig hubater ia_archiver ibm_planetwide iitrovatore-setaccio incywincy incrawler indy infonavirobot infoseek ingrid inspectorwww intelliseek internetseer ip3000.com-crawler iron33 jcrawler jeeves jubii kanoodle kapito kit_fireball kit-fireball ko_yappo_robot kototoi lachesis larbin legs linkwalker lnspiderguy look.com lycos mantraagent markwatch maxbot mercator merzscope meshexplorer metacrawler mirago mnogosearch moget motor msn msnbot muscatferret nameprotect nationaldirectory naverrobot nazilla ncsa beta netnose netresearchserver ng/1.0 northerlights npbot nttdirectory_robot nutchorg nzexplorer odp openbot openfind osis-project overture perlcrawler phpdig pjspide polybot pompos poppi portalb psbot quepasacreep rabot raven rhcs robi robocrawl robozilla roverbot scooter scrubby search.ch search.com.ua searchfeed searchspider searchuk seventwentyfour sidewinder sightquestbot skymob sleek slider_search slurp solbot speedfind speedy spida spider_monkey spiderku stackrambler steeler suchbot suchknecht.at-robot suntek szukacz surferf3 surfnomore surveybot suzuran synobot tarantula teomaagent teradex t-h-u-n-d-e-r-s-t-o-n-e tigersuche topiclink toutatis tracerlock turnitinbot tutorgig uaportal uasearch.kiev.ua uksearcher ultraseek unitek vagabondo verygoodsearch vivisimo voilabot voyager vscooter w3index w3c_validator wapspider wdg_validator webcrawler webmasterresourcesdirectory webmoose websearchbench webspinne whatuseek whizbanglab winona wire wotbox wscbot www.webwombat.com.au xenu link sleuth xyro yahoobot yahoo! slurp yandex yellopet-spider zao/0 zealbot zippy zyborg
Guest EaglePilot Posted March 23, 2006

I just looked in my robots.txt file, and this is what it says:

User-agent: *
Disallow: /index.php?act=login
Disallow: /index.php?act=taf
Disallow: /index.php?page=0
Disallow: /admin/
Disallow: /cgi-bin/
Disallow: /classes/
Disallow: /docs/
Disallow: /extra/
Disallow: /images/
Disallow: /includes/
Disallow: /install/
Disallow: /js/
Disallow: /language/
Disallow: /modules/
Disallow: /pear/
Disallow: /skins/
Disallow: /tellafriend/
Disallow: /shop.php/tellafriend/
Disallow: /shop/tellafriend/
Disallow: /cart.php
Disallow: /confirmed.php
Disallow: /download.php
Disallow: /offLine.php
Disallow: /spiders.txt
Disallow: /switch.php
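A quick way to sanity-check a file like this is Python's standard-library urllib.robotparser. This is only a sketch - it models a crawler that actually follows the spec, and the rule subset below is copied from the file above:

```python
# Check how a standards-compliant crawler would interpret a few of the
# rules above, using Python's built-in urllib.robotparser.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /index.php?act=login
Disallow: /admin/
Disallow: /cart.php
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Blocked: anything under /admin/, the login URL, the cart script.
assert not rp.can_fetch("Googlebot", "/admin/orders.php")
assert not rp.can_fetch("Googlebot", "/cart.php")
# An ordinary catalogue URL is still crawlable, because Disallow rules
# are plain prefix matches and none of them match this path.
assert rp.can_fetch("Googlebot", "/index.php?act=viewCat")
print("rules behave as expected")
```

Note that Disallow: /cart.php also blocks /cart.php2 or /cart.php/foo, since matching is by prefix, not whole path.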
roban Posted March 23, 2006

There is some controversy in SEO circles as to whether robots actually pay any attention to robots.txt. It does help, however, if you run a Google Sitemap generator, as your XML sitemap will exclude these pages.
Guest EaglePilot Posted March 23, 2006 Share Posted March 23, 2006 There is some controversy in SEO circles as to whether robots actually pay any attention to robots.txt. It does help however if you run a Google Site Mapper as your xml will exclude these. Thanks. I was just making sure I wasn't missing something. I appreciate everyone's help. Greg Quote Link to comment Share on other sites More sharing options...
Guest Posted April 13, 2006

As long as we're comparing files... is there supposed to be an .htaccess included with the SEF mod (or CubeCart)? The install instructions mention deleting parts of it: "Select 'Apache RewriteRule supported' then edit the ORIGINAL .htaccess file in the root directory OF YOUR SHOP and delete everything in between '# 1)' and '# end 1)'. Now try browsing your store." ORIGINAL .htaccess? I have an .htaccess file for my store, but it doesn't say anything about '# 1)' and '# end 1)' or anything in between. Am I missing something? :)
markscarts Posted April 13, 2006

Those statements are in the .htaccess file that comes with the SEF mod by Rukiman.
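For anyone comparing files: the marker comments look roughly like this. This is an illustrative sketch only - the actual rules shipped with the SEF mod will differ, and the rewrite pattern below is made up for the example:

```
RewriteEngine On

# 1)
# Everything between these two markers is the block the install
# instructions tell you to delete from your shop's own .htaccess.
RewriteRule ^prod-([0-9]+)\.html$ index.php?act=viewProd&productId=$1 [L,QSA]
# end 1)
```

If your .htaccess has no such markers, it is probably your host's or CubeCart's own file, not the one bundled with the mod.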
Guest JRz Posted December 14, 2006

Sorry, what is an SEF mod? Please tell me where I can download the SEF mod. Is it free?