Spider.txt?


Guest EaglePilot

There are certain pages that I don't want a spider to crawl, such as Tell a Friend, Login, etc. If I don't want these links followed, what do I add to my spider.txt file to make it so?
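For context, the standard way to ask crawlers to skip individual pages is a robots.txt file in the web root; a minimal sketch (the paths here are illustrative, not necessarily your store's real URLs):

```text
User-agent: *
Disallow: /tellafriend/
Disallow: /index.php?act=login
```

Well-behaved crawlers fetch this file before crawling and skip matching paths; note that it is advisory, not enforced.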

Thanks,

Greg

Guest walmarc

My confusion is that he is running SEF 5.1, and that mod is packaged with a robots.txt which kills sessions, etc., out of the box.

Oh, I see. Sorry, I misunderstood. :)

Guest EaglePilot

Thanks for your replies. I do have the SEF mod, but I never looked at the spider.txt file until recently, and I expected it to look different. I didn't see anything related to killing sessions. This is the entire file (sorry in advance for making this page so long):

abacho
abcdatos
abcsearch
acoon
adsarobot
aesop
ah-ha
alkalinebot
almaden
altavista
antibot
anzwerscrawl
aol search
appie
arachnoidea
araneo
architext
ariadne
arianna
ask jeeves
aspseek
asterias
astraspider
atomz
augurfind
backrub
baiduspider
bannana_bot
bbot
bdcindexer
blindekuh
boitho
boito
borg-bot
bsdseek
christcrawler
computer_and_automation_research_institute_crawler
coolbot
cosmos
crawler
crawler@fast
crawlerboy
cruiser
cusco
cyveillance
deepindex
denmex
dittospyder
docomo
dogpile
dtsearch
elfinbot
entire web
esismartspider
exalead
excite
ezresult
fast
fast-webcrawler
fdse
felix
fido
findwhat
finnish
firefly
firstgov
fluffy
freecrawl
frooglebot
galaxy
gaisbot
geckobot
gencrawler
geobot
gigabot
girafa
goclick
goliat
google
googlebot
griffon
gromit
grub-client
gulliver
gulper
henrythemiragorobot
hometown
hotbot
htdig
hubater
ia_archiver
ibm_planetwide
iitrovatore-setaccio
incywincy
incrawler
indy
infonavirobot
infoseek
ingrid
inspectorwww
intelliseek
internetseer
ip3000.com-crawler
iron33
jcrawler
jeeves
jubii
kanoodle
kapito
kit_fireball
kit-fireball
ko_yappo_robot
kototoi
lachesis
larbin
legs
linkwalker
lnspiderguy
look.com
lycos
mantraagent
markwatch
maxbot
mercator
merzscope
meshexplorer
metacrawler
mirago
mnogosearch
moget
motor
msn
msnbot
muscatferret
nameprotect
nationaldirectory
naverrobot
nazilla
ncsa beta
netnose
netresearchserver
ng/1.0
northerlights
npbot
nttdirectory_robot
nutchorg
nzexplorer
odp
openbot
openfind
osis-project
overture
perlcrawler
phpdig
pjspide
polybot
pompos
poppi
portalb
psbot
quepasacreep
rabot
raven
rhcs
robi
robocrawl
robozilla
roverbot
scooter
scrubby
search.ch
search.com.ua
searchfeed
searchspider
searchuk
seventwentyfour
sidewinder
sightquestbot
skymob
sleek
slider_search
slurp
solbot
speedfind
speedy
spida
spider_monkey
spiderku
stackrambler
steeler
suchbot
suchknecht.at-robot
suntek
szukacz
surferf3
surfnomore
surveybot
suzuran
synobot
tarantula
teomaagent
teradex
t-h-u-n-d-e-r-s-t-o-n-e
tigersuche
topiclink
toutatis
tracerlock
turnitinbot
tutorgig
uaportal
uasearch.kiev.ua
uksearcher
ultraseek
unitek
vagabondo
verygoodsearch
vivisimo
voilabot
voyager
vscooter
w3index
w3c_validator
wapspider
wdg_validator
webcrawler
webmasterresourcesdirectory
webmoose
websearchbench
webspinne
whatuseek
whizbanglab
winona
wire
wotbox
wscbot
www.webwombat.com.au
xenu link sleuth
xyro
yahoobot
yahoo! slurp
yandex
yellopet-spider
zao/0
zealbot
zippy
zyborg
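Judging by the file's shape (one lowercase user-agent fragment per line), a list like this is typically used for case-insensitive substring matching against each visitor's User-Agent header, so the store can suppress session IDs for known crawlers. A minimal Python sketch of that technique (this is a guess at the general idea, not the SEF mod's actual code; `is_spider` and the short `spiders` excerpt are mine):

```python
def is_spider(user_agent, spider_names):
    """Return True if any known spider fragment appears in the User-Agent string."""
    ua = user_agent.lower()
    return any(name in ua for name in spider_names)

# Small excerpt of the spiders.txt entries above
spiders = ["googlebot", "msnbot", "slurp", "ia_archiver"]

print(is_spider("Mozilla/5.0 (compatible; Googlebot/2.1)", spiders))  # True
print(is_spider("Mozilla/5.0 (Windows NT 10.0) Firefox/115.0", spiders))  # False
```

Substring matching is why the entries are plain fragments like "googlebot" rather than full User-Agent strings.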

Guest EaglePilot

I just looked in my robots.txt file, and this is what it says:

User-agent: *
Disallow: /index.php?act=login
Disallow: /index.php?act=taf
Disallow: /index.php?page=0
Disallow: /admin/
Disallow: /cgi-bin/
Disallow: /classes/
Disallow: /docs/
Disallow: /extra/
Disallow: /images/
Disallow: /includes/
Disallow: /install/
Disallow: /js/
Disallow: /language/
Disallow: /modules/
Disallow: /pear/
Disallow: /skins/
Disallow: /tellafriend/
Disallow: /shop.php/tellafriend/
Disallow: /shop/tellafriend/
Disallow: /cart.php
Disallow: /confirmed.php
Disallow: /download.php
Disallow: /offLine.php
Disallow: /spiders.txt
Disallow: /switch.php
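Rules like these can be checked programmatically: Python's standard-library `urllib.robotparser` applies them the same way a compliant crawler would. A short sketch using an excerpt of the rules above (the `/products.php` path is just an illustrative allowed URL, not necessarily a real page):

```python
from urllib.robotparser import RobotFileParser

# Excerpt of the robots.txt rules above
rules = """\
User-agent: *
Disallow: /index.php?act=login
Disallow: /admin/
Disallow: /cart.php
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Paths under a disallowed prefix are blocked for every user agent
print(rp.can_fetch("*", "/admin/settings.php"))        # False
print(rp.can_fetch("*", "/cart.php"))                  # False
# Query-string rules only match that exact query prefix
print(rp.can_fetch("*", "/index.php?act=login"))       # False
print(rp.can_fetch("*", "/index.php"))                 # True
print(rp.can_fetch("*", "/products.php"))              # True
```

This is handy for verifying that a rule such as `Disallow: /index.php?act=login` blocks the login action without blocking `/index.php` itself.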

Guest EaglePilot

There is some controversy in SEO circles as to whether robots actually pay any attention to robots.txt. It does help, however, if you run a Google Site Mapper, as your XML sitemap will exclude these pages.

Thanks. I was just making sure I wasn't missing something. I appreciate everyone's help.

Greg

  • 3 weeks later...

As long as we're comparing files... is there supposed to be an .htaccess included with the SEF mod (or CubeCart)? The install instructions mention deleting parts of it:

Select 'Apache RewriteRule supported' then edit the ORIGINAL .htaccess file in the root directory OF YOUR SHOP and delete everything in between '# 1)' and '# end 1)'. Now try browsing your store.

ORIGINAL .htaccess? I have an .htaccess file for my store, but it doesn't say anything about '# 1)' and '# end 1)' or anything in between. Am I missing something? :)
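For what it's worth, install instructions like that usually assume the mod ships its own .htaccess in which the optional section is fenced by paired comment markers, so it can be deleted cleanly. A purely hypothetical sketch of that layout (the placeholder comments are mine; they are not the mod's actual directives):

```apacheconf
# 1)
#   Directives used only when 'Apache RewriteRule supported' is NOT
#   selected would go here; the instructions say to delete everything
#   between the '# 1)' and '# end 1)' markers when it is selected.
# end 1)

RewriteEngine On
```

If your .htaccess has no such markers, it is likely the one from a plain CubeCart install rather than the one bundled with the mod.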
