Tycoon Talk

The Other Search Engines


robots.txt, just 'cause I didn't know :)
01-21-2005, 11:09 PM
hackySack
Extreme Talker

Location: UK, Derbyshire
Hey, I decided to post this just 'cause I had no idea how to write a robots.txt file. Hope someone finds this useful.

PS: the following is an extract from http://www.robotstxt.org/wc/faq.html#robotstxt

Good luck and enjoy


The format and semantics of the "/robots.txt" file are as follows:
The file consists of one or more records separated by one or more blank lines (terminated by CR, CR/NL, or NL). Each record contains lines of the form "<field>:<optionalspace><value><optionalspace>". The field name is case insensitive.
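To make the line format concrete, here's a minimal Python sketch of splitting the file into lines (accepting CR, CR/NL or NL as the terminator) and splitting one line into its field and value. The function names are just my own for illustration, and it assumes the whole file has already been read into a string:

```python
import re


def split_lines(text):
    # The format allows CR, CR/NL or NL as the line terminator.
    return re.split(r"\r\n|\r|\n", text)


def parse_field(line):
    """Split '<field>:<optionalspace><value><optionalspace>' into (field, value).

    The field name is case insensitive, so it is lowercased here.
    Only the first colon separates field from value.
    Returns None for lines that contain no colon at all.
    """
    field, sep, value = line.partition(":")
    if not sep:
        return None
    return field.strip().lower(), value.strip()
```

For example, parse_field("User-Agent:  *  ") gives ("user-agent", "*"), so "User-agent" and "USER-AGENT" end up as the same field.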

Comments can be included in the file using UNIX Bourne shell conventions: the '#' character indicates that the preceding space (if any) and the remainder of the line up to the line termination are discarded. Lines containing only a comment are discarded completely, and therefore do not indicate a record boundary.
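The comment rule matters when grouping lines into records: a comment-only line is thrown away and does NOT count as the blank line that ends a record. A rough Python sketch of that grouping (the function name split_records is made up for this example):

```python
import re


def split_records(text):
    """Group robots.txt lines into records, applying the comment rules above."""
    records, current = [], []
    for raw in re.split(r"\r\n|\r|\n", text):
        if raw.lstrip().startswith("#"):
            # Comment-only line: dropped entirely, so it does not end a record.
            continue
        # Drop the comment plus any space before it.
        line = raw.split("#", 1)[0].rstrip()
        if not line.strip():
            # A genuinely blank line ends the current record (if there is one).
            if current:
                records.append(current)
                current = []
            continue
        current.append(line)
    if current:
        records.append(current)
    return records
```

So "User-agent: a", "# note", "Disallow: /x # tmp" stays one record, and only the empty line after it starts a new one.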

The record starts with one or more User-agent lines, followed by one or more Disallow lines, as detailed below. Unrecognised headers are ignored.

User-agent
The value of this field is the name of the robot whose access policy the record describes.
If more than one User-agent field is present, the record describes an identical access policy for more than one robot. At least one field needs to be present per record.

The robot should be liberal in interpreting this field. A case insensitive substring match of the name without version information is recommended.

If the value is '*', the record describes the default access policy for any robot that has not matched any of the other records. It is not allowed to have multiple such records in the "/robots.txt" file.
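Here's a small Python sketch of how a robot might pick its record: strip the version off its own name, do a case-insensitive substring match, and only fall back to the '*' record if nothing else matched. The (user_agent_values, disallow_values) pairs are a hypothetical shape I picked for the example, and checking whether the record's value appears inside the robot's name is one reasonable reading of "substring match" (the same direction Python's urllib.robotparser takes):

```python
def robot_name(user_agent):
    # Drop version information: "WebCrawler/3.0" -> "webcrawler".
    return user_agent.split("/", 1)[0].strip().lower()


def select_disallows(records, user_agent):
    """Return the Disallow values that apply to user_agent.

    records: hypothetical list of (user_agent_values, disallow_values) pairs.
    A record naming the robot wins; the '*' record is only the fallback.
    With no matching record and no '*' record, nothing is disallowed.
    """
    name = robot_name(user_agent)
    default = []
    for agents, disallows in records:
        for agent in agents:
            if agent == "*":
                default = disallows
            elif agent.lower() in name:
                return disallows
    return default
```

For example, with [(["*"], ["/tmp/"]), (["WebCrawler", "Lycos"], ["/"])], "WebCrawler/3.0" gets ["/"] even though the '*' record comes first, while "OtherBot/1.0" falls back to ["/tmp/"].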

Disallow
The value of this field specifies a partial URL that is not to be visited. This can be a full path, or a partial path; any URL that starts with this value will not be retrieved. For example, Disallow: /help disallows both /help.html and /help/index.html, whereas Disallow: /help/ would disallow /help/index.html but allow /help.html.
An empty value indicates that all URLs can be retrieved. At least one Disallow field needs to be present in a record.
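The Disallow check is plain prefix matching on the path. A minimal Python sketch (is_allowed is just an illustrative name), using the /help examples from the paragraph above:

```python
def is_allowed(path, disallows):
    """Prefix matching as described for Disallow.

    Any URL path that starts with a non-empty Disallow value is blocked;
    an empty Disallow value blocks nothing.
    """
    for prefix in disallows:
        if prefix and path.startswith(prefix):
            return False
    return True
```

So is_allowed("/help.html", ["/help"]) is False, but is_allowed("/help.html", ["/help/"]) is True, which is exactly the trailing-slash difference described above.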

The presence of an empty "/robots.txt" file has no explicit associated semantics; it will be treated as if it were not present, i.e. all robots will consider themselves welcome.

The following example "/robots.txt" file specifies that no robots should visit any URL starting with "/cyberworld/map/" or "/tmp/", or the URL "/foo.html":
--------------------------------------------------------------------------------

# robots.txt for http://www.example.com/

User-agent: *
Disallow: /cyberworld/map/ # This is an infinite virtual URL space
Disallow: /tmp/ # these will soon disappear
Disallow: /foo.html
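If you want to sanity-check a file like this before uploading it, Python's standard library has urllib.robotparser. A rough sketch feeding it the example above (plus an empty file, which should allow everything); the check helper and "AnyBot" agent name are just made up for the test, and note robotparser also understands a few extensions (such as Allow) that aren't part of the format quoted here:

```python
from urllib.robotparser import RobotFileParser

EXAMPLE = """\
# robots.txt for http://www.example.com/

User-agent: *
Disallow: /cyberworld/map/ # This is an infinite virtual URL space
Disallow: /tmp/ # these will soon disappear
Disallow: /foo.html
"""


def check(robots_txt, agent, url):
    # Parse the file contents directly instead of fetching them over HTTP.
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, url)


print(check(EXAMPLE, "AnyBot", "http://www.example.com/cyberworld/map/index.html"))
print(check(EXAMPLE, "AnyBot", "http://www.example.com/index.html"))
print(check("", "AnyBot", "http://www.example.com/anything"))
```

The first call prints False (blocked by /cyberworld/map/), the second True, and the empty file gives True, matching "all robots will consider themselves welcome".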
01-22-2005, 04:21 PM
kline11
SearchBliss Web Tools

Name: John
Location: USA
You can build one for free here: http://www.searchbliss.com/webmaster...-generator.htm
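If you'd rather script it than use an online tool, a file in the format quoted above is easy to generate. A minimal Python sketch, where make_robots_txt and its dict-of-lists input (User-agent value mapped to Disallow prefixes) are my own hypothetical choices:

```python
def make_robots_txt(policy):
    """Build robots.txt text from {user_agent: [disallow_prefix, ...]}.

    Each key becomes one record; an empty list becomes a single empty
    Disallow line, since every record needs at least one Disallow field.
    """
    records = []
    for agent, prefixes in policy.items():
        lines = [f"User-agent: {agent}"]
        for prefix in prefixes or [""]:
            lines.append(f"Disallow: {prefix}".rstrip())
        records.append("\n".join(lines))
    # Records are separated by a blank line.
    return "\n\n".join(records) + "\n"


print(make_robots_txt({"BadBot": ["/"], "*": ["/tmp/", "/foo.html"]}))
```

That shuts BadBot out completely and keeps everyone else out of /tmp/ and /foo.html.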