Tycoon Talk

SEO Tycoon


09-05-2006, 05:37 AM Robot TXT files
earnsomecoin
Are robots.txt files still necessary for websites?

If so, why would one use them?

How do you know what to include in the file?

Is there someone out there who can write these files specifically for my websites?

Thanks!
 
09-05-2006, 05:52 AM
imported_Koz
This article might help you: Googlebot Behaviour

A robots.txt file is used to block search engines (you can specify which ones) from crawling certain pages on your site. Whether you need one depends on what kind of site you have.
 
09-05-2006, 06:12 AM
Using a robots.txt file to block certain things can be useful, such as:
- PDF folders
- CSS folder
- scripts folder

Basically, anything that you don't want crawled, you need to block in robots.txt. It will save you bandwidth and keep well-behaved crawlers from listing those files in search results (important for client documents, which sometimes end up online).
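A rough sketch of what such a file could look like, built and then checked with Python's standard `urllib.robotparser`. The folder names (`/pdf/`, `/css/`, `/scripts/`), the helper `build_robots_txt` and the example.com URLs are made up for illustration; substitute your own paths.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(blocked_folders, user_agent="*"):
    """Return robots.txt text asking `user_agent` to stay out of each folder."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {folder}" for folder in blocked_folders]
    return "\n".join(lines) + "\n"


# Hypothetical folder names, not taken from any real site.
robots_txt = build_robots_txt(["/pdf/", "/css/", "/scripts/"])
print(robots_txt)

# Check how a rule-abiding crawler would read the file.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
pdf_allowed = parser.can_fetch("Googlebot", "http://www.example.com/pdf/report.pdf")
home_allowed = parser.can_fetch("Googlebot", "http://www.example.com/index.html")
print(pdf_allowed, home_allowed)
```

This only shows how a crawler that honours robots.txt interprets the rules. The file is a request, not access control.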
Posted by A.N.Onym
 
09-05-2006, 11:20 AM
Remember not to list truly secret folders in your robots.txt: the file is publicly readable, so it is a good place for others to find the stuff you don't want to share. For example, don't list your administration folder there. Instead, use the robots meta tag on those individual pages.
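For reference, the robots meta tag goes inside the page's `<head>`. Below is a minimal sketch in Python that adds it to a page; the helper name `add_robots_meta` and the sample page are hypothetical, and the simple string search assumes a plain `<head>` tag with no attributes.

```python
def add_robots_meta(html, directives="noindex, nofollow"):
    """Insert a robots meta tag right after the opening <head> tag."""
    tag = f'<meta name="robots" content="{directives}">'
    head_open = "<head>"
    position = html.find(head_open)
    if position == -1:
        raise ValueError("page has no plain <head> tag")
    insert_at = position + len(head_open)
    return html[:insert_at] + "\n" + tag + html[insert_at:]


# Hypothetical admin page used only to show the result.
page = "<html><head><title>Admin</title></head><body>Private</body></html>"
tagged_page = add_robots_meta(page)
print(tagged_page)
```

Unlike a robots.txt entry, the tag does not advertise the page's location to anyone reading a public file. It still does not protect the page itself, so anything sensitive should sit behind a login as well.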
Posted by stefanjuhl
 
09-09-2006, 11:39 PM
Bookworm-SEO
You don't need it.
If you process credit cards or run anything requiring confidentiality (and which could get you sued if you lose the info to others), use robots.txt. Another way I just read about is blocking everything but Googlebot (and Google's other crawlers, such as AdsBot, if you use AdWords), Slurp (Yahoo) and the MSN spider (I don't think it has a name). This avoids letting malicious crawlers steal/scrape your content.
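The "block everything but a few named crawlers" idea could be sketched like this, again checked with Python's `urllib.robotparser`. The user-agent token `msnbot` for the MSN spider is an assumption added here, as are `EvilScraper` and the example.com URL; other crawlers would be added as further groups in the same way.

```python
from urllib.robotparser import RobotFileParser

# An empty Disallow means "nothing is off limits" for that crawler.
# The final group asks every other crawler to stay out entirely.
ALLOWLIST_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: Slurp
Disallow:

User-agent: msnbot
Disallow:

User-agent: *
Disallow: /
"""

allowlist_parser = RobotFileParser()
allowlist_parser.parse(ALLOWLIST_ROBOTS_TXT.splitlines())

url = "http://www.example.com/articles/page.html"
results = {
    agent: allowlist_parser.can_fetch(agent, url)
    for agent in ("Googlebot", "Slurp", "msnbot", "EvilScraper")
}
print(results)
```

The check reflects what a crawler that follows the file would do. Nothing in it stops a client that never reads robots.txt.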
 
09-10-2006, 05:22 AM
T-L
I use one for all of my sites. Here are a few things to note:

On forums, many different links lead to the same content, which could be harmful with the search engines due to duplicate content.

Another thing - If you have a nice template, keep the bots out of your image folders. You really don't want your template images showing up on Google Images because some people seem to think that G-Images is their design resource playground.
 
09-10-2006, 07:53 PM
andrewbowe (Andrew Bowe, United Kingdom)
Just use it to keep the crawlers out of your images folder or anything else you store on your webspace, because it will look unprofessional if they start showing up in the search engines.
 
09-11-2006, 03:02 AM
Quote:
Originally Posted by Bookworm-SEO View Post
You don't need it.
If you process credit cards or run anything requiring confidentiality (and which could get you sued if you lose the info to others), use robots.txt.
That's NOT the right way to do it. Anything containing user data or other confidential information should be locked down so that it is in no way accessible. Robots.txt won't do that for you. It does the opposite: as soon as a competitor hires a skilled SEO consultant, they'll check the robots.txt of competing websites to see what those sites don't want anyone to know about. So you're actually opening yourself up to exploits rather than blocking anything!

Quote:
Originally Posted by Bookworm-SEO View Post
Another way I just read about is blocking everything but Googlebot (and Google's other crawlers, such as AdsBot, if you use AdWords), Slurp (Yahoo) and the MSN spider (I don't think it has a name). This avoids letting malicious crawlers steal/scrape your content.
Malicious crawlers that scrape content, e-mail addresses etc. don't obey robots.txt. (Even the search engines don't obey robots.txt at all times...) They need to be banned server-side based on IPs, behavior etc. The common way to do this is with spider traps.
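One way a spider trap could work, as a minimal sketch: robots.txt asks crawlers to avoid a trap path, so any client that requests it anyway is ignoring the file and gets its IP banned. The class name `SpiderTrap`, the `/trap/` path and the IP addresses are all hypothetical, and a real setup would do this in the web server or firewall rather than in an in-memory set.

```python
class SpiderTrap:
    """Ban any client that requests a path robots.txt told crawlers to avoid."""

    def __init__(self, trap_path="/trap/"):
        self.trap_path = trap_path
        self.banned_ips = set()

    def robots_txt(self):
        # Polite crawlers read this and never touch the trap path.
        return f"User-agent: *\nDisallow: {self.trap_path}\n"

    def handle_request(self, ip, path):
        """Return an HTTP status code for this request."""
        if ip in self.banned_ips:
            return 403
        if path.startswith(self.trap_path):
            # Only a client that ignores robots.txt ends up here.
            self.banned_ips.add(ip)
            return 403
        return 200


trap = SpiderTrap()
statuses = [
    trap.handle_request("203.0.113.5", "/index.html"),   # normal visit
    trap.handle_request("203.0.113.5", "/trap/page1"),   # falls into the trap
    trap.handle_request("203.0.113.5", "/index.html"),   # same IP, now banned
    trap.handle_request("198.51.100.7", "/index.html"),  # unrelated visitor
]
print(statuses)  # [200, 403, 403, 200]
```

The trap link is normally hidden from human visitors so that only automated clients follow it.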
Posted by stefanjuhl
 
09-11-2006, 03:04 AM
Quote:
Originally Posted by safron View Post
Just use it to keep the crawlers out of your images folder or anything else you store on your webspace, because it will look unprofessional if they start showing up in the search engines.
So you don't want the traffic that Google image search and others can provide..?

You should only block image folders if the crawlers use too much bandwidth, or if you're really sure that you don't want/need traffic from image search.
Posted by stefanjuhl
 
09-11-2006, 03:06 AM
Quote:
Originally Posted by TriteLife View Post
On forums, many different links lead to the same content, which could be harmful with the search engines due to duplicate content.
That's one of the few good reasons to block crawlers with robots.txt.

Though you'd be better off if you used cloaking for the crawlers to redirect the duplicate URLs to the primary URLs.
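The redirect part of that idea might look roughly like the sketch below. It maps each duplicate forum URL to one primary URL and answers with a 301 when they differ. Note that it redirects every visitor rather than crawlers only, which is a simplification of the cloaking approach described above. The parameter names in `CONTENT_PARAMS` and the example URLs are assumptions, not taken from any particular forum software.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: the only query parameters that change what the page shows.
CONTENT_PARAMS = ("t", "page")


def primary_url(url):
    """Drop query parameters that do not affect content and sort the rest."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k in CONTENT_PARAMS]
    kept.sort()
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def redirect_for(url):
    """Return (301, target) if url is a duplicate, else (200, url)."""
    target = primary_url(url)
    return (301, target) if target != url else (200, url)


duplicate = "http://www.example.com/showthread.php?s=abc123&t=42&highlight=seo"
primary = "http://www.example.com/showthread.php?t=42"
print(redirect_for(duplicate))
print(redirect_for(primary))
```

With every variant pointing at one address, the search engines only ever see a single copy of each thread.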
Posted by stefanjuhl
 
09-14-2006, 04:46 PM
imported_WoodiE55
If I'm not mistaken, Lee wrote either here or on ForumTrends.com that he actually blocks all search engines from his forum via his robots.txt file and points them only to the forum archive, so the risk of duplicate content isn't there.

Anyone else do this?
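A setup like the one described could be sketched as follows. This is a guess at the general shape, not Lee's actual file: the `/forum/` and `/forum/archive/` paths are hypothetical, and `Allow` is an extension to the original robots.txt convention that not every crawler is guaranteed to support.

```python
from urllib.robotparser import RobotFileParser

# Allow is listed first so that parsers which stop at the first matching
# rule still let the archive through before hitting the broader Disallow.
ARCHIVE_ONLY_ROBOTS_TXT = """\
User-agent: *
Allow: /forum/archive/
Disallow: /forum/
"""

archive_parser = RobotFileParser()
archive_parser.parse(ARCHIVE_ONLY_ROBOTS_TXT.splitlines())

archive_ok = archive_parser.can_fetch(
    "Googlebot", "http://www.example.com/forum/archive/index.php"
)
thread_ok = archive_parser.can_fetch(
    "Googlebot", "http://www.example.com/forum/showthread.php?t=42"
)
print(archive_ok, thread_ok)
```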
 