Tycoon Talk

Coding Forum


11-27-2007, 08:47 PM Search Engine Bot
Nathand
Extreme Talker

Posts: 233
Location: USA
Trades: 0
How do search engine bots, like the Google bot, work? How would you even go about programming something like that? How would you program something to click on links and grab text off pages to be stored in a database?

Just curious,
Nathan
11-28-2007, 12:24 AM Re: Search Engine Bot
ForrestCroce
Half Man, Half Amazing

Posts: 3,023
Name: Forrest Croce
Location: Seattle, WA
Trades: 0
Pretty much the same way a browser or RSS reader works: by using HTTP. Here's a code sample.
11-28-2007, 06:27 AM Re: Search Engine Bot
chrishirst
Missing! presumed drunk.

Posts: 41,517
Name: Chris Hirst
Location: Blackpool. UK
Trades: 0
You wouldn't program a search bot to "click" links. Even the "Link Walker" type bots don't actually "click" things.

The spider is simply a retrieval agent. It goes out, reads the HTML source code from the URL and stores it.
Other software agents then read the stored version and break it down. Any URIs found in the code are then added to a crawler schedule database to be crawled at a later stage.
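
A rough sketch of that second stage, written in Python purely for illustration: it reads an already stored page, pulls out the link targets, and appends any new ones to a crawl schedule. The names `LinkExtractor`, `schedule_links`, and the sample page are hypothetical, and an in-memory queue stands in for the schedule database.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag in a stored page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def schedule_links(page_url, html, schedule, seen):
    """Queue every new http(s) link found in html; return how many were added."""
    parser = LinkExtractor()
    parser.feed(html)
    added = 0
    for href in parser.hrefs:
        # Resolve relative links against the page URL and drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if absolute.startswith(("http://", "https://")) and absolute not in seen:
            seen.add(absolute)
            schedule.append(absolute)
            added += 1
    return added


# Hypothetical stored page, standing in for what the retrieval agent saved.
stored_html = (
    '<a href="/about">About</a> '
    '<a href="http://example.com/#top">Home</a> '
    '<a href="mailto:x@example.com">Mail</a>'
)
schedule, seen = deque(), set()
schedule_links("http://example.com/index.html", stored_html, schedule, seen)
print(list(schedule))
```

Nothing here "clicks" anything: the links are just strings found in the stored source, and the `seen` set keeps the same URL from being scheduled twice.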
__________________
Chris. ->> Links are advertising NOT optimising!! <<-
A foolish consistency is the hobgoblin of little minds
Thought for today:- Is SEO the only industry where all the cowboys are Indians?
11-28-2007, 12:26 PM Re: Search Engine Bot
Nathand
Extreme Talker

Posts: 233
Location: USA
Trades: 0
Hmm... So essentially, if I learn how web browsers work (getting the HTML file, etc.), I'll have some understanding of how a spider would be programmed.

Thanks for the input,
Nathan
11-28-2007, 03:46 PM Re: Search Engine Bot
chrishirst
Missing! presumed drunk.

Posts: 41,517
Name: Chris Hirst
Location: Blackpool. UK
Trades: 0
Here is the entire code that makes up my ASP-driven spider.
Code:
Function GetPageCode(strURL, strUserAgent)
    ' Fetch strURL and return its source code, or "status (text)" if the response is not 200.
    Dim xml
    Response.Buffer = True
    Set xml = Server.CreateObject("Microsoft.XMLHTTP")

    xml.Open "GET", strURL, False   ' False = synchronous request
    xml.setRequestHeader "User-Agent", strUserAgent
    xml.Send
    If xml.Status = 200 Then
        GetPageCode = xml.responseText
    Else
        GetPageCode = CStr(xml.Status) & " (" & xml.statusText & ")"
    End If
    Set xml = Nothing   ' release the HTTP object
End Function
Call the function with two parameters: the URL to be grabbed and the user agent string it should identify itself with in the remote server's logs. It comes back with the source code from the URL or, if the status is anything other than 200, the response code and status text.

It is that simple.
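
For comparison, a rough Python counterpart of the function above, using only the standard library. It is a sketch rather than a line-for-line port: `urllib` raises on 4xx/5xx responses instead of returning them, so those are caught and formatted the same way, while any other successful response returns its body. The `opener` parameter and the 10-second timeout are additions for illustration (the former lets the function be tried without a network connection); neither is in the ASP version.

```python
import urllib.error
import urllib.request


def get_page_code(url, user_agent, opener=urllib.request.urlopen):
    """Return the page source, or "status (reason)" when the server reports an error."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with opener(request, timeout=10) as response:
            # Fall back to UTF-8 when the server does not declare a charset.
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx, so report them as "404 (Not Found)" etc.
        return f"{err.code} ({err.reason})"


# Example call (hypothetical URL and user agent string; needs network access):
# source = get_page_code("http://example.com/", "ExampleSpider/0.1")
```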

The robots.txt parser, database storage, URL parser, etc. are a bit more complicated, but the crawler itself is very basic indeed.
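
To give a flavour of one of those extra pieces, here is a minimal robots.txt check in Python, leaning on the standard library's `urllib.robotparser` rather than a hand-written parser. The helper name `build_robots_checker`, the sample rules, and the user agent string are all made up for the example; a real spider would fetch each site's robots.txt first and pass its text in.

```python
from urllib.robotparser import RobotFileParser


def build_robots_checker(robots_txt, user_agent):
    """Parse robots.txt text once and return a function that vets URLs against it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    def allowed(url):
        return parser.can_fetch(user_agent, url)

    return allowed


# Hypothetical robots.txt content for illustration.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

allowed = build_robots_checker(robots_txt, "ExampleSpider/0.1")
print(allowed("http://example.com/index.html"))         # True
print(allowed("http://example.com/private/data.html"))  # False
```

The crawler would call `allowed(url)` before fetching each scheduled URL and skip anything that comes back false.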
11-28-2007, 08:24 PM Re: Search Engine Bot
Nathand
Extreme Talker

Posts: 233
Location: USA
Trades: 0
Thanks! I appreciate you sharing that with me.

-Nathan
11-29-2007, 03:00 AM Re: Search Engine Bot
Truly
Ultra Talker

Posts: 322
Trades: 0
Could it be as simple as using PHP's fopen function and then just analyzing the data that comes out of it? Although HTTP commands are probably better.