Here is the entire code that makes up my ASP-driven spider.
Code:
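Function GetPageCode(strURL, strUserAgent)
    Response.Buffer = True
    Dim xml
    ' Microsoft.XMLHTTP works here, but MSXML2.ServerXMLHTTP is the variant intended for server-side use
    Set xml = Server.CreateObject("Microsoft.XMLHTTP")
    ' False = synchronous: Send blocks until the response arrives
    xml.Open "GET", strURL, False
    xml.setRequestHeader "User-Agent", strUserAgent
    xml.Send
    If xml.status = 200 Then
        ' Success: return the page source
        GetPageCode = xml.responseText
    Else
        ' Anything else: return the status, e.g. "404 (Not Found)"
        GetPageCode = CStr(xml.status) & " (" & xml.statusText & ")"
    End If
    Set xml = Nothing
End Function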
Call the function with two parameters: the URL to be grabbed and the user agent it should identify itself with in the remote server's logs. It comes back with the source code from the URL if the status is 200, otherwise with the response code and status text.
It is that simple.
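For anyone wanting to try it, a call might look something like the sketch below. The URL and the user-agent string are made up for illustration, and the function above is assumed to be defined or included on the same page.
Code:
Dim strResult
' Hypothetical URL and user-agent string, for illustration only
strResult = GetPageCode("http://www.example.com/", "ExampleSpider/1.0")
' HTML-encode so the fetched markup is shown as text rather than rendered
Response.Write Server.HTMLEncode(strResult)
Because failures come back as a plain string in the same return value, the caller cannot reliably tell a status message from page source, so a real spider would probably want to return the status code separately.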
The robots.txt parser, database storage, URL parser etc. are a bit more complicated, but the crawler itself is very basic indeed.
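To give a rough idea of what the robots.txt side involves, here is a deliberately stripped-down sketch. It is not the parser mentioned above: IsPathAllowed is a hypothetical helper that only looks at plain Disallow lines under "User-agent: *" and ignores Allow rules, wildcards, inline comments and several User-agent lines sharing one group.
Code:
' Hypothetical helper: True unless strPath starts with a Disallow rule
' found under "User-agent: *" in the robots.txt text passed in.
Function IsPathAllowed(strRobotsTxt, strPath)
    Dim arrLines, strLine, strRule, blnInStarGroup, i
    IsPathAllowed = True
    blnInStarGroup = False
    ' Normalise line endings, then walk the file line by line
    arrLines = Split(Replace(strRobotsTxt, vbCr, ""), vbLf)
    For i = 0 To UBound(arrLines)
        strLine = Trim(arrLines(i))
        If LCase(Left(strLine, 11)) = "user-agent:" Then
            ' Only rules in the catch-all group are considered
            blnInStarGroup = (Trim(Mid(strLine, 12)) = "*")
        ElseIf blnInStarGroup And LCase(Left(strLine, 9)) = "disallow:" Then
            strRule = Trim(Mid(strLine, 10))
            ' An empty Disallow blocks nothing
            If Len(strRule) > 0 Then
                If Left(strPath, Len(strRule)) = strRule Then
                    IsPathAllowed = False
                    Exit Function
                End If
            End If
        End If
    Next
End Function
It could be fed with something like IsPathAllowed(GetPageCode("http://www.example.com/robots.txt", "ExampleSpider/1.0"), "/private/page.asp"), again with made-up values. If the fetch fails, the status string that comes back contains no Disallow lines, so the path is treated as allowed.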
__________________
Chris. ->> Links are advertising NOT optimising!! <<-
A foolish consistency is the hobgoblin of little minds
Thought for today:- Is SEO the only industry where all the cowboys are Indians?