Tycoon Talk

PHP Forum


04-01-2008, 05:16 AM Spiders / Crawlers
Gilligan
My script adds information to a database every time the page is loaded: IP, time, date, referrer, etc. Kind of like a counter.

I keep getting IP addresses that belong to search engine spiders. I currently add each IP by hand to a line that stops that IP's info from being added to the database. But the number of spiders is increasing and doing it manually takes too long. Is there a way to automatically stop ALL spiders' IPs from being added to the database?
 
 
04-01-2008, 05:47 AM Re: Spiders / Crawlers
vectorialpx
As I understand it, when a spider runs your script you still log an IP... so check whether the visitor is a human.
 
04-01-2008, 01:48 PM Re: Spiders / Crawlers
Gilligan
What do you mean? Both humans and robots add to the database.
 
04-01-2008, 02:37 PM Re: Spiders / Crawlers
VirtuosiMedia
I'm pretty sure there are bot lists floating around.
 
04-01-2008, 05:14 PM Re: Spiders / Crawlers
Learning Newbie
Quote:
Originally Posted by Gilligan
I keep getting IP addresses that belong to search engine spiders. I currently add each IP by hand to a line that stops that IP's info from being added to the database. But the number of spiders is increasing and doing it manually takes too long. Is there a way to automatically stop ALL spiders' IPs from being added to the database?
User agent string?
 
04-01-2008, 05:52 PM Re: Spiders / Crawlers
Gilligan
I'm unsure what that is. Care to elaborate?
 
04-01-2008, 06:11 PM Re: Spiders / Crawlers
VirtuosiMedia
Try taking a look at these:

http://www.useragentstring.com/pages...gentstring.php
http://www.texsoft.it/index.php?c=so...useragent&l=it
 
04-01-2008, 08:44 PM Re: Spiders / Crawlers
mgraphic
This might help:
http://www.webmaster-talk.com/php-fo...tml#post525177
 
04-02-2008, 03:22 AM Re: Spiders / Crawlers
Gilligan
So if I just do:

PHP Code:
<?php

$user_agent = strtolower($_SERVER['HTTP_USER_AGENT']);

if (!empty($user_agent))
{
    // stuff for spider here ??
}
else
{
    // stuff for human here ??
}

?>
 
04-02-2008, 11:21 AM Re: Spiders / Crawlers
VirtuosiMedia
Quote:
Originally Posted by Gilligan
So if I just do:

PHP Code:
<?php

$user_agent = strtolower($_SERVER['HTTP_USER_AGENT']);

if (!empty($user_agent))
{
    // stuff for spider here ??
}
else
{
    // stuff for human here ??
}

?>
Not quite. Both bots and humans are going to have user agents. What you'll have to do instead is check the user agent against a list of bots. If it matches, don't add it to the database; if it doesn't match, add it to the database. One way, and I'm not sure if it's the most efficient way, to check for this is by using regular expressions to compare the bot name against the user agent. As an example:

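PHP Code:
// eregi() was removed in PHP 7; preg_match() with the /i flag
// gives the same case-insensitive regular expression match.
if (preg_match('/google/i', $user_agent)) {
    // A bot, don't add
} else {
    // A human, add
}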
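
To make the "check against a list of bots" idea concrete, here is a minimal sketch. It assumes a small hand-maintained list of substrings: the `$bot_signatures` array and the `is_known_bot()` helper are made-up names for illustration, and the three entries are only examples, not a complete list. It uses a plain case-insensitive substring search (`stripos()`) instead of a regular expression, which is one possible choice rather than the only way to do it.

```php
<?php
// Hypothetical list of substrings that identify crawlers.
// Example entries only; a real list has to be maintained over time.
$bot_signatures = array('googlebot', 'slurp', 'msnbot');

/**
 * Returns true when the user agent contains any signature from the list.
 */
function is_known_bot($user_agent, array $signatures)
{
    foreach ($signatures as $signature) {
        // stripos() is a case-insensitive substring search;
        // it returns false (not 0) when nothing is found.
        if (stripos($user_agent, $signature) !== false) {
            return true;
        }
    }
    return false;
}

// The header can be missing entirely, so fall back to an empty string.
$user_agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

if (!is_known_bot($user_agent, $bot_signatures)) {
    // A human (or a bot that is not on the list): add the visit to the database here.
}
```

Adding a new spider then means adding one string to the array instead of editing the logging code, though any bot that is not on the list will still be counted.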

 
04-02-2008, 01:32 PM Re: Spiders / Crawlers
Gilligan
So there's no general code for all robots? Something like:

PHP Code:
$ip = getenv('REMOTE_ADDR');

// is_spider() is hypothetical; it stands for the general check I'm asking about
if (!is_spider($ip)) {
    // add to database
}
 
04-02-2008, 01:54 PM Re: Spiders / Crawlers
VirtuosiMedia
Not that I know of, but if you find one, I'd love to hear about it. The closest thing you might be able to do is check for the word 'bot'. Even that, though, will let a lot of bots through, because many don't have it in their name.
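
A rough sketch of that 'bot' check, to show both what it catches and what it misses. The `looks_like_bot()` function name is made up for illustration, and the two user agent strings are just sample values for Googlebot and Yahoo's crawler.

```php
<?php
/**
 * Rough heuristic: treat any user agent containing "bot"
 * (in any letter case) as a crawler.
 */
function looks_like_bot($user_agent)
{
    return stripos($user_agent, 'bot') !== false;
}

// Caught: the name contains "bot".
var_dump(looks_like_bot('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'));

// Missed: Yahoo's crawler calls itself "Slurp", so the heuristic lets it through.
var_dump(looks_like_bot('Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)'));
```

The first call reports a bot and the second does not, which is the gap described above: the heuristic only works for crawlers that happen to put "bot" in their name.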
 