Tycoon Talk
PHP Forum


CURL Acting Weirdly
Posted by evans123, 03-17-2011, 03:12 PM
I'm using cURL to retrieve links from designated pages. My start address is http://www.newsmuncher.com, and when I gather all the links from the page they all point to the base URL http://www.benwebdeveloper.com and return a 301 status code. I have a .htaccess file set up to redirect all requests from http://www.benwebdeveloper.com to http://www.newsmuncher.com, so I shouldn't be getting links to benwebdeveloper when crawling through the site. Any ideas why cURL does this?

It's almost as if it knows the two sites are identical, but because every request to benwebdeveloper is redirected to newsmuncher, it is tracing the route backwards, even though there aren't actually any links on the page to benwebdeveloper!


PHP Code:
    function get_file($location) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $location);
        curl_setopt($ch, CURLOPT_USERAGENT, $this->user_agent);
        curl_setopt($ch, CURLOPT_HEADER, FALSE);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, FALSE);

        $data = curl_exec($ch);

        echo curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
        $status_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        $content_type = explode(';', curl_getinfo($ch, CURLINFO_CONTENT_TYPE));
        $content_type = $content_type[0];
        curl_close($ch);
        return array('status_code' => $status_code, 'content_type' => $content_type, 'data' => $data);
    }

    function extract_links($html) {
        $dom = new DOMDocument();
        @$dom->loadHTML($html);

        $xpath = new DOMXPath($dom);
        $hrefs = $xpath->evaluate("/html/body//a");
        for ($i = 0; $i < $hrefs->length; $i++) {
            $href = $hrefs->item($i);
            $url = $href->getAttribute('href');
            $this->add_queue($url);
        }
        echo '<p>Links Found: ' . $hrefs->length . '</p>';
    }
Output From Script

PHP Code:
Crawler Initiated
Links Found: 26
bool(false) http://www.benwebdeveloper.com/  301
bool(false) #content  0
bool(false) http://www.benwebdeveloper.com/  301
bool(false) http://www.benwebdeveloper.com/about/  301
bool(false) http://www.benwebdeveloper.com/2010/10/hello-world/  301
bool(false) http://www.benwebdeveloper.com/2010/10/hello-world/  301
bool(false) http://www.benwebdeveloper.com/author/admin/  301
bool(false) http://www.benwebdeveloper.com/category/uncategorized/  301
bool(false) http://www.benwebdeveloper.com/2010/10/hello-world/#comments  301
bool(false) http://themeforest.net/item/circlosquero-premium-wordpress-theme/163014?ref=benwebdeveloper  302
bool(false) http://themeforest.net/item/alyeska-premium-wordpress-theme/164366?ref=benwebdeveloper  302
bool(false) http://themeforest.net/item/dandelion-powerful-elegant-wordpress-theme/136628?ref=benwebdeveloper  302
bool(false) http://themeforest.net/item/king-size-fullscreen-background-wordpress-theme/166299?ref=benwebdeveloper  302
bool(false) http://themeforest.net/item/lotus-for-business-software-corporate-portfolio/164748?ref=benwebdeveloper  302
bool(false) http://themeforest.net/item/striking-premium-corporate-portfolio-wp-theme/128763?ref=benwebdeveloper  302
bool(false) http://www.benwebdeveloper.com/2010/10/hello-world/  301
bool(false) http://wordpress.org/  200
bool(false) http://www.benwebdeveloper.com/2010/10/hello-world/#comment-1  301
bool(false) http://www.benwebdeveloper.com/2010/10/  301
bool(false) http://www.benwebdeveloper.com/category/uncategorized/  301
bool(false) http://www.newsmuncher.com/wp-login.php  200
bool(false) http://www.benwebdeveloper.com/feed/  301
bool(false) http://www.benwebdeveloper.com/comments/feed/  301
bool(false) http://wordpress.org/  200
bool(false) http://www.benwebdeveloper.com/  301
bool(false) http://wordpress.org/  200
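
Side note on the output above: the `#content` entry with status 0 is a fragment-only href, which cURL cannot fetch on its own, and several URLs are queued more than once with only the fragment differing. A minimal sketch of cleaning hrefs before they reach add_queue() might look like the following. The function name normalize_link() and its $base parameter are hypothetical and not part of the original crawler, and the sketch deliberately ignores `../` segments.

```php
<?php
// Hypothetical helper (not from the original post): turn an href into an
// absolute URL without its fragment, or return null for same-page anchors.
function normalize_link($href, $base) {
    $href = trim($href);

    // Strip the fragment; "#content" becomes an empty string.
    $hash = strpos($href, '#');
    if ($hash !== false) {
        $href = substr($href, 0, $hash);
    }
    if ($href === '') {
        return null; // same-page anchor, nothing to fetch
    }

    // Already absolute (has a scheme such as http)? Return it unchanged.
    if (is_string(parse_url($href, PHP_URL_SCHEME))) {
        return $href;
    }

    $parts = parse_url($base);
    $root  = $parts['scheme'] . '://' . $parts['host'];

    // Protocol-relative link: reuse the scheme of the base URL.
    if (substr($href, 0, 2) === '//') {
        return $parts['scheme'] . ':' . $href;
    }
    // Root-relative link.
    if ($href[0] === '/') {
        return $root . $href;
    }
    // Path-relative link: resolve against the directory of the base URL.
    $path = isset($parts['path']) ? $parts['path'] : '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $root . $dir . $href;
}
```

Inside extract_links() this could be called as normalize_link($url, $page_url), skipping null results, where $page_url is an assumed variable holding the address of the page being parsed.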

Last edited by evans123; 03-17-2011 at 03:14 PM.
Re: CURL Acting Weirdly
Posted by evans123, 03-17-2011, 03:15 PM
Found the issue: the WordPress default was still set to benwebdeveloper. Sorry for posting the problem!
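
For anyone landing here with the same symptom: with CURLOPT_FOLLOWLOCATION off, cURL returns the HTML exactly as served, so the hosts in the crawl output come from the hrefs in the markup, not from cURL. A quick way to check is to tally the hosts of the anchors in the fetched HTML. The sketch below is an illustration only. The function name count_link_hosts() and the sample markup are made up, and it assumes PHP's DOM extension is available, as in the original extract_links().

```php
<?php
// Hypothetical diagnostic helper (not from the original post): count how
// many <a href> values in an HTML string point at each host.
function count_link_hosts($html) {
    $dom = new DOMDocument();
    @$dom->loadHTML($html); // suppress warnings from sloppy real-world markup

    $counts = array();
    foreach ($dom->getElementsByTagName('a') as $a) {
        $host = parse_url($a->getAttribute('href'), PHP_URL_HOST);
        // Relative and fragment-only hrefs have no host component.
        $key = is_string($host) ? $host : '(relative)';
        $counts[$key] = isset($counts[$key]) ? $counts[$key] + 1 : 1;
    }
    return $counts;
}

// Made-up sample markup standing in for the 'data' returned by get_file().
$html = '<html><body>'
      . '<a href="http://www.benwebdeveloper.com/about/">About</a>'
      . '<a href="#content">Skip to content</a>'
      . '<a href="http://wordpress.org/">WordPress</a>'
      . '<a href="http://www.benwebdeveloper.com/">Home</a>'
      . '</body></html>';

print_r(count_link_hosts($html));
```

If the old domain dominates the tally, the page itself is emitting those links. In WordPress the generated link host normally follows the WordPress Address and Site Address values under Settings > General, which is presumably the "default" referred to above.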