Tycoon Talk
PHP Forum


01-31-2007, 05:19 AM PHP scraping script
Hi PHPers

I have a script which scrapes a page and converts it to JavaScript, so others can syndicate it:



PHP Code:
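<?php

// Import the query-string parameters (pageID is expected) as variables.
if(!empty($_GET)) extract($_GET);

// Fetch the page at $url and return it as a single string.
function getPageContent( $url ){
    return( implode( "", file( $url ) ) );
}

// Return every match from <div id="anyname"> up to the next closing div.
function parseContentByTagName( $content ){
    preg_match_all( '/<div id="anyname">([\n\r\w\W.]*?)<\/div>/i', $content, $contentArray );
    return( implode( "", $contentArray[0] ) );
}

if( $pageID == "" )
    print( "You have passed an invalid page, please include the parameter: pageID" );
else{
    $pageContent = getPageContent( $pageID );
    $pageContent = parseContentByTagName( $pageContent );

    if( $pageContent == "" )
        print( "document.write( \"backup message here\" );");

    // Escape quotes and flatten line breaks so the markup fits in a JavaScript string.
    $pageContent = str_replace( '"', '\"', $pageContent );
    $pageContent = str_replace( "\n", " ", $pageContent );
    $pageContent = str_replace( "\r", " ", $pageContent );

    // The posted code ends here; the closing brace below is added so the script parses.
}
?>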

You will see that it grabs all the content between the starting tag <div id="table"> and the next closing div.
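To make that behaviour concrete, here is a minimal sketch using the same pattern on a made-up snippet (the sample markup and the variable names are assumptions, not taken from the real page). Because the quantifier `*?` is lazy, the match ends at the first `</div>` it meets, even when that tag closes a nested div:

```php
<?php
// Hypothetical sample markup: the target div contains one nested div.
$html = '<div id="anyname">outer start <div class="inner">nested</div> outer end</div>';

// Same pattern as in the script above: the lazy *? stops at the first </div>.
preg_match('/<div id="anyname">([\n\r\w\W.]*?)<\/div>/i', $html, $m);

// $m[0] is the whole match, $m[1] is only the captured inner text.
echo $m[0], "\n";
```

On this input the match is `<div id="anyname">outer start <div class="inner">nested</div>`, so the trailing " outer end" and the real closing tag are lost.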

Unfortunately, the page I need to parse contains a number of closing divs WITHIN the target div itself. I had hoped that I could use a different element to define the closing point, e.g. a closing table tag. However, the script only seems to respond to a closing div. It will not respond to, say, an opening table tag with an id and a closing table tag, or to any other custom tag like that.
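For reference, a pattern with other delimiters can be sketched as below. This is only an illustration: `extractBetween`, the `results` id and the sample markup are hypothetical, and the type hints assume PHP 7 or later. One possible reason a custom tag appears not to "respond" is that the opening marker has to match the page source exactly (attribute order, quote style, spacing), and that `.` does not match line breaks unless the `s` modifier is set:

```php
<?php
// Hypothetical helper: return the text from $open through $close, inclusive.
function extractBetween(string $content, string $open, string $close): string
{
    // preg_quote escapes regex metacharacters, including the "/" delimiter.
    // Modifiers: i = case-insensitive, s = let "." match line breaks too.
    $pattern = '/' . preg_quote($open, '/') . '(.*?)' . preg_quote($close, '/') . '/is';
    if (preg_match($pattern, $content, $m) !== 1) {
        return "";
    }
    return $m[0];
}

// Made-up markup: a table that contains a div, followed by another div.
$html = "<table id=\"results\">\n<tr><td><div>cell</div></td></tr>\n</table><div>after</div>";
$tableMarkup = extractBetween($html, '<table id="results">', '</table>');
echo $tableMarkup, "\n";
```

The lazy match still stops at the first closing marker, so this only helps when the chosen closing tag does not also occur inside the block.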

It would be very difficult to remove the various divs within the page.
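One way around nested closing tags, without touching the page itself, is to let an HTML parser find the element instead of a regex. The sketch below uses PHP's bundled DOM extension; `extractElementById` and the sample markup are hypothetical, it assumes PHP 7 or later with ext-dom enabled, and it assumes the id contains no double quotes:

```php
<?php
// Hypothetical helper (not part of the original script): return the full
// markup of the element with the given id, nested children included.
function extractElementById(string $html, string $id): string
{
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Match any element type (div, table, ...) carrying this id.
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//*[@id="' . $id . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return "";
    }
    // saveHTML() with a node argument serialises just that subtree.
    return $doc->saveHTML($nodes->item(0));
}

// Made-up markup: the target div contains a nested div.
$html = '<html><body><div id="anyname">outer start <div class="inner">nested</div>'
      . ' outer end</div><p>footer</p></body></html>';
$fragment = extractElementById($html, 'anyname');
echo $fragment, "\n";
```

The returned string could then go through the same quote and newline escaping as in the script before being wrapped in `document.write`.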


Thanks for any ideas.


Tony