Hi PHPers
I have a script that scrapes a page and converts it to JavaScript, so that others can syndicate it:
PHP Code:
<?php
// Read pageID explicitly: extract($_GET) would let a visitor overwrite any variable in the script.
$pageID = isset( $_GET['pageID'] ) ? $_GET['pageID'] : "";
function getPageContent( $url ){ return( file_get_contents( $url ) ); }
// Grab from <div id="anyname"> up to the FIRST following </div>; the /s modifier lets "." match newlines.
function parseContentByTagName( $content ){ preg_match_all( '/<div id="anyname">(.*?)<\/div>/is', $content, $contentArray ); return( implode( "", $contentArray[0] ) ); }
if( $pageID == "" ) print( "You have passed an invalid page, please include the parameter: pageID" ); else {
    $pageContent = parseContentByTagName( getPageContent( $pageID ) );
    if( $pageContent == "" ) $pageContent = "backup message here";
    // json_encode() escapes quotes, backslashes and line breaks so the result is a valid JavaScript string.
    print( "document.write( " . json_encode( $pageContent ) . " );" );
}
?>
You will see that it grabs all the content between the opening tag <div id="table"> and the very next closing </div>.
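To make the failure concrete, here is a small self-contained sketch that runs the same non-greedy pattern against a made-up page in which one div is nested inside the target div. The helper name `firstDivMatch` and the sample HTML are hypothetical, chosen only for illustration.

```php
<?php
// Same idea as parseContentByTagName(): the lazy (.*?) stops at the FIRST </div>
// it meets, even when that </div> closes a nested div rather than the target one.
function firstDivMatch( string $html ): string {
    preg_match( '/<div id="anyname">(.*?)<\/div>/is', $html, $m );
    return $m[0] ?? "";
}

// Hypothetical sample page: one inner div nested inside the target div.
$html = '<div id="anyname"><p>intro</p><div class="inner">x</div><p>lost</p></div>';

echo firstDivMatch( $html ), "\n";
// prints: <div id="anyname"><p>intro</p><div class="inner">x</div>
```

Everything after the inner `</div>` (here `<p>lost</p>`) is silently dropped, which matches the behaviour described in the post.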
Unfortunately, the page I need to parse contains a number of closing divs WITHIN the target div, so the match stops too early. I had hoped I could use a different element to mark the end point, e.g. a closing table tag. However, the script only seems to respond to a closing div: it will not, for instance, match an opening table tag with an id and its closing table tag, or any other custom pair of tags like that.
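The reason only `div` works is that both `<div id="anyname">` and `<\/div>` are hard-coded in the regular expression, so swapping in another element means changing the pattern itself. A minimal sketch, assuming a hypothetical helper `extractElement()` that takes the tag name and id as parameters and uses a backreference (`\1`) so the closing tag always matches the opening one:

```php
<?php
// Hypothetical helper: return <$tag ... id="$id" ...>...</$tag>, or "" if not found.
// preg_quote() escapes the tag and id; \1 refers back to the captured tag name.
function extractElement( string $html, string $tag, string $id ): string {
    $pattern = '/<(' . preg_quote( $tag, '/' ) . ')\b[^>]*\sid="' . preg_quote( $id, '/' ) . '"[^>]*>.*?<\/\1>/is';
    return preg_match( $pattern, $html, $m ) ? $m[0] : "";
}

// Nested divs no longer matter, because the match now ends at </table>.
$html = '<table id="anyname"><tr><td><div>a</div><div>b</div></td></tr></table>';
echo extractElement( $html, 'table', 'anyname' ), "\n";
```

This only moves the problem: it still stops at the first closing tag of the same name, so a table nested inside the target table would truncate the result in exactly the same way.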
It would be very difficult to remove the various divs within the page.
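One way to avoid removing the inner divs is to stop using a regex and let an HTML parser find the element, since the parser knows which `</div>` closes which `<div>`. A sketch, assuming PHP's bundled DOM extension is enabled and that the id contains no double quotes; the function name `extractById` is hypothetical.

```php
<?php
// Sketch: parse the page with DOMDocument and return the element with the given id,
// including all of its nested divs. Assumes $id contains no double quote characters.
function extractById( string $html, string $id ): string {
    if ( $html === "" ) {
        return ""; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Scraped HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors( true );
    $doc->loadHTML( $html );
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    $xpath = new DOMXPath( $doc );
    $nodes = $xpath->query( '//*[@id="' . $id . '"]' );
    if ( $nodes === false || $nodes->length === 0 ) {
        return "";
    }
    // Passing a node to saveHTML() serialises just that element and its children.
    return $doc->saveHTML( $nodes->item( 0 ) );
}

$html = '<div id="anyname"><p>intro</p><div class="inner">x</div><p>kept</p></div>';
echo extractById( $html, 'anyname' ), "\n";
```

In the original script this would take the place of the `parseContentByTagName( $pageContent )` call, with the same `document.write` output step afterwards.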
Thanks for any ideas.
Tony