04-07-2010, 04:42 PM
|
PHP help, please
|
Posts: 1,788
Name: John
Location: USA
|
I have developed a broken link checker that works great, except when the links are relative, i.e. they don't have the base href (scheme and host) in them.
For example:
If the links are ...href="http://www.somesite.com/somepage.html"... it works great.
But if they are ...href="somepage.html"..., ...href="/somepage.html"..., or ...href="./somepage.html"... it ignores them.
Here's the problem code:
Code:
$matches = array();
preg_match_all("|href\=\"?'?`?([[:alnum:]:?=&@/;._-]+)\"?'?`?|i", $html, $matches);
$links = array();
$ret = $matches[1];
for($i=0;isset($ret[$i]);$i++) {
    if(preg_match("|^http://(.*)|i", $ret[ $i])) {
        $links[] = $ret[$i];
    } elseif(preg_match("|^(.*)|i", $ret[$i])) {
        $links[] = "http://".$info["host"]."". $ret[$i];
    }
}
return $links;
}
I thought
} elseif(preg_match("|^(.*)|i", $ret[$i])) {
$links[] = "http://".$info["host"]."". $ret[$i];
would have taken care of it. Please help!
Many Thanks.
|
04-07-2010, 05:04 PM
|
Re: PHP help, please
|
Posts: 1,618
Location: UK
|
You could maybe first do a str_replace and remove ALL the http://'s.
Then do a str_replace for href="
and replace it with href="http://
?
I.e., remove all cases of http://
so NOTHING has it.
Then add it to EVERYTHING.
For example:
Replace href="http:// with href="
Replace href="/ with href="
Replace href="./ with href="
Then replace href=" with href="http://
Last edited by lynxus; 04-07-2010 at 05:06 PM..
|
04-07-2010, 05:15 PM
|
Re: PHP help, please
|
Posts: 1,788
Name: John
Location: USA
|
Quote:
Originally Posted by lynxus
You could maybe first do a str_replace and remove ALL the http://'s.
Then do a str_replace for href="
and replace it with href="http://
?
I.e., remove all cases of http://
so NOTHING has it.
Then add it to EVERYTHING.
For example:
Replace href="http:// with href="
Replace href="/ with href="
Replace href="./ with href="
Then replace href=" with href="http://
|
This would then give me href="http://somepage.html and if I use $info["host"] after http:// I'll have issues with external links, etc.
I believe the solution is within the regex:
if(preg_match("|^http://(.*)|i", $ret[ $i])) {
$links[] = $ret[$i];
HERE ----> } elseif(preg_match("|^(.*)|i", $ret[$i])) {
$links[] = "http://".$info["host"]."". $ret[$i];
}
}
|
04-08-2010, 06:07 PM
|
Re: PHP help, please
|
Posts: 232
Name: John
Location: Tokyo
|
Why do you check the URL contents again here? The pattern |^(.*)| matches every string, so the test adds nothing.
PHP Code:
} elseif(preg_match("|^(.*)|i", $ret[$i])) {
    $links[] = "http://".$info["host"]."". $ret[$i];
}
Try
PHP Code:
} else {
    $links[] = "http://".$info["host"]."". $ret[$i];
}
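Note that even with a plain else, "http://".$info["host"].$ret[$i] only yields a valid URL for root-relative links such as /somepage.html; for somepage.html it would produce http://www.somesite.comsomepage.html. Below is a minimal sketch of one way to handle the three cases from the first post. resolveLink and its parameters are hypothetical names, not part of the original script, and the sketch deliberately ignores ../ segments, #fragments, protocol-relative links and mailto: links.

```php
<?php
// Hypothetical helper (not from the thread): resolve one href against the
// URL of the page it was found on. Covers only the cases discussed above.
function resolveLink(string $href, string $pageUrl): string
{
    // Already absolute (http or https): leave it untouched.
    if (preg_match('|^https?://|i', $href)) {
        return $href;
    }

    $info   = parse_url($pageUrl);
    $scheme = isset($info['scheme']) ? $info['scheme'] : 'http';
    $root   = $scheme . '://' . $info['host'];

    // Root-relative link such as "/somepage.html".
    if (substr($href, 0, 1) === '/') {
        return $root . $href;
    }

    // Treat "./somepage.html" the same as "somepage.html".
    if (substr($href, 0, 2) === './') {
        $href = substr($href, 2);
    }

    // Directory of the current page, always ending in "/".
    $path = isset($info['path']) ? $info['path'] : '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);

    return $root . $dir . $href;
}
```

Inside the loop this would replace both branches with a single call, e.g. $links[] = resolveLink($ret[$i], $url);, assuming $url holds the address of the page being scanned.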
|
04-09-2010, 09:33 AM
|
Re: PHP help, please
|
Posts: 1,788
Name: John
Location: USA
|
Quote:
Originally Posted by nayes84
Why do you check the URL contents again here?
PHP Code:
} elseif(preg_match("|^(.*)|i", $ret[$i])) {
    $links[] = "http://".$info["host"]."". $ret[$i];
}
Try
PHP Code:
} else {
    $links[] = "http://".$info["host"]."". $ret[$i];
}
|
Thanks. I'm a VBScript man learning PHP, so that's why. I'll try this.
The original issue is no longer a problem. The problem now is that I cannot get the server to flush() the results to the browser as the script runs. ob_start(), ob_flush(), etc. don't work, and I know safe mode is not on. Because of this, the server and browser don't communicate often enough, so when the results are finally sent to the browser, it stops rendering the page halfway. There are no errors, and I set the timeout limit to 0. I have searched through a lot of possible answers and fixes, to no avail.
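A common reason flush() appears to do nothing is that one or more PHP output buffers are still open above it. The sketch below closes them first and then pushes each line out as it is produced. disableOutputBuffering and reportProgress are hypothetical names, not part of the original script, and this is only one possible arrangement.

```php
<?php
// Hypothetical helper: close every active output buffer so that echo
// writes go straight to the web server. Returns how many were closed.
function disableOutputBuffering(): int
{
    $closed = 0;
    while (ob_get_level() > 0) {
        // ob_end_flush() returns false if a buffer cannot be removed;
        // stop in that case rather than loop forever.
        if (!@ob_end_flush()) {
            break;
        }
        $closed++;
    }
    // Ask PHP to flush after every output call from here on.
    ob_implicit_flush(true);
    return $closed;
}

// Hypothetical helper: print one progress line and push it out at once.
function reportProgress(string $message): void
{
    echo $message, "<br>\n";
    if (ob_get_level() > 0) {
        ob_flush();
    }
    flush();
}
```

The idea would be to call disableOutputBuffering() once before the link loop and reportProgress() after each link is checked. Even then, flush() cannot override buffering done by the web server itself, and server-side compression or a proxy may still hold the output back, so this may not be sufficient on every host.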
|
04-09-2010, 12:35 PM
|
Re: PHP help, please
|
Posts: 232
Name: John
Location: Tokyo
|
It might not be a problem with flush() but with an infinite loop or something similar. As you probably know, a mistaken loop condition can result in an infinite loop.
|
04-09-2010, 03:17 PM
|
Re: PHP help, please
|
Posts: 1,788
Name: John
Location: USA
|
Quote:
Originally Posted by nayes84
It might not be a problem with flush() but with an infinite loop or something similar. As you probably know, a mistaken loop condition can result in an infinite loop.
|
No, I don't believe any results would be returned if there were an infinite loop.
I know there is redundant code that could be reduced. Would you mind taking a close look at this section?
Code:
// Gets Unique Urls
function GetUniqueUrls(&$html, &$url ) {
    if ( !$html ){
        return false;
    }
    // Gets the list of urls
    $urls = GetUrls ( $html, $url );
    $uurls = array();
    for( $i=0;isset($urls[$i]);$i++ ) {
        // Checks if the url is in the array
        if(!in_array($urls[$i], $uurls )) {
            // If it's not it adds it
            $uurls[] = $urls[$i];
        }
    }
    return $uurls;
}
// Gets everything headers and HTML
function geteverything(&$url) {
    // Gets url ready to use
    $info = @parse_url( $url );
    // Opens socket
    $fp = @fsockopen( $info["host"], 80, $errno, $errstr, 10 );
    // Makes sure the socket is open or returns false
    if ( !$fp ) {
        return false ;
    } else {
        // Checks the path is not empty
        if( empty( $info["path"] ) ) {
            // If it is empty it fills it
            $info["path"] = "/";
        }
        $query = "";
        // Checks if there is a query string in the url
        if( isset( $info["query"] ) ) {
            // If there is a query string it adds a ? to the front of it
            $query = "?".$info["query"]."";
        }
        // Sets the headers to send
        $out = "GET ".$info["path"]."".$query." HTTP/1.0\r\n";
        $out .= "Host: ".$info["host"]."\r\n";
        $out .= "Connection: close \r\n";
        $out .= "User-Agent: link_checker/1.1\r\n\r\n";
        // writes the headers out
        fwrite( $fp, $out );
        $html = '';
        // Reads what gets sent back
        set_time_limit(0);
        while ( !feof( $fp ) ) {
            $html .= fread( $fp, 8192 );
        }
        // Closes socket
        fclose( $fp );
    }
    return $html;
}
Thank you for all your help!
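On the redundancy question, the de-duplication loop in GetUniqueUrls could probably be replaced by PHP built-ins. The sketch below shows the idea; uniqueUrls is a hypothetical name, not part of the original script. One small difference to be aware of: in_array() compares loosely by default, while array_unique() compares the string form of each value, which should not matter for ordinary URL strings.

```php
<?php
// Sketch: same result as the loop in GetUniqueUrls, using built-ins.
// uniqueUrls is a hypothetical name used only for illustration.
function uniqueUrls(array $urls): array
{
    // array_unique keeps the first occurrence of each value;
    // array_values renumbers the keys from 0 so isset($urls[$i]) loops still work.
    return array_values(array_unique($urls));
}
```

With this, the body of GetUniqueUrls after the !$html check would reduce to something like return uniqueUrls(GetUrls($html, $url));.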
Last edited by kline11; 04-09-2010 at 03:42 PM..
Reason: mistake pasting code
|
04-09-2010, 05:30 PM
|
Re: PHP help, please
|
Posts: 232
Name: John
Location: Tokyo
|
I checked the code and it is working.
See here:
http://www.ephotobay.com/test2.php
Is it showing partial results or not showing anything at all? Can you show me it running on your website?
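On speed, one possible saving (an assumption, since the full script isn't shown): if geteverything() is also used to test each individual link, every check downloads the whole page even though only the status line is needed. A HEAD request asks the server for the headers alone. The sketch below builds such a request in the same style as geteverything() and extracts the status code from a raw response. buildHeadRequest and parseStatusCode are hypothetical names, and some servers handle HEAD poorly, so a fallback to GET may still be needed.

```php
<?php
// Hypothetical helper: build a HEAD request for one URL, mirroring the
// headers that geteverything() sends with its GET request.
function buildHeadRequest(string $url): string
{
    $info  = parse_url($url);
    $path  = empty($info['path']) ? '/' : $info['path'];
    $query = isset($info['query']) ? '?' . $info['query'] : '';

    $out  = "HEAD " . $path . $query . " HTTP/1.0\r\n";
    $out .= "Host: " . $info['host'] . "\r\n";
    $out .= "Connection: close\r\n";
    $out .= "User-Agent: link_checker/1.1\r\n\r\n";
    return $out;
}

// Hypothetical helper: pull the numeric status code out of a raw response.
// Returns 0 if the first line is not a recognisable HTTP status line.
function parseStatusCode(string $response): int
{
    if (preg_match('|^HTTP/\d+(?:\.\d+)?\s+(\d{3})|', $response, $m)) {
        return (int) $m[1];
    }
    return 0;
}
```

The request string would be written to the socket with fwrite() exactly as in geteverything(), and the response passed to parseStatusCode() to decide whether the link is broken.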
|
04-10-2010, 03:39 PM
|
Re: PHP help, please
|
Posts: 1,788
Name: John
Location: USA
|
Quote:
Originally Posted by nayes84
|
OK, I checked your site and it all loaded, but a larger number of links seems to be a problem.
I do need to speed it up and add a progress bar, or something for users to watch while they wait, if I can get ob_flush() and flush() to work. Can I PM you the full source code so you can see whether there is anything I can do to speed things up? Like I said, it's been VBScript for me until recently, so I know I have some useless code. I don't mind if you want to use the code for your own purposes. Is it a deal?
Thanks.