Code:
<?php
// author: Mark Ibbotson
// email :
error_reporting(E_COMPILE_ERROR|E_ERROR|E_CORE_ERROR);
class Search{
var $_domain = "";
var $_keyword = "";
var $_links = Array();
var $_matched = Array();
var $_usefull=Array();
function Search(){
}
function openSite(){
// opens DOMAIN and scans the home page for potential site links
// looking for short links, i.e. href="/support" or href="/support/faq", and not www.DOMAIN.com/support
if(!empty($this->_domain)){
$page = file_get_contents($this->_domain);
// nothing to scan if the home page could not be fetched
if($page === false) return;
// anchor on href= so the "//" in "http://" and closing tags like </td> are not picked up as links
if(preg_match_all("/href=[\"'](\/[a-z0-9]+(?:\/[a-z0-9]+)*)[\"']/i", $page, $m)){
foreach($m[1] as $link){
// add matched short links to array for future consideration
array_push($this->_links, $link);
}
}
}
}
function findPageOnSearch(){
// provide a list of links of matched pages and search terms
foreach($this->_links as $key=>$val){
// remove duplicates from top level link array
if(!in_array($val,$this->_usefull)){
// usefull, unique list of short links
array_push($this->_usefull, $val);
}
}
// let's make sure we don't parse any pages that don't require parsing:
// css directory, image directory etc.
foreach($this->_usefull as $key=>$val){
if($val != "/css" && $val != "/js" && $val != "/images"){
// if none of the above pages grab the current page
$page = file_get_contents($this->_domain . $val);
// can we match our keyword in that page
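if($page !== false && $this->_keyword != "" && strstr($page, $this->_keyword) !== false){
// if so let's display that match and a link to the page
// escape user-supplied values before echoing them into the HTML
$url = htmlspecialchars($this->_domain . $val);
echo "MATCHED : " . htmlspecialchars($this->_keyword) . "<hr />";
echo "<a href=\"" . $url . "\">" . $url . "</a> <br />";
echo "<hr />";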
}
}
}
}
}
// ensure POST is set before kick off
if(!empty($_POST)){
if(isset($_POST['domain']) && isset($_POST['keyword'])){
// new search obj.
$search = new Search();
// search vars
$search->_domain = $_POST['domain'];
$search->_keyword = $_POST['keyword'];
// dog work methods
$search->openSite();
$search->findPageOnSearch();
}
}
?>
<html>
<head>
<title>Site search</title>
</head>
<body>
<form method="post" action="search.php">
<table border="0">
<tr><td>Domain</td><td><input type="text" name="domain" value="http://domain.com" /></td></tr>
<tr><td>Keyword</td><td><input type="text" name="keyword" value="" /></td></tr>
<tr><td></td><td><input type="submit" value="Go" /></td></tr>
</table>
</form>
</body>
</html>
This is in the prototype stage and it's CASE SENSITIVE (add strtolower where appropriate if you use it).
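For anyone who wants the case-insensitive version, here is a minimal sketch of the keyword test. The helper name pageContainsKeyword is made up for illustration and is not part of the script above, and it assumes PHP 5 or later, where stripos() is available. On older versions, calling strtolower() on both the page and the keyword before the strstr() should give the same effect.

```php
<?php
// Hypothetical helper, not part of the original script.
// Returns true when $keyword occurs in $page, ignoring letter case.
function pageContainsKeyword($page, $keyword)
{
    if ($keyword === '') {
        // An empty keyword would otherwise match every page (or raise a warning).
        return false;
    }
    // stripos() is the case-insensitive counterpart of strpos() (PHP 5+).
    return stripos($page, $keyword) !== false;
}
```

In findPageOnSearch() this would take the place of the strstr() test on the fetched page.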
It works like this:
Enter the domain name (http:// included).
The script then fetches the home page and uses preg_match to look for a slash followed by letters and digits (/[a-z0-9]*), i.e. short links like href="/support", which it then adds to $_links.
$_links is then walked and all duplicates are removed, leaving us with $_usefull.
Walk $_usefull and open each page it refers to. A simple strstr on that page checks for your keyword and, hey presto, it lists the links to the pages that contain it.
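The steps above can also be sketched as two stand-alone functions, which makes them easier to try out without hitting a real site. This is only a rough sketch under a few assumptions: the function names extractShortLinks and findMatchingLinks and the $fetch parameter are invented for illustration, the link pattern is anchored on href= rather than on a bare slash, and strpos() stands in for strstr() (the match is still case sensitive).

```php
<?php
// Hypothetical helpers; the original script does this work inside the Search class.

// Collect unique site-relative links such as href="/support" from an HTML string.
function extractShortLinks($html)
{
    // Anchoring on href= avoids matching the "//" in "http://" or tags like "</td>".
    preg_match_all('~href=["\'](/[a-z0-9_/-]+)["\']~i', $html, $matches);
    // array_unique() drops duplicates; array_values() reindexes the result from 0.
    return array_values(array_unique($matches[1]));
}

// Return the full URLs of the linked pages whose text contains $keyword.
// $fetch is any callable that maps a URL to page text, or false on failure.
function findMatchingLinks($domain, array $links, $keyword, $fetch)
{
    // Directories that are not worth parsing, as in the original script.
    $skip = array('/css', '/js', '/images');
    $matched = array();
    foreach ($links as $link) {
        if (in_array($link, $skip, true)) {
            continue;
        }
        $page = call_user_func($fetch, $domain . $link);
        if ($page !== false && $keyword !== '' && strpos($page, $keyword) !== false) {
            $matched[] = $domain . $link;
        }
    }
    return $matched;
}
```

Against a live site you could pass 'file_get_contents' as $fetch, for example findMatchingLinks('http://domain.com', extractShortLinks($html), 'support', 'file_get_contents'). Passing the fetcher in as a parameter is just a convenience so the matching logic can be checked against canned pages.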
Of course it's far from being a Google-worthy search engine (and it seriously lacks features beyond matching keywords).
Anyway, it's there for you all to chop to bits, improve upon and use at your leisure.
Any mods or ideas to improve it would also be welcome.
Hope it helps some of you; it certainly is useful to me.
Ibbo
Last edited by ibbo; 06-27-2006 at 08:13 AM..