How to make a search script?
01-21-2008, 12:16 PM
Posts: 6,521
Name: Dan
Location: Swindon
Hello all,
I would like to have a go at making a simple search engine. I think I could cope with making the part that searches the database, but I know my biggest issue is creating the bot/crawler to crawl my site(s), collect all the info, and save it in the database.
So, any ideas or advice on how to go about fetching a page and getting the keywords, phrases, etc.?
TP for all the helpful answers
01-21-2008, 01:19 PM
Posts: 843
Name: Mike
Location: United Kingdom
Use the function get_meta_tags($url) to get a page's meta information. For example:
PHP Code:
<?php
// Let's get the info from example.com
$tags = get_meta_tags('http://www.example.com/');

// They're returned in an array, keyed by the lowercased name attribute.
echo $tags['author'];      // name
echo $tags['keywords'];    // php documentation
echo $tags['description']; // a php manual
?>
Take a look at php.net for more info ^^
Last edited by rogem002; 01-21-2008 at 01:25 PM..
01-21-2008, 02:20 PM
Posts: 6,521
Name: Dan
Location: Swindon
Thanks for that Mike, that will help.
I'm thinking that, other than getting the meta info, I would load the page into an object, list the image URLs, and strip the HTML. My big problem is how to go about creating some kind of keywords-and-phrases index to store in the database to be searched.
What would be the best way to do this without ending up with an algorithm which tells me the most important word on a page is "in"? I don't really want to completely remove the words which hold no relevance, though, because I would like it to be searchable by phrases.
Or am I overreaching?...
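One possible way to square those two goals, offered only as a rough sketch: keep the stripped page text untouched for phrase searches, and skip common words only when counting candidate keywords. The function name extract_keywords() and the short stop-word list are made up for illustration.

```php
<?php
// Hypothetical helper: rank candidate keywords on a page by frequency,
// skipping stop words. The stop-word list is illustrative, not complete.
function extract_keywords(string $html, array $stopWords, int $limit = 10): array
{
    // Strip markup and normalise case; this plain text is what you would
    // also store unmodified for phrase searches.
    $text = strtolower(strip_tags($html));

    // Split on anything that is not a letter, digit or apostrophe.
    $words = preg_split("/[^a-z0-9']+/", $text, -1, PREG_SPLIT_NO_EMPTY);

    // Drop stop words for ranking purposes only.
    $words = array_diff($words, $stopWords);

    // Count occurrences and sort with the most frequent first.
    $counts = array_count_values($words);
    arsort($counts);

    return array_slice($counts, 0, $limit, true);
}

$stopWords = ['a', 'an', 'and', 'in', 'is', 'of', 'on', 'the', 'to'];
$html = '<p>The search engine is in the search business: search in PHP.</p>';
print_r(extract_keywords($html, $stopWords, 3));
```

Because the stop words are filtered only in the counting step, "in" never tops the keyword list, yet a phrase such as "search in PHP" can still be matched against the stored text.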
01-21-2008, 06:03 PM
Posts: 83
Name: Colin
Location: USA
I would suggest that you plug all the pages into a database table with these columns:
id, url, keyword (a comma-separated list of keywords)
Then just use something like:
PHP Code:
$page = 1;
$display = 20;
$start = ($page - 1) * $display;
// $_GET['c'] is the search term; escape it before using it in a real query.
$query = "SELECT `url` FROM `search_map` WHERE `keyword` REGEXP '([,]|^){$_GET['c']}([,]|$)' LIMIT {$start}, {$display}";
Hope that helps.
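Since the query above places $_GET['c'] straight into the SQL, here is a hedged sketch of the same lookup with the search term passed as a bound parameter. The helper name build_keyword_search() and the PDO usage in the comments are assumptions, not part of the original suggestion.

```php
<?php
// Hypothetical helper: build the keyword query with a bound parameter
// instead of interpolating $_GET['c'] directly.
function build_keyword_search(string $keyword, int $page = 1, int $display = 20): array
{
    $display = max(1, $display);
    $start = max(0, ($page - 1) * $display);

    // preg_quote() targets PCRE; it is used here as an approximation for
    // escaping regex metacharacters in MySQL's REGEXP dialect.
    $pattern = '(,|^)' . preg_quote($keyword) . '(,|$)';

    // The LIMIT values are integers, so formatting them with %d is safe.
    $sql = sprintf(
        'SELECT `url` FROM `search_map` WHERE `keyword` REGEXP ? LIMIT %d, %d',
        $start,
        $display
    );

    return [$sql, [$pattern]];
}

[$sql, $params] = build_keyword_search($_GET['c'] ?? 'php', 2);
echo $sql, "\n";
// With a PDO connection in $pdo (assumed, not shown):
// $stmt = $pdo->prepare($sql);
// $stmt->execute($params);
// $urls = $stmt->fetchAll(PDO::FETCH_COLUMN);
```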
01-22-2008, 07:25 AM
Posts: 6,521
Name: Dan
Location: Swindon
You are referring to the script that gets the results from the DB, right?
01-22-2008, 07:27 PM
Posts: 83
Name: Colin
Location: USA
Yup. Should I explain it more?
01-23-2008, 09:22 AM
Posts: 6,521
Name: Dan
Location: Swindon
No worries, I know how to get the results etc. from the database.
It just made me confused about how they would be used to add to the DB lol!
01-23-2008, 09:34 AM
Posts: 83
Name: Colin
Location: USA
In order to add info to the database, use cURL to go through a sitemap.xml file or something and process every URL and update/insert the information into the database. You can also get by using file_get_contents().
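As a rough sketch of the sitemap approach, the helper below pulls every <loc> entry out of a sitemap document; the commented loop shows where file_get_contents() and the database update would go. The function name sitemap_urls() and the example.com addresses are placeholders.

```php
<?php
// Hypothetical helper: pull every <loc> URL out of a sitemap document.
function sitemap_urls(string $xml): array
{
    $doc = new DOMDocument();
    // loadXML() rejects an empty string and returns false on malformed XML;
    // the @ hides the parse warnings.
    if ($xml === '' || !@$doc->loadXML($xml)) {
        return [];
    }

    $urls = [];
    foreach ($doc->getElementsByTagName('loc') as $loc) {
        $urls[] = trim($loc->textContent);
    }
    return $urls;
}

// Crawl loop, with a made-up sitemap address:
// $xml = file_get_contents('http://www.example.com/sitemap.xml');
// foreach (sitemap_urls($xml) as $url) {
//     $html = file_get_contents($url);
//     // ... extract keywords from $html and insert/update the row for $url
// }

$sample = '<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/about.php</loc></url>
</urlset>';
print_r(sitemap_urls($sample));
```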
01-23-2008, 09:46 AM
Posts: 6,521
Name: Dan
Location: Swindon
I know how to get the code; it's more how to make a half-decent algorithm to sort and rank it.
I'm still clueless on how to build a ranking system that shows the best results.
01-23-2008, 06:39 PM
Posts: 6,521
Name: Dan
Location: Swindon
Can you explain what the REGEX in that query does?
Also, how can I have the query understand/process bollen operators? OR, AND, minus/NOT
I also came across this SQL, and it seems to be what I need to use:
FROM code WHERE MATCH(title, code) AGAINST ('$keyword')";
Also, how would I make it robots.txt friendly?
Also, would this work with URLs like page.php?id=115, or is there something special I would need to do to make that work?
How can I get the image URLs from img tags and links?
Thanks.
01-23-2008, 08:08 PM
Posts: 83
Name: Colin
Location: USA
Quote:
Originally Posted by dansgalaxy
Can you explain what the REGEX in that query does?
|
REGEXP matches a regular expression in MySQL. I should rewrite the code to be:
PHP Code:
$query="SELECT `url` FROM `search_map` WHERE `keyword` REGEXP '([,]|^){$search_keyword}([,]|$)' LIMIT {$start}, {$display}";
Basically it matches a keyword in the format:
1) keyword, (start of list)
2) ,keyword,
3) ,keyword (end of list)
The list would look like this:
keyword1,keyword2,keyword3,keyword4
So if you typed in "keyword1" as $search_keyword, it would return the row that had "keyword1" somewhere in the list.
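To see the three cases in action without a database, the same pattern can be tried in PHP with preg_match(). This is only an illustration; keyword_in_list() is a made-up helper name.

```php
<?php
// Hypothetical helper mirroring the REGEXP pattern above in PHP.
function keyword_in_list(string $keyword, string $list): bool
{
    $pattern = '/(,|^)' . preg_quote($keyword, '/') . '(,|$)/';
    return preg_match($pattern, $list) === 1;
}

$list = 'keyword1,keyword2,keyword3,keyword4';
var_dump(keyword_in_list('keyword1', $list)); // start of list
var_dump(keyword_in_list('keyword3', $list)); // middle of list
var_dump(keyword_in_list('keyword4', $list)); // end of list
var_dump(keyword_in_list('keyword', $list));  // partial word only
```

The last call returns false because "keyword" is followed by a digit rather than a comma or the end of the list, which is exactly why the pattern anchors on both sides.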
Quote:
Originally Posted by dansgalaxy
Also, how can I have the query understand/process bollen operators? OR, AND, minus/NOT
|
Just use ENUM('0', '1') if what you need is a boolean column. (I am assuming by "bollen" you mean boolean; please spell check.)
Quote:
Originally Posted by dansgalaxy
FROM code WHERE MATCH(title, code) AGAINST ('$keyword')";
|
I haven't used this syntax/function before.
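For what it's worth, MATCH ... AGAINST is MySQL's full-text search syntax, and its boolean mode covers the AND/OR/NOT question: a leading + marks a word that must be present, a leading - marks a word that must not be present, and a word with no prefix is optional. The sketch below only builds the search expression; boolean_expression() is a made-up helper, the SELECT list is a guess, and the table would normally need a FULLTEXT index on (title, code).

```php
<?php
// Hypothetical helper: turn word lists into a MySQL boolean-mode expression.
// + marks a required word, - an excluded word, no prefix means optional.
function boolean_expression(array $must, array $mustNot = [], array $optional = []): string
{
    // Keep only letters, digits and underscores so operators cannot be injected.
    $clean = function (string $word): string {
        return preg_replace('/[^A-Za-z0-9_]/', '', $word);
    };

    $parts = [];
    foreach ($must as $w)     { $parts[] = '+' . $clean($w); }
    foreach ($mustNot as $w)  { $parts[] = '-' . $clean($w); }
    foreach ($optional as $w) { $parts[] = $clean($w); }

    // Drop entries that became empty or operator-only after cleaning.
    return implode(' ', array_filter($parts, function ($p) {
        return trim($p, '+-') !== '';
    }));
}

// "php AND mysql NOT java", with "tutorial" as an optional extra.
$expr = boolean_expression(['php', 'mysql'], ['java'], ['tutorial']);
echo $expr, "\n";

// Bind $expr as a parameter; the SELECT list here is an assumption.
$sql = 'SELECT title FROM code WHERE MATCH(title, code) AGAINST (? IN BOOLEAN MODE)';
```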
Quote:
Originally Posted by dansgalaxy
Also, how would I make it robots.txt friendly?
Also, would this work with URLs like page.php?id=115, or is there something special I would need to do to make that work?
How can I get the image URLs from img tags and links?
|
robots.txt just regulates which pages are visible to certain bots, so it really has no bearing on this. Paging is just a way of dealing with result rows; your problem is getting the result rows to correctly reflect the users' request, so no. For the image URLs, just use preg_match_all in something like this:
PHP Code:
// Image URLs are in the src attribute of each <img> tag.
preg_match_all('/<img[^>]+src="([^"]+)"/', $source, $img_matches);
preg_match_all('/alt="([^"]+)"/', $source, $alt_matches);
In $img_matches[1] you should have a list of all the image URLs. Also, in $alt_matches[1] you should have all the image alt values. The indexes should coincide, unless you have images without alt attributes.
If you haven't yet, spend a day or two and learn the preg functions. Regular expressions are very powerful tools.
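If keeping each image URL paired with its own alt text matters, one alternative to the regular expressions is PHP's DOMDocument, which is a different technique from the one suggested above. This is a minimal sketch; extract_images() is a hypothetical name.

```php
<?php
// Hypothetical helper: collect src/alt pairs with DOMDocument rather than
// regular expressions, so each URL stays paired with its own alt text.
function extract_images(string $html): array
{
    if (trim($html) === '') {
        return [];
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml's parse warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $images = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        if ($img->hasAttribute('src')) {
            $images[] = [
                'src' => $img->getAttribute('src'),
                'alt' => $img->getAttribute('alt'), // '' when missing
            ];
        }
    }
    return $images;
}

$source = '<p><img src="a.png" alt="First"><IMG SRC="b.png"></p>';
print_r(extract_images($source));
```

An image without an alt attribute simply gets an empty string, so the two lists can no longer drift out of step.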
01-24-2008, 12:04 PM
Posts: 6,521
Name: Dan
Location: Swindon
Could I also use that preg_match_all like this?
PHP Code:
preg_match_all('/a href="([^"]+)"/', $source, $alt_matches);
to get the links/URLs from my pages so I can follow/index them?
01-24-2008, 12:27 PM
Posts: 83
Name: Colin
Location: USA
Yes, but make sure to change the variable $alt_matches to something else, or the preg_match_all() call will overwrite your previous alt matches. Just to note, the preg_match_all() patterns I have written so far are case sensitive. To make them case insensitive, you would have to add an i at the end of the regular expression like so:
PHP Code:
// Image URLs are in the src attribute of each <img> tag.
preg_match_all('/<img[^>]+src="([^"]+)"/i', $source, $img_matches);
preg_match_all('/alt="([^"]+)"/i', $source, $alt_matches);
preg_match_all('/a href="([^"]+)"/i', $source, $url_matches);
That way you can match A HREF="http://url.com/" as well as a href="http://url.com/".
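A quick way to check the effect of the i modifier, using a made-up snippet and a hypothetical find_links() wrapper around the same pattern:

```php
<?php
// Hypothetical wrapper: run the link pattern with or without the i modifier.
function find_links(string $source, bool $ignoreCase): array
{
    $pattern = '/a href="([^"]+)"/' . ($ignoreCase ? 'i' : '');
    preg_match_all($pattern, $source, $matches);
    return $matches[1];
}

// Made-up snippet mixing upper- and lower-case markup.
$source = '<A HREF="http://url.com/">one</A> <a href="http://example.com/">two</a>';

print_r(find_links($source, false)); // case sensitive: lower-case link only
print_r(find_links($source, true));  // case insensitive: both links
```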
01-24-2008, 03:40 PM
Posts: 6,521
Name: Dan
Location: Swindon
What kind of loop would I need to retrieve the URLs from the variable filled by preg_match_all?
01-24-2008, 03:53 PM
Posts: 6,521
Name: Dan
Location: Swindon
Hi, I'm having trouble looping over the result of the preg match to get the URLs.
This is what I have in my bot script so far:
PHP Code:
include('includes/connect.php');
$url = 'http://dansgalaxy.co.uk/index.php';
$page = file_get_contents($url);

// Let's get the info from example.com
$tags = get_meta_tags($url);
// They're returned in an array.
echo $tags['author'].'<br /><br />';      // name
echo $tags['keywords'].'<br /><br />';    // php documentation
echo $tags['description'].'<br /><br />'; // a php manual
/*
preg_match_all('/<img[^>]+src="([^"]+)"/i', $source, $img_matches);
preg_match_all('/alt="([^"]+)"/i', $source, $alt_matches);
*/
preg_match_all('/a href="([^"]+)"/i', $page, $urls_to_index);
for ($i = 0; $i < 10; $i++) {
    echo $urls_to_index[$i];
}
At the moment I'm just getting it to echo the info I will use.
01-24-2008, 04:12 PM
Posts: 6,521
Name: Dan
Location: Swindon
Okay, I have it working now, but there are some problems.
Firstly, why does it make the array like this:
Array
(
    [0] => Array
        (
            [0] => a href="http://ipchicken.dansgalaxy.co.uk"
            [1] => a href="index.php?id=tompoem"
            [2] => a href="index.php?id=cpanel"
            [3] => a href="index.php?id=contact"
            [4] => a href="http://dansgalaxy.co.uk/ipme"
            [5] => a href="http://dansgalaxy.co.uk/filehost"
            [6] => a href="http://dansgalaxy.co.uk/youtube_download"
            [7] => a href="http://validator.w3.org/check?uri=referer"
            [8] => a href="http://jigsaw.w3.org/css-validator/validator?uri=http%3A%2F%2Fdansgalaxy.co.uk&warning=1&profile=css21&usermedium=all"
        )
    [1] => Array
        (
            [0] => http://ipchicken.dansgalaxy.co.uk
            [1] => index.php?id=tompoem
            [2] => index.php?id=cpanel
            [3] => index.php?id=contact
            [4] => http://dansgalaxy.co.uk/ipme
            [5] => http://dansgalaxy.co.uk/filehost
            [6] => http://dansgalaxy.co.uk/youtube_download
            [7] => http://validator.w3.org/check?uri=referer
            [8] => http://jigsaw.w3.org/css-validator/v...usermedium=all
        )
)
So it's two identical arrays within another?
Also, when I got it listing the URLs retrieved, they look like this:
a href="http://ipchicken.dansgalaxy.co.uk"
a href="index.php?id=tompoem"
a href="index.php?id=cpanel"
a href="index.php?id=contact"
a href="http://dansgalaxy.co.uk/ipme"
a href="http://dansgalaxy.co.uk/filehost"
a href="http://dansgalaxy.co.uk/youtube_download"
a href="http://validator.w3.org/check?uri=referer"
a href= http://jigsaw.w3.org/css-validator/v...usermedium=all
How do I remove the a href=" " part?
01-24-2008, 04:35 PM
Posts: 83
Name: Colin
Location: USA
Again, you really just need to read the manual for the preg functions. The preg_match function returns the whole regular expression match in $match[0] and the sub-matches in the other indexes. Since the URL is a sub-match (the first one), you would access it with $match[1]. Because we are using preg_match_all, $match[1] will be an array of matches. Therefore, to iterate through the matches use:
PHP Code:
foreach ($urls_to_index[1] as $url) {
    // do something
}
Please look at the manual:
http://us3.php.net/manual/en/function.preg-match.php
http://us3.php.net/manual/en/functio...-match-all.php
http://us3.php.net/manual/en/referen...ern.syntax.php
This is boiling down to some really basic stuff that I would rather not explain, because the manual does a very good job.
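Putting the two points together, a small sketch of pulling just the URLs out of a page might look like the following. extract_links() is a made-up helper, and the slightly looser pattern (allowing other attributes before href) is an assumption rather than the pattern used earlier in the thread.

```php
<?php
// Hypothetical helper: return only the captured URLs (group 1), de-duplicated.
function extract_links(string $html): array
{
    // $matches[0] holds the full matches (<a href="..."),
    // $matches[1] holds the first sub-match (the URL itself).
    preg_match_all('/<a\s[^>]*?href="([^"]+)"/i', $html, $matches);
    return array_values(array_unique($matches[1]));
}

$page = '<a href="index.php?id=contact">Contact</a>
<a class="ext" href="http://dansgalaxy.co.uk/ipme">IP</a>
<a href="index.php?id=contact">Contact again</a>';

foreach (extract_links($page) as $url) {
    echo $url, "\n"; // queue $url for indexing here
}
```

Relative links such as index.php?id=contact still need to be resolved against the page's own address before they can be fetched.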