Tycoon Talk

PHP Forum


How to make a search script?
01-21-2008, 12:16 PM How to make a search script?
dansgalaxy
Defies a Status

Posts: 6,521
Name: Dan
Location: Swindon
Trades: 0
Hello all,

I would like to have a go at making a simple search engine. I think I could cope with making the part that searches the database, but I know my biggest issue is creating the bot/crawler to crawl my site(s), collect all the info, etc. and save it in the database.


So, any ideas or advice on how to go about fetching a page and getting its keywords, phrases, etc.?

TP for all the helpful answers
__________________
Discounted Web Hosting With XDnet!
>> Get 25% of hosting~ Promo: Webmaster-talk <<

01-21-2008, 01:19 PM Re: How to make a search script?
rogem002
PHP Chap

Posts: 843
Name: Mike
Location: United Kingdom
Trades: 0
Use the function 'get_meta_tags($url)' for getting meta information. For example:
PHP Code:
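<?php
// let's get the meta tags from example.com
$tags = get_meta_tags('http://www.example.com/');

// they're returned in an array keyed by the meta tag's name attribute;
// a tag the page doesn't have is simply missing, hence the ?? ''
echo $tags['author'] ?? '';       // content of <meta name="author">
echo $tags['keywords'] ?? '';     // content of <meta name="keywords">
echo $tags['description'] ?? '';  // content of <meta name="description">
?>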
Take a look at php.net for more info ^^
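A small sketch for trying get_meta_tags() without a live site: the function also accepts a local file path, and per the PHP manual it lowercases the name attribute to form the array keys and stops reading at </head>. The sample page and temp file below are made up for illustration; a tag the page doesn't have is simply absent from the array, hence the ?? fallbacks.

```php
<?php
// Sketch: run get_meta_tags() on a local sample file (invented content).
$html = <<<'HTML'
<html>
<head>
<meta name="Author" content="Dan">
<meta name="keywords" content="php, search, crawler">
<meta name="description" content="A test page">
</head>
<body><p>Hello</p></body>
</html>
HTML;

// Write the sample page to a temporary file and parse it.
$file = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($file, $html);
$tags = get_meta_tags($file);
unlink($file);

// Keys are lowercased ("Author" becomes "author"); guard against missing tags.
$author   = $tags['author'] ?? '';
$keywords = array_map('trim', explode(',', $tags['keywords'] ?? ''));

echo $author, "\n";             // Dan
echo implode('|', $keywords);   // php|search|crawler
```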
Last edited by rogem002; 01-21-2008 at 01:25 PM.
01-21-2008, 02:20 PM Re: How to make a search script?
dansgalaxy
Defies a Status

Posts: 6,521
Name: Dan
Location: Swindon
Trades: 0
Thanks for that, Mike, it will help.

I'm thinking that, other than getting the meta info, I would cache the page in an object, list the image URLs, and strip the HTML. My big problem is how to go about creating some kind of keywords-and-phrases table in the database to be searched.

But what would be the best way to do this without ending up with an algorithm that tells me the most important word on a page is "in"? I don't really want to completely remove the words that hold no relevance, because I would like pages to be searchable by phrase.

Or am I overreaching?...
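One common way around the "in" problem, sketched here under assumptions (the page is plain text after strip_tags(), and the stop-word list is a tiny hypothetical sample): index every word with its position, so phrase searches still work, but skip stop words when deciding which words matter most. The function names are illustrative, not from any library.

```php
<?php
// Hypothetical, deliberately tiny stop-word list.
const STOP_WORDS = ['a', 'an', 'and', 'in', 'is', 'of', 'on', 'the', 'to'];

/** Split text into lowercase word tokens. */
function tokenize(string $text): array
{
    preg_match_all('/[a-z0-9]+/', strtolower($text), $m);
    return $m[0];
}

/** Build word => [positions...] for one page; stop words ARE kept here. */
function build_positions(string $text): array
{
    $positions = [];
    foreach (tokenize($text) as $pos => $word) {
        $positions[$word][] = $pos;
    }
    return $positions;
}

/** Top $n words by frequency, ignoring stop words. */
function top_words(array $positions, int $n): array
{
    $counts = [];
    foreach ($positions as $word => $list) {
        if (!in_array($word, STOP_WORDS, true)) {
            $counts[$word] = count($list);
        }
    }
    arsort($counts);
    return array_slice(array_keys($counts), 0, $n);
}

/** True if the words of $phrase appear consecutively on the page. */
function has_phrase(array $positions, string $phrase): bool
{
    $words = tokenize($phrase);
    if (!$words || !isset($positions[$words[0]])) {
        return false;
    }
    foreach ($positions[$words[0]] as $start) {
        $ok = true;
        foreach ($words as $offset => $word) {
            if (!in_array($start + $offset, $positions[$word] ?? [], true)) {
                $ok = false;
                break;
            }
        }
        if ($ok) {
            return true;
        }
    }
    return false;
}
```

In a database this would become a table of (page id, word, position) rows; the in-memory arrays just keep the idea easy to test.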
01-21-2008, 06:03 PM Re: How to make a search script?
phpknowhow
Skilled Talker

Posts: 83
Name: Colin
Location: USA
Trades: 0
I would suggest that you put all the pages into a database table with these columns:
id, url, keyword (a comma-separated list of the page's keywords)

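Then just use something like:
PHP Code:

$page = 1;                        // current page of results (1-based)
$display = 20;                    // results per page
$start = ($page - 1) * $display;  // row offset for LIMIT

// Never put raw $_GET input into SQL; escape it first (assumes $db is an
// open mysqli connection; regex metacharacters are not escaped here).
$c = $db->real_escape_string($_GET['c']);
$query = "SELECT `url` FROM `search_map` WHERE `keyword` REGEXP '([,]|^){$c}([,]|$)' LIMIT {$start}, {$display}";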
Hope that helps.
__________________
| Freelance PHP solutions for small to midsized projects |
01-22-2008, 07:25 AM Re: How to make a search script?
dansgalaxy
Defies a Status

Posts: 6,521
Name: Dan
Location: Swindon
Trades: 0
You are referring to the script that gets the results from the DB, right?
01-22-2008, 07:27 PM Re: How to make a search script?
phpknowhow
Skilled Talker

Posts: 83
Name: Colin
Location: USA
Trades: 0
Yup. Should I explain it more?
01-23-2008, 09:22 AM Re: How to make a search script?
dansgalaxy
Defies a Status

Posts: 6,521
Name: Dan
Location: Swindon
Trades: 0
No worries, I know how to get the results etc. from the database.

It just confused me about how the data would get added to the DB in the first place, lol!
01-23-2008, 09:34 AM Re: How to make a search script?
phpknowhow
Skilled Talker

Posts: 83
Name: Colin
Location: USA
Trades: 0
To add info to the database, use cURL to work through a sitemap.xml file (or something similar), process every URL, and insert/update its information in the database. You can also get by with file_get_contents().
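A minimal sketch of the sitemap step, assuming the standard sitemaps.org format: PHP's DOM extension pulls out every <loc> URL, and each one could then be fetched with cURL or file_get_contents() and indexed. The function name and example URLs are invented for illustration.

```php
<?php
// Sketch: list the page URLs in a sitemap.xml document.
function sitemap_urls(string $xml): array
{
    if (trim($xml) === '') {
        return [];
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);   // collect parse errors quietly
    $ok = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$ok) {
        return [];                                   // not well-formed XML
    }

    $urls = [];
    // <loc> elements hold the page addresses; the sitemap namespace is the
    // default (unprefixed) one, so the plain tag name matches.
    foreach ($doc->getElementsByTagName('loc') as $loc) {
        $urls[] = trim($loc->textContent);
    }
    return $urls;
}

$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/page.php?id=115</loc></url>
</urlset>
XML;

foreach (sitemap_urls($xml) as $pageUrl) {
    echo $pageUrl, "\n";   // fetch and index each page here
}
```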
01-23-2008, 09:46 AM Re: How to make a search script?
dansgalaxy
Defies a Status

Posts: 6,521
Name: Dan
Location: Swindon
Trades: 0
I know how to get the code; it's more how to make a half-decent algorithm to sort and rank it.
I'm still clueless about how to build a ranking system to show the best results.
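One well-known starting point for ranking, not something proposed in this thread, is TF-IDF: a page scores higher when a query word is frequent on that page but rare across the whole index, so a word like "in" that appears on every page contributes nothing. This is a sketch over a hypothetical in-memory $pages array (URL => plain text); a real version would keep the counts in the database.

```php
<?php
/** Lowercase word tokens of a text. */
function words_of(string $text): array
{
    preg_match_all('/[a-z0-9]+/', strtolower($text), $m);
    return $m[0];
}

/** Return URL => score for pages matching $query, best first. */
function rank_pages(array $pages, string $query): array
{
    $n  = count($pages);
    $tf = [];   // url => [word => count on that page]
    $df = [];   // word => number of pages containing it
    foreach ($pages as $url => $text) {
        $tf[$url] = array_count_values(words_of($text));
        foreach (array_keys($tf[$url]) as $word) {
            $df[$word] = ($df[$word] ?? 0) + 1;
        }
    }

    $scores = [];
    foreach ($tf as $url => $counts) {
        $total = array_sum($counts) ?: 1;
        $score = 0.0;
        foreach (array_unique(words_of($query)) as $word) {
            if (isset($counts[$word])) {
                // term frequency (normalised by page length) * inverse document frequency
                $score += ($counts[$word] / $total) * log($n / $df[$word]);
            }
        }
        if ($score > 0) {
            $scores[$url] = $score;
        }
    }
    arsort($scores);
    return $scores;
}
```

Extra weight for words found in the title or keywords meta tag could be layered on top by multiplying those counts before scoring.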
01-23-2008, 06:39 PM Re: How to make a search script?
dansgalaxy
Defies a Status

Posts: 6,521
Name: Dan
Location: Swindon
Trades: 0
Can you explain what the REGEX in that query does?

Also, how can I have the query understand/process bollen operators? OR, AND, minus/NOT

Also, I came across this SQL, and it seems to be what I need to use:
FROM code WHERE MATCH(title, code) AGAINST ('$keyword')";


Also, how would I make it robots.txt-friendly?

Also, would this work with URLs like page.php?id=115, or is there something special I would need to do to make that work?

How can I get the image URLs from img tags, and the URLs from links?

Thanks.
01-23-2008, 08:08 PM Re: How to make a search script?
phpknowhow
Skilled Talker

Posts: 83
Name: Colin
Location: USA
Trades: 0
Quote:
Originally Posted by dansgalaxy
Can you explain what the REGEX in that query does?
REGEXP matches a regular expression in MySQL. I should rewrite the code as:
PHP Code:
// $search_keyword must already be escaped for SQL
$query = "SELECT `url` FROM `search_map` WHERE `keyword` REGEXP '([,]|^){$search_keyword}([,]|$)' LIMIT {$start}, {$display}";
Basically it matches a keyword in the format:
1) keyword, (start of list)
2) ,keyword,
3) ,keyword (end of list)

The list would look like this:
keyword1,keyword2,keyword3,keyword4

So if you typed in "keyword1" as $search_keyword, it would return the row that had "keyword1" somewhere in the list.
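To check which stored lists a term would match, the same pattern can be tried in PHP with preg_match(). This is a sketch, not MySQL itself: MySQL's REGEXP syntax differs from PCRE in the details, and its case sensitivity typically follows the column's collation, whereas the PHP version below is case-sensitive. preg_quote() escapes regex metacharacters in the search term (so "c++" works); the function name is hypothetical.

```php
<?php
// Sketch: does the comma-separated $list contain exactly $keyword?
function list_has_keyword(string $list, string $keyword): bool
{
    $k = preg_quote($keyword, '/');   // treat the term literally
    // (,|^) and (,|$) require a comma or the start/end of the list on each side
    return preg_match('/(,|^)' . $k . '(,|$)/', $list) === 1;
}
```

A list stored with spaces after the commas ("php, search") would not match; store the keywords without spaces or trim them first.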


Quote:
Originally Posted by dansgalaxy
Also, how can I have the query understand/process bollen operators? OR, AND, minus/NOT
Just use ENUM('0', '1'). (I am assuming by "bollen" you mean boolean, please spell check)
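If the question is about search operators rather than storing a true/false column, MySQL's FULLTEXT search has a boolean mode (MATCH ... AGAINST ('+php -asp' IN BOOLEAN MODE)) in which a leading + marks a required word and a leading - an excluded one. The sketch below applies that same +/- convention in plain PHP to a hypothetical in-memory index (word => list of page ids), so the logic can be tested without a database.

```php
<?php
// Sketch: evaluate a query such as "php +search -asp" against $index.
// Assumes index words are lowercase.
function boolean_search(array $index, string $query): array
{
    $required = $excluded = $optional = [];
    $terms = preg_split('/\s+/', strtolower(trim($query)), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($terms as $term) {
        if ($term[0] === '+') {
            $required[] = substr($term, 1);
        } elseif ($term[0] === '-') {
            $excluded[] = substr($term, 1);
        } else {
            $optional[] = $term;
        }
    }

    if ($required) {
        // AND: pages containing every required word
        $result = null;
        foreach ($required as $word) {
            $pages = $index[$word] ?? [];
            $result = $result === null ? $pages : array_intersect($result, $pages);
        }
    } else {
        // OR: pages containing any plain word
        $result = [];
        foreach ($optional as $word) {
            $result = array_merge($result, $index[$word] ?? []);
        }
    }

    // NOT: drop pages containing any excluded word
    foreach ($excluded as $word) {
        $result = array_diff($result, $index[$word] ?? []);
    }

    $result = array_values(array_unique($result));
    sort($result);
    return $result;
}
```

As in MySQL's boolean mode, plain words only select pages when no + word is given; once a + word is present they no longer filter anything (MySQL uses them for ranking instead).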


Quote:
Originally Posted by dansgalaxy
FROM code WHERE MATCH(title, code) AGAINST ('$keyword')";
I haven't used this syntax/function before.

Quote:
Originally Posted by dansgalaxy
Also, how would I make it robots.txt-friendly?

Also, would this work with URLs like page.php?id=115, or is there something special I would need to do to make that work?

How can I get the image URLs from img tags, and the URLs from links?
Robots.txt just regulates which pages are visible to certain bots, so it really has no bearing on this. Paging is just a way of dealing with result rows; your problem is getting the result rows to correctly reflect the user's request, so no. To get the image URLs, just use preg_match_all, something like this:
PHP Code:
preg_match_all('/<img[^>]+src="([^"]+)"/', $source, $img_matches);
preg_match_all('/alt="([^"]+)"/', $source, $alt_matches);
In $img_matches[1] you should have a list of all the image URLs, and in $alt_matches[1] all the image alt values. The indexes should line up, unless some of your images lack alt attributes.
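An alternative to the regular expressions (a different technique from the one above, sketched under the assumption that PHP's DOM extension is available): parse the HTML with DOMDocument, which copes with single-quoted attributes and attributes in any order, and keeps each image's alt paired with its own src even when an alt is missing. The function name is hypothetical.

```php
<?php
// Sketch: collect [src, alt] pairs for every <img> and every <a href>.
function extract_images_and_links(string $html): array
{
    $result = ['images' => [], 'links' => []];
    if ($html === '') {
        return $result;
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);   // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('img') as $img) {
        // getAttribute() returns '' for a missing attribute, so src and alt stay paired
        $result['images'][] = ['src' => $img->getAttribute('src'), 'alt' => $img->getAttribute('alt')];
    }
    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {             // skip anchors like <a name="top">
            $result['links'][] = $a->getAttribute('href');
        }
    }
    return $result;
}
```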

If you haven't yet, spend a day or two and learn the preg functions. Regular expressions are very powerful tools.
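On the robots.txt question: for a crawler, robots.txt matters because a well-behaved bot fetches /robots.txt from each site and skips the paths it disallows. A minimal sketch, assuming only the basic rules matter: it honours the Disallow lines in the "User-agent: *" group as prefix matches and ignores Allow lines, wildcards, and bot-specific groups. The function name is hypothetical.

```php
<?php
// Sketch: may $path be crawled according to $robotsTxt?
function robots_allows(string $robotsTxt, string $path): bool
{
    $applies = false;        // are we inside a group that covers "*"?
    $inAgentLines = false;   // consecutive User-agent lines form one group
    $disallowed = [];
    foreach (preg_split('/\r\n|\r|\n/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*/', '', $line));   // drop comments
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        [$field, $value] = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);
        if ($field === 'user-agent') {
            $applies = $inAgentLines ? ($applies || $value === '*') : ($value === '*');
            $inAgentLines = true;
        } else {
            $inAgentLines = false;
            if ($field === 'disallow' && $applies && $value !== '') {
                $disallowed[] = $value;   // an empty Disallow means "allow all"
            }
        }
    }
    foreach ($disallowed as $prefix) {
        if (strpos($path, $prefix) === 0) {
            return false;
        }
    }
    return true;
}
```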
01-24-2008, 12:04 PM Re: How to make a search script?
dansgalaxy
Defies a Status

Posts: 6,521
Name: Dan
Location: Swindon
Trades: 0
Could I also use that preg_match_all like this:

PHP Code:
preg_match_all('/a href="([^"]+)"/', $source, $alt_matches);

to get the links/URLs from my pages so I could follow and index them?
01-24-2008, 12:27 PM Re: How to make a search script?
phpknowhow
Skilled Talker

Posts: 83
Name: Colin
Location: USA
Trades: 0
Yes, but make sure to change the variable $alt_matches to something else, otherwise preg_match_all() will overwrite your previous alt matches. Also note that the preg_match_all() calls I have written so far are case-sensitive. To make them case-insensitive, add an i modifier at the end of the regular expression, like so:
PHP Code:
preg_match_all('/<img[^>]+src="([^"]+)"/i', $source, $img_matches);
preg_match_all('/alt="([^"]+)"/i', $source, $alt_matches);
preg_match_all('/a href="([^"]+)"/i', $source, $url_matches);
That way you can match A HREF="http://url.com/" as well as a href="http://url.com/".
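The hrefs captured this way may be relative (index.php?id=contact, ../page.php, /img/logo.png), so a crawler has to turn them into absolute URLs before following them. A simplified sketch of that resolution, loosely following RFC 3986; the function name is hypothetical and rare forms such as user:password@ are not handled.

```php
<?php
// Sketch: resolve $href, found on page $base, into an absolute URL.
function resolve_url(string $base, string $href): string
{
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href)) {
        return $href;                            // already absolute (http:, mailto:, ...)
    }

    $b = parse_url($base);
    $scheme = $b['scheme'] ?? 'http';
    $authority = ($b['host'] ?? '') . (isset($b['port']) ? ':' . $b['port'] : '');
    $basePath = $b['path'] ?? '/';
    $baseQuery = isset($b['query']) ? '?' . $b['query'] : '';

    if (strpos($href, '//') === 0) {
        return $scheme . ':' . $href;            // protocol-relative: //host/path
    }
    if ($href === '' || $href[0] === '#') {
        return "$scheme://$authority$basePath$baseQuery$href";   // same page
    }
    if ($href[0] === '?') {
        return "$scheme://$authority$basePath$href";             // same path, new query
    }

    // Root-relative paths are used as-is; others join the base page's directory.
    $path = $href[0] === '/'
        ? $href
        : substr($basePath, 0, strrpos($basePath, '/') + 1) . $href;

    // Set any ?query or #fragment aside while cleaning up the path.
    $cut = strcspn($path, '?#');
    $suffix = substr($path, $cut);
    $path = substr($path, 0, $cut);

    // Remove "." and ".." segments, never climbing above the root.
    $segments = explode('/', $path);
    $out = [];
    foreach ($segments as $seg) {
        if ($seg === '..') {
            if (count($out) > 1) {
                array_pop($out);
            }
        } elseif ($seg !== '.') {
            $out[] = $seg;
        }
    }
    $last = end($segments);
    if ($last === '.' || $last === '..') {
        $out[] = '';                             // "/a/b/.." resolves to "/a/"
    }

    return "$scheme://$authority" . implode('/', $out) . $suffix;
}
```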
01-24-2008, 03:40 PM Re: How to make a search script?
dansgalaxy
Defies a Status

Posts: 6,521
Name: Dan
Location: Swindon
Trades: 0
What kind of loop would I need to retrieve the URLs from the variable that preg_match_all fills in?
01-24-2008, 03:53 PM Re: How to make a search script?
dansgalaxy
Defies a Status

Posts: 6,521
Name: Dan
Location: Swindon
Trades: 0
Hi, I'm having trouble looping over the result of the preg_match_all to get the URLs.

Here's what I have in my bot script so far:
PHP Code:
include('includes/connect.php');
$url = 'http://dansgalaxy.co.uk/index.php';
$page = file_get_contents($url);

// get the meta tags from the page
$tags = get_meta_tags($url);
// they're returned in an array keyed by meta tag name
echo $tags['author'].'<br /><br />';       // author meta tag
echo $tags['keywords'].'<br /><br />';     // keywords meta tag
echo $tags['description'].'<br /><br />';  // description meta tag
/*preg_match_all('/img="([^"]+)"/i', $source, $img_matches);
preg_match_all('/alt="([^"]+)"/i', $source, $alt_matches); */
preg_match_all('/a href="([^"]+)"/i', $page, $urls_to_index);
for ($i = 0; $i < 10; $i++)
{
    echo $urls_to_index[$i];
}
At the moment I'm just echoing the info I will use.
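Beyond echoing the matches, a bot usually needs a loop that keeps a queue of pages to visit and remembers which ones it has already seen. A sketch under simplifying assumptions: the fetch function is passed in as a callable (in a real bot it could wrap file_get_contents() or cURL), only absolute http(s) links on the starting host are followed (relative links are skipped here), and a page limit stops runaway crawls. The function name is hypothetical.

```php
<?php
// Sketch: breadth-first crawl starting at $start; returns url => html.
function crawl(string $start, callable $fetch, int $maxPages = 50): array
{
    $host  = parse_url($start, PHP_URL_HOST);
    $queue = [$start];
    $seen  = [$start => true];
    $pages = [];

    while ($queue && count($pages) < $maxPages) {
        $url  = array_shift($queue);
        $html = $fetch($url);
        if ($html === null || $html === false) {
            continue;                              // fetch failed; skip this URL
        }
        $pages[$url] = $html;

        preg_match_all('/<a\s[^>]*href="([^"]+)"/i', $html, $m);
        foreach ($m[1] as $link) {
            $link = explode('#', $link, 2)[0];     // ignore #fragments
            if (!preg_match('#^https?://#i', $link)
                || parse_url($link, PHP_URL_HOST) !== $host
                || isset($seen[$link])) {
                continue;                          // relative, off-site or already queued
            }
            $seen[$link] = true;
            $queue[] = $link;
        }
    }
    return $pages;
}
```

For example, crawl('http://dansgalaxy.co.uk/index.php', function ($u) { return @file_get_contents($u); }, 20) would fetch at most 20 pages from that host.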
01-24-2008, 04:12 PM Re: How to make a search script?
dansgalaxy
Defies a Status

Posts: 6,521
Name: Dan
Location: Swindon
Trades: 0
Okay, I have it working now, but there are some problems.

Firstly, why does it make the array like this:
Array
(
    [0] => Array
        (
            [0] => a href="http://ipchicken.dansgalaxy.co.uk"
            [1] => a href="index.php?id=tompoem"
            [2] => a href="index.php?id=cpanel"
            [3] => a href="index.php?id=contact"
            [4] => a href="http://dansgalaxy.co.uk/ipme"
            [5] => a href="http://dansgalaxy.co.uk/filehost"
            [6] => a href="http://dansgalaxy.co.uk/youtube_download"
            [7] => a href="http://validator.w3.org/check?uri=referer"
            [8] => a href="http://jigsaw.w3.org/css-validator/validator?uri=http%3A%2F%2Fdansgalaxy.co.uk&warning=1&profile=css21&usermedium=all"
        )
    [1] => Array
        (
            [0] => http://ipchicken.dansgalaxy.co.uk
            [1] => index.php?id=tompoem
            [2] => index.php?id=cpanel
            [3] => index.php?id=contact
            [4] => http://dansgalaxy.co.uk/ipme
            [5] => http://dansgalaxy.co.uk/filehost
            [6] => http://dansgalaxy.co.uk/youtube_download
            [7] => http://validator.w3.org/check?uri=referer
            [8] => http://jigsaw.w3.org/css-validator/v...usermedium=all
        )
)

So it's two (almost identical) arrays within another?

Also, when I got it listing the retrieved URLs, they looked like this:
a href="http://ipchicken.dansgalaxy.co.uk"
a href="index.php?id=tompoem"
a href="index.php?id=cpanel"
a href="index.php?id=contact"
a href="http://dansgalaxy.co.uk/ipme"
a href="http://dansgalaxy.co.uk/filehost"
a href="http://dansgalaxy.co.uk/youtube_download"
a href="http://validator.w3.org/check?uri=referer"
a href="http://jigsaw.w3.org/css-validator/v...usermedium=all"

How do I remove the a href=" "?
01-24-2008, 04:35 PM Re: How to make a search script?
phpknowhow
Skilled Talker

Posts: 83
Name: Colin
Location: USA
Trades: 0
Again, you really just need to read the manual for the preg functions. The preg_match function puts the whole regular-expression match in $match[0] and the sub-matches in the other indexes. Since the URL is actually a sub-match (the first one), you would access it with $match[1]. Because we are using preg_match_all, $match[1] will be an array of matches, so to iterate through them use:
PHP Code:
foreach($urls_to_index[1] as $url) {
    // do something with each captured URL
}

Please look at the manual:
http://us3.php.net/manual/en/function.preg-match.php
http://us3.php.net/manual/en/functio...-match-all.php
http://us3.php.net/manual/en/referen...ern.syntax.php

This is boiling down to some really basic stuff that I would rather not explain, because the manual does a very good job.
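A short demonstration of the structure being described, using two links in the style of the output posted earlier in the thread: with the default PREG_PATTERN_ORDER, $m[0] holds the full matches and $m[1] the first capture group, which is why the print_r() dump shows two parallel arrays. Passing PREG_SET_ORDER instead groups each match with its own captures.

```php
<?php
$html = '<a href="index.php?id=cpanel">Panel</a> <a href="index.php?id=contact">Contact</a>';

// Default PREG_PATTERN_ORDER: one array per group.
preg_match_all('/a href="([^"]+)"/i', $html, $m);
// $m[0] = ['a href="index.php?id=cpanel"', 'a href="index.php?id=contact"']
// $m[1] = ['index.php?id=cpanel', 'index.php?id=contact']
foreach ($m[1] as $link) {
    echo $link, "\n";            // just the URL, without the a href="..." part
}

// PREG_SET_ORDER: one array per match.
preg_match_all('/a href="([^"]+)"/i', $html, $sets, PREG_SET_ORDER);
foreach ($sets as $match) {
    echo $match[1], "\n";        // $match[0] is the whole match, $match[1] the URL
}
```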