The tables and how they are structured aren't really important, and the data isn't really a concern. It's all about how the data is indexed into the structure.
In my example structure, the inverted index would be built as the word_pos table, and each row would hold the word_id, the doc_id and the position.
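As a rough sketch of that structure (not the exact schema from my example), here is one way word_pos could look using Python's built-in sqlite3 module. The words lookup table, the index name and the helper functions add_document and docs_for_word are assumptions made up for illustration, and position is taken to be the word offset within the document.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical lookup table mapping each distinct word to a word_id.
    CREATE TABLE words (word_id INTEGER PRIMARY KEY, word TEXT UNIQUE);
    -- The inverted index itself: one row per occurrence of a word.
    CREATE TABLE word_pos (word_id INTEGER, doc_id INTEGER, position INTEGER);
    CREATE INDEX idx_word_pos ON word_pos (word_id, doc_id);
""")

def add_document(conn, doc_id, text):
    """Index one document; position is the word offset (an assumption)."""
    for position, word in enumerate(text.lower().split()):
        conn.execute("INSERT OR IGNORE INTO words (word) VALUES (?)", (word,))
        word_id = conn.execute(
            "SELECT word_id FROM words WHERE word = ?", (word,)
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO word_pos VALUES (?, ?, ?)", (word_id, doc_id, position)
        )

def docs_for_word(conn, word):
    """Return the ids of all documents containing the word."""
    rows = conn.execute(
        "SELECT DISTINCT wp.doc_id FROM word_pos wp "
        "JOIN words w ON w.word_id = wp.word_id "
        "WHERE w.word = ? ORDER BY wp.doc_id",
        (word,),
    )
    return [row[0] for row in rows]

for doc_id, text in enumerate(["i love you", "god is love", "love is blind"], start=1):
    add_document(conn, doc_id, text)

print(docs_for_word(conn, "love"))
```

The composite index on (word_id, doc_id) is what keeps the lookup cheap; the rest of the schema can be whatever suits your data.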
A simple (relatively speaking) explanation;
Quote:
from http://www.nist.gov/dads/HTML/invertedIndex.html
Note: Suppose we want to search the texts "i love you," "god is love," "love is blind," and "blind justice." (The words of the text are all lower case for simplicity.) If we index by (text, character within the text), the index with location in text is:
blind (3,8);(4,0)
god (2,0)
i (1,0)
is (2,4);(3,5)
justice (4,6)
love (1,2);(2,7);(3,0)
you (1,7)
The word "blind" is in document 3 ("love is blind") starting at character 8, so has an entry (3,8). To find, for instance, documents with both "is" and "love," first look up the words in the index, then find the intersection of the texts in each list. In this case, documents 2 and 3 have both words. We can quickly find documents where the words appear close to each other by comparing the character within the text.
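To make the quoted example concrete, here is a minimal Python sketch that builds the same (text, character within the text) index from the four texts and then intersects the lists for "is" and "love". The function names build_index and docs_with_all are invented for illustration; they are not from the NIST page.

```python
import re
from collections import defaultdict

def build_index(texts):
    """Map each word to a list of (doc_id, char_offset) entries."""
    index = defaultdict(list)
    for doc_id, text in enumerate(texts, start=1):
        # Every run of letters is a word; match.start() is its character offset.
        for match in re.finditer(r"[a-z]+", text.lower()):
            index[match.group()].append((doc_id, match.start()))
    return index

def docs_with_all(index, words):
    """Return the sorted ids of documents that contain every given word."""
    doc_sets = [{doc_id for doc_id, _ in index.get(word, [])} for word in words]
    return sorted(set.intersection(*doc_sets)) if doc_sets else []

texts = ["i love you", "god is love", "love is blind", "blind justice"]
index = build_index(texts)

print(index["blind"])                        # [(3, 8), (4, 0)]
print(docs_with_all(index, ["is", "love"]))  # [2, 3]
```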
You can see from there how simple it is to query the word_pos table and retrieve all documents pertaining to a word. If you use tag weighting you also have the basis of a simple ranking algorithm.
eg;
you SUM the occurrences of that word within any given document = document_rank_weight
then if;
the word was found in the title, add extra rank weight
the word was found in bold, headings, italics etc add extra rank weight for each
then of course if you have an index where the word is used in links that point to a particular document from other locations you can use that to up the document ranking weight as well.
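The steps above could be sketched in Python roughly like this. The bonus values, the WordStats fields and the function names are all made up for illustration; how much extra weight each signal deserves is something you would have to tune yourself.

```python
from dataclasses import dataclass, field

# Hypothetical bonus values; pick your own.
TITLE_BONUS = 5.0
EMPHASIS_BONUS = 2.0   # applied once per kind of emphasis tag
LINK_BONUS = 3.0       # applied once per inbound link using the word

@dataclass
class WordStats:
    """What the index knows about one word in one document."""
    occurrences: int = 0
    in_title: bool = False
    emphasis_tags: set = field(default_factory=set)  # e.g. {"b", "h1", "i"}
    inbound_links: int = 0

def rank_weight(stats):
    """Start from the occurrence count, then add the extra weights."""
    weight = float(stats.occurrences)
    if stats.in_title:
        weight += TITLE_BONUS
    weight += EMPHASIS_BONUS * len(stats.emphasis_tags)
    weight += LINK_BONUS * stats.inbound_links
    return weight

def rank_documents(stats_by_doc):
    """Return doc ids ordered from highest to lowest rank weight."""
    return sorted(
        stats_by_doc,
        key=lambda doc_id: rank_weight(stats_by_doc[doc_id]),
        reverse=True,
    )

stats_by_doc = {
    1: WordStats(occurrences=3),
    2: WordStats(occurrences=1, in_title=True, emphasis_tags={"b"}),
    3: WordStats(occurrences=2, inbound_links=1),
}
print(rank_documents(stats_by_doc))  # [2, 3, 1]
```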
This is of course a very simple ranking method and one that is easily subverted, so don't think you can take on Google with it.
__________________
Chris. ->> Links are advertising NOT optimising!! <<-
A foolish consistency is the hobgoblin of little minds
Thought for today:- Is SEO the only industry where all the cowboys are Indians?