Tycoon Talk
Become a Big fish!
The number 1 forum for online business!
Post topics, ask questions, share your knowledge.
Tycoon Talk is part of Freelancer.com - find skilled workers online at a fraction of the cost.

The Database Forum


You are currently viewing our The Database Forum as a guest. Please register to participate.
Login



Reply
Old 04-22-2006, 09:49 PM An Inverted Index
Skilled Talker

Latest Blog Post:
New Operating System
Posts: 79
Trades: 0
Hello,

I'm curious to know how to make an inverted index.

Thanks.
__________________

Please login or register to view this content. Registration is FREE
WinSrev is offline
Reply With Quote
View Public Profile
 
 
Register now for full access!
Old 04-23-2006, 06:41 AM Re: An Inverted Index
chrishirst's Avatar
Missing! presumed drunk.

Posts: 41,517
Name: Chris Hirst
Location: Blackpool. UK
Trades: 0
White papers at
http://www.n3labs.com/pdf/putz91usin...rted-DBMS.html
and
http://staff.science.uva.nl/~christo...clean-w-05.pdf (PDF)

One way to simulate one is with three tables (built one (long time ago) for an Intranet text document search)

Quote:
document;
id
doc_location (URL or UNC path to file)
content (if document caching needed)

word_list;
id
word

word_pos;
id
word_id (word_list.id)
doc_id (document.id)
keyword (boolean for if the word is to be included as a keyword for the document)
position (location of word in document if ranking/sorting based on position)
tag (this can be used if you are using a ranking algo for HTML based on tag/structure weighting)
__________________
Chris. ->> Links are advertising NOT optimising!! <<-
A foolish consistency is the hobgoblin of little minds
Thought for today:- I SEO the only industry where all the cowboys are Indians?
chrishirst is online now
Reply With Quote
View Public Profile Visit chrishirst's homepage!
 
Old 04-24-2006, 02:07 PM Re: An Inverted Index
Skilled Talker

Latest Blog Post:
New Operating System
Posts: 79
Trades: 0
So, really what its saying is, its not the table structure that has to change but the data that goes into it?
__________________

Please login or register to view this content. Registration is FREE
WinSrev is offline
Reply With Quote
View Public Profile
 
Old 04-24-2006, 05:52 PM Re: An Inverted Index
chrishirst's Avatar
Missing! presumed drunk.

Posts: 41,517
Name: Chris Hirst
Location: Blackpool. UK
Trades: 0
the tables and how they are structured isn't really important, and the data isn't really a concern. It's all about the data is indexed into the structure

In my example structure, the inverted index would be built as the word_pos table and would be the word_id, doc_id and the position.

A simple (relatively speaking ) explanation;
Quote:
from http://www.nist.gov/dads/HTML/invertedIndex.html

Note: Suppose we want to search the texts "i love you," "god is love," "love is blind," and "blind justice." (The words of the text are all lower case for simplicity.) If we index by (text, character within the text), the index with location in text is:

blind (3,8);(4,0)
god (2,0)
i (1,0)
is (2,4);(3,5)
justice (4,6)
love (1,2);(2,7);(3,0)
you (1,7)

The word "blind" is in document 3 ("love is blind") starting at character 8, so has an entry (3,8). To find, for instance, documents with both "is" and "love," first look up the words in the index, then find the intersection of the texts in each list. In this case, documents 2 and 3 have both words. We can quickly find documents where the words appear close to each other by comparing the character within the text.
You can see from there how simple it is to query the word_pos table and retrieve all documents pertaining to a word. If you use tag weighting you also has the basis of a simple ranking algorithm.

eg;
you SUM the occurences of that word within any given document = document_rank_weight

then if;
the word was found in the title, add extra rank weight
the word was found in bold, headings, italics etc add extra rank weight for each

then of course if you have an index where the word is used in links that point to a particular document from other locations you can use that to up the document ranking weight as well.

This is of course a very simple ranking method and one that is easily subverted, so don't think you can take on Google with it
__________________
Chris. ->> Links are advertising NOT optimising!! <<-
A foolish consistency is the hobgoblin of little minds
Thought for today:- I SEO the only industry where all the cowboys are Indians?
chrishirst is online now
Reply With Quote
View Public Profile Visit chrishirst's homepage!
 
Old 04-26-2006, 02:40 PM Re: An Inverted Index
Skilled Talker

Latest Blog Post:
New Operating System
Posts: 79
Trades: 0
They all say how it works but thats not what i'm after, i'm after how to do it.
__________________

Please login or register to view this content. Registration is FREE
WinSrev is offline
Reply With Quote
View Public Profile
 
Old 04-26-2006, 03:36 PM Re: An Inverted Index
chrishirst's Avatar
Missing! presumed drunk.

Posts: 41,517
Name: Chris Hirst
Location: Blackpool. UK
Trades: 0
What programming skills do you have?
What context do you want to use it (webclient, thin client, desktop app)
Are you considering a full inverted index (monolithic structure) or a simulated one using a RDBMS ?

All the information you need is around to build one, it's simply how you interpret it for your needs. There is no "correct" way to build the structure and query system, it will be purely up to how you need it to work.
Bearing in mind it is not a task for a newbie programmer or a semi skilled chancer who thinks they know databases.
get it wrong and it will perform like a "three legged pig"
__________________
Chris. ->> Links are advertising NOT optimising!! <<-
A foolish consistency is the hobgoblin of little minds
Thought for today:- I SEO the only industry where all the cowboys are Indians?
chrishirst is online now
Reply With Quote
View Public Profile Visit chrishirst's homepage!
 
Old 04-27-2006, 12:10 PM Re: An Inverted Index
Skilled Talker

Latest Blog Post:
New Operating System
Posts: 79
Trades: 0
I have intermediate programming skills although my friend who helps me is an expert, the context is a search engine the main purpose of it, no idea about the 3rd one, maybe a full inverted index?
__________________

Please login or register to view this content. Registration is FREE
WinSrev is offline
Reply With Quote
View Public Profile
 
Reply     « Reply to An Inverted Index
 

Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are Off
Pingbacks are Off
Refbacks are Off





   
RSS Feed  Feeds: RSS   JS   XML
RSS Feed  Feeds for this forum: RSS   JS   XML



Page generated in 0.77076 seconds with 12 queries