Tycoon Talk

PHP Forum


Bayesian Theorem - Content Categorizing
06-10-2011, 12:58 PM
evans123
Ultra Talker

Posts: 468
I have built a script that categorizes content based on a training set, using Bayes' theorem, n-grams and so on. I'm having issues when comparing the n-grams to categorize the content, because the symbols in the database get replaced by their HTML entities (UTF-8): for example, £ is stored as an entity rather than as the £ character itself.
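The script itself isn't shown in the thread. As a rough illustration only, a multinomial naive Bayes classifier over character trigrams could be sketched as below. The function names (`trigrams`, `train`, `classify`), the add-one (Laplace) smoothing and the toy training data are all assumptions for this sketch, not evans123's actual code.

```php
<?php
// Split a UTF-8 string into overlapping three-character n-grams.
function trigrams(string $text): array
{
    // The /u modifier splits on UTF-8 characters rather than bytes.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $grams = [];
    for ($i = 0, $n = count($chars); $i + 3 <= $n; $i++) {
        $grams[] = $chars[$i] . $chars[$i + 1] . $chars[$i + 2];
    }
    return $grams;
}

// Count trigram frequencies per category from [category, text] pairs.
function train(array $examples): array
{
    $model = ['docs' => [], 'counts' => [], 'totals' => [], 'vocab' => []];
    foreach ($examples as [$category, $text]) {
        $model['docs'][$category] = ($model['docs'][$category] ?? 0) + 1;
        foreach (trigrams($text) as $gram) {
            $model['counts'][$category][$gram] = ($model['counts'][$category][$gram] ?? 0) + 1;
            $model['totals'][$category] = ($model['totals'][$category] ?? 0) + 1;
            $model['vocab'][$gram] = true;
        }
    }
    return $model;
}

// Pick the category maximising log P(c) + sum of log P(gram | c).
function classify(array $model, string $text): ?string
{
    $totalDocs = array_sum($model['docs']);
    $vocabSize = count($model['vocab']);
    $grams = trigrams($text);
    $best = null;
    $bestScore = -INF;
    foreach ($model['docs'] as $category => $docCount) {
        $score = log($docCount / $totalDocs); // prior
        $total = $model['totals'][$category] ?? 0;
        foreach ($grams as $gram) {
            $count = $model['counts'][$category][$gram] ?? 0;
            // Add-one (Laplace) smoothing keeps unseen trigrams from zeroing the score.
            $score += log(($count + 1) / ($total + $vocabSize));
        }
        if ($score > $bestScore) {
            $bestScore = $score;
            $best = (string) $category;
        }
    }
    return $best;
}

$model = train([
    ['finance', 'Interest rates and loan repayments for a mortgage'],
    ['finance', 'Savings account interest and bank charges'],
    ['sport', 'The football match ended with a late goal'],
    ['sport', 'The tennis player won the match in straight sets'],
]);
echo classify($model, 'match goal'), PHP_EOL;     // sport
echo classify($model, 'interest rates'), PHP_EOL; // finance
```

Working in log space avoids floating-point underflow when many trigram probabilities are multiplied together.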

I'm just wondering if anyone has built a similar script using Bayes' theorem and has any ideas on the best way to get around this. Should I consider removing all non-alphanumeric characters and just forget about them?
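If you do decide to drop non-alphanumeric characters, a Unicode-aware regular expression keeps letters and digits from any script while removing punctuation and symbols such as £. This is a sketch under assumptions: the name `stripNonAlphanumeric` is hypothetical, and decoding entities first and collapsing whitespace are choices made here, not something described in the thread.

```php
<?php
// Remove everything except Unicode letters (\p{L}), digits (\p{N}) and whitespace.
function stripNonAlphanumeric(string $text): string
{
    // Decode entities first; otherwise "&pound;" would survive as the word "pound".
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $clean = preg_replace('/[^\p{L}\p{N}\s]+/u', '', $text);
    // Collapse the runs of whitespace left behind by removed characters.
    return trim(preg_replace('/\s+/u', ' ', $clean));
}

echo stripNonAlphanumeric('Only &pound;5.99 - buy now!'), PHP_EOL; // Only 599 buy now
```

Note how "£5.99" collapses to "599": the currency symbol and the decimal point are lost, which is the trade-off if symbols carry useful signal for a category.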

Any help would be appreciated.
 
 
06-10-2011, 01:43 PM
mgraphic
Truth Seeker

Posts: 2,918
Name: Keith Marshall
Location: Connecticut
Wouldn't html_entity_decode() work?
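A minimal sketch of the html_entity_decode() suggestion: decode the stored entities back to UTF-8 characters before comparing n-grams. The wrapper name `decodeEntities` is hypothetical, and the ENT_QUOTES | ENT_HTML5 flags are an assumed choice; the 'UTF-8' charset is passed explicitly so the result doesn't depend on the PHP version's default.

```php
<?php
// Decode named (&pound;) and numeric (&#163;) entities back to UTF-8 characters.
function decodeEntities(string $text): string
{
    return html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

$stored = 'Prices from &pound;10 (&#163;10) &amp; up';
echo decodeEntities($stored), PHP_EOL; // Prices from £10 (£10) & up
```

Once decoded, an n-gram built from the stored text compares equal to one built from raw input containing the £ character.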
__________________

<mgraphic /> - I don't have a solution but I admire the problem.
 
06-10-2011, 02:19 PM
PaulW
Super Spam Talker

Posts: 879
Name: Paul W
See http://dev.mysql.com/doc/refman/5.0/en/charset.html, assuming you're using MySQL; most RDBMSs have at least these facilities.
 
06-11-2011, 04:42 AM
evans123
Ultra Talker

Posts: 468
So after splitting the content into three-character n-grams, I would use html_entity_decode() so that the characters themselves, rather than their entities, are saved into the database.
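A caution on the order of those steps: if the text is split into trigrams while the entities are still in it, "&pound;" becomes fragments such as "&po" and "pou" that html_entity_decode() can no longer turn back into £. A sketch that decodes first and then splits on UTF-8 characters (so a two-byte character like £ is not cut in half, as byte-based substr() slicing would do) might look like this; `charTrigrams` is a hypothetical name and the flags are an assumed choice.

```php
<?php
// Decode entities, then split into overlapping three-character n-grams.
// preg_split with the /u modifier splits on UTF-8 characters, so a
// multibyte character such as £ (two bytes) stays intact.
function charTrigrams(string $text): array
{
    $decoded = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $chars = preg_split('//u', $decoded, -1, PREG_SPLIT_NO_EMPTY);
    $grams = [];
    for ($i = 0, $n = count($chars); $i + 3 <= $n; $i++) {
        $grams[] = $chars[$i] . $chars[$i + 1] . $chars[$i + 2];
    }
    return $grams;
}

print_r(charTrigrams('&pound;50')); // one trigram: "£50"
```

Because "&pound;50" decodes to the three characters "£50", it yields a single trigram instead of five entity fragments.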
 