I'm working on a script that uses Bayes' theorem and the inverse chi-squared function to estimate the probability that a given text belongs to a specific category.
I've read up on the inverse chi-squared function and I know it shouldn't return a negative value, but for some reason I'm getting negative values from it.
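For reference, the combining step in this kind of classifier is usually Fisher's method, as popularised by Gary Robinson for spam filtering. That the script is meant to implement it is an assumption, since the post does not say so. Fisher's statistic for $N$ probabilities $p_i$ is:

$$
X = -2\sum_{i=1}^{N} \ln p_i = -2\ln\left(\prod_{i=1}^{N} p_i\right)
$$

If the $p_i$ are independent and uniform on $(0, 1]$, $X$ follows a chi-square distribution with $2N$ degrees of freedom. Because every $p_i \le 1$, each $\ln p_i \le 0$, so $X \ge 0$ and the upper-tail probability $\Pr(\chi^2_{2N} \ge X)$ always lies in $[0, 1]$. In this setting the "inverse chi-square function" is that upper-tail (survival) probability, not the quantile function.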
PHP Code:
$N = 0;
$H = $S = 1;
foreach ($ngrams as $k => $v) {
    if (!isset($knowledge[$k])) continue;
    $N++;
    $value = $knowledge[$k] * $v;
    $H *= $value;
    $S *= (float)(1 - (($value >= 1) ? 0.99 : $value));
}
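In the loop, `$value` is clamped to 0.99 only when building `$S`; `$H` is the raw product, so a single `$value` above 1 can push `$H` past 1 (the reported `H: 4` over 10 factors implies at least one factor was above 1). The sketch below replays the same loop on made-up data; the `$knowledge` and `$ngrams` contents are hypothetical, since the post does not show them.

```php
<?php
// Hypothetical inputs: the post does not show $knowledge or $ngrams,
// so these values are invented purely to show how the loop behaves.
$knowledge = ['foo' => 0.8, 'bar' => 2.5, 'baz' => 0.9];
$ngrams    = ['foo' => 1, 'bar' => 1, 'baz' => 2, 'qux' => 3];

$N = 0;
$H = $S = 1.0;
foreach ($ngrams as $k => $v) {
    if (!isset($knowledge[$k])) {
        continue; // n-grams with no stored weight are skipped ('qux' here)
    }
    $N++;
    $value = $knowledge[$k] * $v;
    $H *= $value;                               // raw product, never clamped
    $S *= 1 - (($value >= 1) ? 0.99 : $value);  // clamped before taking 1 - value
}

// H = 0.8 * 2.5 * 1.8, which is greater than 1
printf("N=%d H=%.4f S=%.6f\n", $N, $H, $S);
```

With these inputs `$H` ends up at 3.6, so it is no longer a probability, even though `$S` stays small and positive.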
After running the code above, the values are:
H: 4
S: 1.0E-20
N: 10
These values are then passed to the inverse chi-squared function with 2N (here 20) degrees of freedom:
PHP Code:
$H = $this->chi2Q(-2 * log($N * $H), 2 * $N);
$S = (float)$this->chi2Q(-2 * log($N * $S), 2 * $N);
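Plugging the reported values ($N = 10$, $H = 4$, $S = 10^{-20}$) into these two calls shows what `chi2Q` actually receives, with $2N = 20$ degrees of freedom in both cases:

$$
x_H = -2\ln(N H) = -2\ln 40 \approx -7.378, \qquad
x_S = -2\ln(N S) = -2\ln\left(10^{-19}\right) = 38\ln 10 \approx 87.498
$$

$x_S$ is a valid chi-square statistic, but $x_H$ is negative because $N \cdot H = 40 > 1$, and a chi-square statistic can never be negative.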
Inverse Chi Squared Function:
PHP Code:
function chi2Q($x, $v) {
    $m = $x / 2;
    $sum = $term = exp(-$m);
    for ($i = 1; $i < ($v / 2); $i++) {
        $term *= $m / $i;
        $sum += $term;
    }
    echo 'Sum: ' . $sum;
    return ($sum < 1.0) ? $sum : 1.0;
}
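For an even number of degrees of freedom $v$, the loop in `chi2Q` evaluates the closed-form chi-square upper-tail probability (it starts from `exp(-$m)` for $i = 0$ and multiplies by $m/i$ at each step):

$$
Q(x; v) = \Pr\left(\chi^2_v \ge x\right) = e^{-m}\sum_{i=0}^{v/2-1}\frac{m^i}{i!}, \qquad m = \frac{x}{2}
$$

Every term is non-negative only when $x \ge 0$. For $x < 0$, $m$ is negative, so the terms $m^i/i!$ alternate in sign and $e^{-m}$ is greater than 1; the partial sum can then come out negative. The final `($sum < 1.0) ? $sum : 1.0` caps the result at 1 but does nothing for values below 0.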
The results from the function are as follows:
H: -2.8299816393601
S: 2.0228764578537E-10
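The two results can be reproduced outside the class with a standalone copy of `chi2Q` (debug `echo` removed, type hints added) fed the values reported earlier. This is a sketch for checking the numbers, assuming `H` was exactly 4 and `S` exactly 1.0E-20 when they were printed.

```php
<?php
// Standalone copy of the post's chi2Q: upper-tail chi-square probability
// for an even number of degrees of freedom $v (debug echo removed).
function chi2Q(float $x, int $v): float
{
    $m = $x / 2;
    $sum = $term = exp(-$m);
    for ($i = 1; $i < $v / 2; $i++) {
        $term *= $m / $i;
        $sum += $term;
    }
    return ($sum < 1.0) ? $sum : 1.0;
}

// Values reported in the post (assumed exact).
$N = 10;
$H = 4.0;
$S = 1.0E-20;

$chiH = chi2Q(-2 * log($N * $H), 2 * $N); // argument is about -7.378
$chiS = chi2Q(-2 * log($N * $S), 2 * $N); // argument is about 87.498

echo "H: $chiH\n";
echo "S: $chiS\n";
```

This should print values matching the post, roughly -2.83 for H and 2.02E-10 for S, which confirms that the negative H comes from the negative argument rather than from a bug in the series itself.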
Then, going back to the original function, we can calculate the probability that the text belongs to H or S:
PHP Code:
$I = ((1 + ($H - $S)) / 2) * 100;
return is_finite($I) ? $I : 100;
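Substituting the two `chi2Q` results reported earlier into this formula gives:

$$
I = \frac{1 + (H - S)}{2} \times 100 = \frac{1 + \left(-2.8299816 - 2.02\times 10^{-10}\right)}{2} \times 100 \approx -91.499
$$

$I$ can only stay within $[0, 100]$ if both $H$ and $S$ lie in $[0, 1]$, which `chi2Q` guarantees only for non-negative arguments.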
Because H is a lot smaller than S, the final probability comes out as -91.499081978121. Obviously you shouldn't get a negative number from a chi-square distribution, so I'm not sure where it's gone wrong. Maybe I haven't got enough information; if so, how would I go about detecting this?
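One way to detect this kind of problem is to validate inputs before the series runs, assuming (as in Fisher's method) that the quantity inside `log()` is meant to be a probability. Note that in the posted code even a valid product `H <= 1` can give `N * H > 1`, so the check below is applied to the product actually passed to `log()`. Both helpers, `chi2QChecked` and `fisherStatistic`, are hypothetical names for this sketch, not functions from the post.

```php
<?php
// Hypothetical helper (not from the post): chi-square upper-tail probability
// for even $v that refuses arguments which cannot be chi-square statistics.
function chi2QChecked(float $x, int $v): float
{
    if ($v <= 0 || $v % 2 !== 0) {
        throw new InvalidArgumentException("v must be a positive even integer, got $v");
    }
    if (!is_finite($x) || $x < 0) {
        throw new InvalidArgumentException("chi-square statistic must be finite and >= 0, got $x");
    }
    $m = $x / 2;
    $sum = $term = exp(-$m);
    for ($i = 1; $i < $v / 2; $i++) {
        $term *= $m / $i;
        $sum += $term;
    }
    return min($sum, 1.0);
}

// Hypothetical helper: computes -2 ln(p) only if p is a probability in (0, 1].
// The negated comparison also rejects NaN.
function fisherStatistic(float $p): float
{
    if (!($p > 0 && $p <= 1)) {
        throw new InvalidArgumentException("expected a product in (0, 1], got $p");
    }
    return -2 * log($p);
}

// With the post's numbers, N * H = 40 is rejected before the series runs.
$N = 10;
$H = 4.0;
try {
    chi2QChecked(fisherStatistic($N * $H), 2 * $N);
    echo "accepted\n";
} catch (InvalidArgumentException $e) {
    echo "rejected: ", $e->getMessage(), "\n";
}
```

Failing loudly here points straight at the step that produced an out-of-range product, instead of letting a negative "probability" propagate into the final percentage.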
Last edited by evans123; 06-12-2011 at 09:36 AM..