Naive Bayesian classification

Here is a worked example of naive Bayesian classification, an application of Bayesian inference to the document classification problem.

Consider the problem of classifying documents by their content, for example into spam and non-spam e-mails. Imagine that documents are drawn from a number of classes, each of which can be modelled as a set of words, where the (independent) probability that the i-th word w_i of a given document occurs in a document from class C can be written as

$$p(w_i \mid C)$$

(For this treatment, we simplify things further by assuming that the probability of a word in a document is independent of the length of the document, or that all documents are of the same length.)

Then the probability of a given document D, given a class C, is

$$p(D \mid C) = \prod_i p(w_i \mid C)$$
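As a minimal sketch of this product, the Python function below multiplies the per-word probabilities of a tokenized document. The names `document_likelihood` and `word_probs`, and the probability values, are illustrative assumptions rather than part of the treatment above.

```python
import math

def document_likelihood(words, word_probs):
    """Return p(D | C) as the product of p(w_i | C) over the words of D.

    `word_probs` maps each word to its (assumed known) probability under
    class C. Every word in `words` must be present in the mapping.
    """
    return math.prod(word_probs[w] for w in words)

# Hypothetical per-word probabilities for a "spam" class.
spam_probs = {"cheap": 0.5, "pills": 0.25, "meeting": 0.125}

likelihood = document_likelihood(["cheap", "pills"], spam_probs)
```

Here `likelihood` is 0.5 * 0.25. For long documents a product like this becomes very small, which is one practical reason to work with logarithms instead.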

The question we want to answer is: "what is the probability that a given document D belongs to a given class C?"

Now, by definition (see Probability axiom),

$$p(D \mid C) = \frac{p(D \cap C)}{p(C)}$$

and

$$p(C \mid D) = \frac{p(D \cap C)}{p(D)}$$

Bayes' theorem manipulates these into a statement of probability in terms of likelihood:

$$p(C \mid D) = \frac{p(C)}{p(D)} \, p(D \mid C)$$
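A minimal sketch of this step in Python: the posterior for each class is its prior times its likelihood, divided by the evidence p(D). The sketch assumes the classes are mutually exclusive and exhaustive, so that p(D) can be obtained by summing p(C) p(D | C) over all classes. The function name `posterior` and the numbers are hypothetical.

```python
def posterior(priors, likelihoods):
    """Return p(C | D) for every class C via Bayes' theorem.

    priors[c] is p(C) and likelihoods[c] is p(D | C). The evidence p(D)
    is computed as the sum of p(C) * p(D | C) over all classes, which
    assumes the classes are mutually exclusive and exhaustive.
    """
    evidence = sum(priors[c] * likelihoods[c] for c in priors)
    return {c: priors[c] * likelihoods[c] / evidence for c in priors}

# Hypothetical numbers: two classes with equal priors.
result = posterior({"S": 0.5, "not_S": 0.5}, {"S": 0.125, "not_S": 0.375})
```

With these inputs the evidence is 0.25, so the two posteriors come out as 0.25 and 0.75 and sum to 1.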

Assume for the moment that there are only two classes, S and ¬S. Then

$$p(D \mid S) = \prod_i p(w_i \mid S)$$

and

$$p(D \mid \neg S) = \prod_i p(w_i \mid \neg S)$$

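Using the Bayesian result above, we can write:

$$p(S \mid D) = \frac{p(S)}{p(D)} \prod_i p(w_i \mid S)$$

$$p(\neg S \mid D) = \frac{p(\neg S)}{p(D)} \prod_i p(w_i \mid \neg S)$$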

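Dividing one by the other cancels p(D) and gives:

$$\frac{p(S \mid D)}{p(\neg S \mid D)} = \frac{p(S) \prod_i p(w_i \mid S)}{p(\neg S) \prod_i p(w_i \mid \neg S)}$$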

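This can be refactored as:

$$\frac{p(S \mid D)}{p(\neg S \mid D)} = \frac{p(S)}{p(\neg S)} \prod_i \frac{p(w_i \mid S)}{p(w_i \mid \neg S)}$$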

Thus, the probability ratio p(S | D) / p(¬S | D) can be expressed in terms of a series of likelihood ratios. The actual probability p(S | D) can be easily computed from ln(p(S | D) / p(¬S | D)) based on the observation that p(S | D) + p(¬S | D) = 1.

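Taking the logarithm of all these ratios, we have:

$$\ln \frac{p(S \mid D)}{p(\neg S \mid D)} = \ln \frac{p(S)}{p(\neg S)} + \sum_i \ln \frac{p(w_i \mid S)}{p(w_i \mid \neg S)}$$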
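The sum of logarithms can be sketched in Python as follows: the prior log-odds plus one log-likelihood ratio per word. The function name `log_odds`, its parameters, and the word probabilities are assumptions made for illustration. The sketch also assumes every word has a non-zero probability under both classes.

```python
import math

def log_odds(words, prior_s, probs_s, probs_not_s):
    """Return ln(p(S | D) / p(not-S | D)) for a tokenized document D.

    prior_s is p(S), so p(not-S) is 1 - prior_s. probs_s[w] and
    probs_not_s[w] are p(w | S) and p(w | not-S); both must be non-zero.
    """
    total = math.log(prior_s / (1.0 - prior_s))  # prior log-odds
    for w in words:
        # Add one log-likelihood ratio per word occurrence.
        total += math.log(probs_s[w] / probs_not_s[w])
    return total

# Hypothetical word probabilities for the two classes.
probs_s = {"cheap": 0.5, "meeting": 0.125}
probs_not_s = {"cheap": 0.125, "meeting": 0.5}

score = log_odds(["cheap", "cheap", "meeting"], 0.5, probs_s, probs_not_s)
```

A positive `score` means S is the more probable class and a negative one favours ¬S. Summing logarithms also avoids the floating-point underflow that multiplying many small probabilities can cause.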

The use of "log-likelihood ratios" is common in statistics. In the case of two mutually exclusive alternatives (such as this example), the conversion of a log-likelihood ratio to a probability takes the form of a sigmoid curve: see logit for details.
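For two mutually exclusive and exhaustive alternatives, writing x = ln(p / (1 - p)) and solving for p gives p = 1 / (1 + e^(-x)), the logistic (sigmoid) function. A minimal Python sketch of that conversion follows; the function name `probability_from_log_odds` is hypothetical, and the branch on the sign of x is only there to keep `math.exp` from overflowing for large negative inputs.

```python
import math

def probability_from_log_odds(x):
    """Convert x = ln(p / (1 - p)) back to p with the logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # Algebraically identical form that avoids exp() of a large positive number.
    e = math.exp(x)
    return e / (1.0 + e)

# Log-odds of ln(4) correspond to odds of 4 to 1 in favour of S.
p_spam = probability_from_log_odds(math.log(4.0))
```

Log-odds of zero map to a probability of one half, and the two branches give complementary results for x and -x.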

In practice, the naive Bayes approach is more powerful than might be expected from the extreme simplicity of its model; in particular, it is fairly robust when the attributes w_i are not independent. Recent theoretical analysis has shown why the naive Bayes classifier is so robust.