Bayesian filtering is one of the best-known spam prevention mechanisms.
It can classify almost any kind of data, but it is most widely applied to email, where it is used to decide whether a mail is spam.
The algorithm is built on the classic Bayes' theorem.
Probability that A occurs given that B has occurred = Probability that B occurs given that A has occurred X probability that A occurs, divided by the probability that B occurs
P(A|B) = P(B|A) x P(A) / P(B)
In the email context, A is "the mail is spam" and B is "the mail contains these words", so the filter classifies a mail as spam based on the words in it.
Probability that a mail is spam given these words = Probability that these words occur in a spam mail X probability that a given mail is spam, normalised by the probability that these words occur in any mail.
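As a rough illustration of the calculation just described, here is a minimal Python sketch. The function name `spam_probability` and all the numbers are made up for the example; a real filter would estimate these probabilities from the user's own mail.

```python
def spam_probability(p_words_given_spam, p_spam, p_words):
    """Bayes' rule: P(spam | words) = P(words | spam) * P(spam) / P(words)."""
    if p_words <= 0:
        raise ValueError("p_words must be positive")
    return p_words_given_spam * p_spam / p_words


# Hypothetical numbers, chosen only for illustration:
# the words show up in 40% of spam mails, 20% of all mail is spam,
# and the words show up in 10% of all mail.
p = spam_probability(p_words_given_spam=0.4, p_spam=0.2, p_words=0.1)
print(round(p, 3))  # 0.8
```

With these assumed inputs, a mail containing the words would be judged 80% likely to be spam.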
The probability that a given mail is spam is a user-specific factor. If the user rejects a lot of mail as spam, either because he is picky or because he really does receive a lot of spam, this value will be high for that user.
If instead the user marks very few mails as spam, the fact that those few mails contain certain words does not by itself make a new mail with the same words spam, because the same words can also appear in mail that is not spam. The probability of these words occurring in any mail (the denominator) accounts for this: words that are common across all mail are not given much weight in rejecting a mail as spam.
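To see the effect of the denominator, here is a small Python sketch that estimates the probabilities from counts. It assumes each mail is represented as a set of words; the function name `word_spam_probability` and the four tiny mails are hypothetical. It is not how any particular mail filter is implemented.

```python
def word_spam_probability(word, spam_mails, ham_mails):
    """Estimate P(spam | word) from counts using Bayes' rule.

    Each mail is assumed to be a set of words.
    """
    n_spam = len(spam_mails)
    n_ham = len(ham_mails)
    total = n_spam + n_ham
    if n_spam == 0 or total == 0:
        raise ValueError("need at least one spam mail to estimate from")

    spam_with_word = sum(word in mail for mail in spam_mails)
    ham_with_word = sum(word in mail for mail in ham_mails)

    p_spam = n_spam / total                          # user-specific prior
    p_word_given_spam = spam_with_word / n_spam      # numerator term
    p_word = (spam_with_word + ham_with_word) / total  # denominator

    if p_word == 0:
        return p_spam  # word never seen: fall back to the prior
    return p_word_given_spam * p_spam / p_word


# Hypothetical training mails for one user.
spam = [{"the", "lottery"}, {"the", "prize"}]
ham = [{"the", "meeting"}, {"the", "lunch"}]

print(word_spam_probability("the", spam, ham))      # 0.5
print(word_spam_probability("lottery", spam, ham))  # 1.0
```

The word "the" appears in every mail, so it leaves the estimate at the user's prior of 0.5 and tells the filter nothing. The word "lottery" appears only in spam, so it pushes the estimate all the way up. With so few mails the numbers are extreme; the point is only the direction of the effect.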