File name: Machine Learning: The Art and Science of Algorithms that Make Sense of Data
Machine Learning
There are a number of useful ways in which we can express the SpamAssassin classifier in mathematical notation. If we denote the result of the $i$-th test for a given e-mail as $x_i$, where $x_i = 1$ if the test succeeds and 0 otherwise, and we denote the weight of the $i$-th test as $w_i$, then the total score of an e-mail can be expressed as $\sum_{i=1}^{n} w_i x_i$, making use of the fact that $w_i$ contributes to the sum only if $x_i = 1$, i.e., if the test succeeds for the e-mail. Using $t$ for the threshold above which an e-mail is classified as spam (5 in our example), the ‘decision rule’ can be written as $\sum_{i=1}^{n} w_i x_i > t$. Notice that the left-hand side of this inequality is linear in the $x_i$ variables, which essentially means that increasing one of the $x_i$ by a certain amount, say $\delta$, will change the sum by an amount ($w_i \delta$) that is independent of the value of $x_i$. This wouldn’t be true if $x_i$ appeared squared in the sum, or with any exponent other than 1.
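The scoring rule and the linearity property can be illustrated with a minimal Python sketch. The weights below are hypothetical values chosen for illustration, not actual SpamAssassin weights, and the function names `score`, `is_spam` and `score_change` are likewise assumptions of this sketch. Only the threshold of 5 is taken from the text.

```python
from typing import Sequence


def score(weights: Sequence[float], results: Sequence[float]) -> float:
    """Weighted sum of test results: sum over i of w_i * x_i."""
    if len(weights) != len(results):
        raise ValueError("weights and results must have the same length")
    return sum(w * x for w, x in zip(weights, results))


def is_spam(weights: Sequence[float], results: Sequence[float],
            threshold: float = 5.0) -> bool:
    """Decision rule: classify as spam if the score exceeds the threshold t."""
    return score(weights, results) > threshold


def score_change(weights: Sequence[float], results: Sequence[float],
                 i: int, delta: float) -> float:
    """Change in the score when x_i is increased by delta."""
    bumped = list(results)
    bumped[i] += delta
    return score(weights, bumped) - score(weights, results)


# Hypothetical weights for four tests (illustrative only).
weights = [0.5, 2.5, 4.0, 1.0]

# Tests 1, 3 and 4 succeed; test 2 fails.
email = [1, 0, 1, 1]
print(score(weights, email))    # 0.5 + 4.0 + 1.0 = 5.5
print(is_spam(weights, email))  # True, since 5.5 > 5

# Linearity: raising x_3 by delta = 1 changes the score by w_3 * delta = 4.0,
# whatever value x_3 started from. The non-binary starting value 3 is used
# here only to illustrate the point; real test results are 0 or 1.
print(score_change(weights, [1, 0, 0, 1], 2, 1.0))  # 4.0
print(score_change(weights, [1, 0, 3, 1], 2, 1.0))  # 4.0
```

If `score` used `w * x ** 2` instead, the two `score_change` calls would return different values (4.0 and 28.0), because the effect of the increase would then depend on the starting value of $x_i$.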