Web Spamming By Detecting Gray Mail
We address the problem of gray mail: messages that could reasonably be considered either spam or good. Email users often disagree on this mail, presenting serious challenges to spam filters in both model training and evaluation. In this paper, we propose four simple methods for detecting gray mail and compare their performance using recall-precision curves. Among them, we found that email campaigns that have messages labeled differently are the most reliable source for learning a gray mail detector. Preliminary experiments also show that even when the gray mail detector is imperfect, a traditional statistical spam filter can still be improved consistently in different regions of the ROC curve by incorporating this new information.
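To make the campaign-based idea concrete, the following is a minimal sketch of how campaigns with conflicting labels might be flagged as gray mail candidates. It is an illustration under stated assumptions, not the procedure used in the paper: the messages are assumed to be already grouped into campaigns (the `campaign_id` values are hypothetical), labels are assumed to be the strings `"spam"` and `"good"`, and the `min_minority_fraction` threshold is an invented parameter.

```python
from collections import defaultdict


def find_gray_campaigns(messages, min_minority_fraction=0.1):
    """Return the ids of campaigns whose messages were labeled both ways.

    `messages` is an iterable of (campaign_id, label) pairs, where label is
    "spam" or "good". A campaign is flagged when the less common label makes
    up at least `min_minority_fraction` of its messages (an assumed,
    illustrative threshold).
    """
    counts = defaultdict(lambda: {"spam": 0, "good": 0})
    for campaign_id, label in messages:
        counts[campaign_id][label] += 1

    gray = set()
    for campaign_id, c in counts.items():
        total = c["spam"] + c["good"]
        minority = min(c["spam"], c["good"])
        if minority / total >= min_minority_fraction:
            gray.add(campaign_id)
    return gray


# Hypothetical labeled messages: campaign "c1" has mixed labels.
messages = [
    ("c1", "spam"), ("c1", "good"), ("c1", "spam"),
    ("c2", "spam"), ("c2", "spam"),
    ("c3", "good"),
]
gray_campaigns = find_gray_campaigns(messages)
```

Messages from the flagged campaigns could then serve as positive examples when training a gray mail detector, with messages from uniformly labeled campaigns as negatives.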