|
Earlier this summer, I tried hooking SpamBayes up to my homebrew aggregator:
http://www.decafbad.com/blog/geek/bayes_agg_one.html
I wasn't nearly as thorough in my investigations as you were, and I categorized things quite differently. I'm less concerned with topic categories than with separating what's interesting from what's less so.
Basically, I had about 5 months of records of which items I'd viewed or visited in my aggregator. I submitted the text content of all these items to SpamBayes as 'spam', hoping that the inversion of terms would still work and that my 'spam' from this system would actually be what I want to read.
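For anyone who wants to see the shape of that setup, here's a rough Python sketch. It is not SpamBayes or its API, just a plain two-class naive Bayes with add-one smoothing standing in for it. The `TinyBayes` class, the `tokenize` helper, and the 'interesting'/'boring' labels are all made up for illustration, and the 'boring' training set (items never viewed) is an assumption, since all I described above was training on the items I had viewed.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class TinyBayes:
    """Hypothetical two-class naive Bayes; a stand-in, not SpamBayes itself."""

    LABELS = ("interesting", "boring")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}

    def train(self, text, label):
        """Add one item's words to the counts for the given label."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def score(self, text):
        """Return an estimate of P(interesting | text), between 0 and 1."""
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_docs = sum(self.doc_counts.values())
        tokens = tokenize(text)

        log_probs = {}
        for label in self.LABELS:
            # Smoothed class prior.
            log_p = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            # Add-one smoothing, with one extra slot for unseen words.
            denom = total_words + len(vocab) + 1
            for word in tokens:
                log_p += math.log((self.word_counts[label][word] + 1) / denom)
            log_probs[label] = log_p

        # Normalize in log space to avoid underflow on long items.
        top = max(log_probs.values())
        weights = {k: math.exp(v - top) for k, v in log_probs.items()}
        return weights["interesting"] / sum(weights.values())


if __name__ == "__main__":
    clf = TinyBayes()
    # Assumption: viewed items are 'interesting', skipped items are 'boring'.
    clf.train("python rss aggregator bayes filter", "interesting")
    clf.train("new python release improves parser", "interesting")
    clf.train("celebrity gossip fashion week", "boring")
    clf.train("fashion sale discount coupon", "boring")
    print(clf.score("bayes filter for rss"))
```

The score comes back as a single number between 0 and 1, so where to draw the line between 'interesting' and not is a separate choice layered on top of it.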
After using it for a few months, the conclusion to which I've jumped is that what works for spam/non-spam categorization doesn't quite work for interesting/non-interesting categorization.
The difference between spam and non-spam seems pretty clear, relatively speaking. But interesting/non-interesting has a lot more gradations in between, and seems better suited to a rating prediction method.
Anyway, figured I'd share another attempt at applying Bayes to RSS aggregation.
|