The spam / not spam / maybe spam classification is fairly simplistic, although Bayesian filtering software can be very complex. Because email messages tend to be short, keyword association can be effective, especially since spammers and marketers use certain words, such as "unsubscribe" and "viagra," that normal people rarely do.
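To make the keyword-association idea concrete, here is a minimal sketch of a three-way Bayesian filter. It assumes a multinomial naive Bayes model with add-one (Laplace) smoothing. The class and function names (`NaiveBayesFilter`, `tokenize`), the six toy training messages, and the 0.2/0.8 thresholds that define the "maybe spam" band are all hypothetical illustrations. They do not describe how any particular filtering product works.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase, split on whitespace, and strip simple punctuation.
    words = (w.strip(".,!?\"'").lower() for w in text.split())
    return [w for w in words if w]


class NaiveBayesFilter:
    """Toy multinomial naive Bayes spam filter with add-one smoothing (hypothetical)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        # label is "spam" or "ham"; both classes must be trained at least once.
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior plus smoothed log likelihood of each word.
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            log_scores[label] = score
        # P(spam | text) = 1 / (1 + exp(ham - spam)), rearranged to avoid overflow.
        diff = log_scores["ham"] - log_scores["spam"]
        if diff >= 0:
            e = math.exp(-diff)
            return e / (1.0 + e)
        return 1.0 / (1.0 + math.exp(diff))

    def classify(self, text, low=0.2, high=0.8):
        # Hypothetical thresholds: anything between low and high is "maybe spam".
        p = self.spam_probability(text)
        if p >= high:
            return "spam"
        if p <= low:
            return "not spam"
        return "maybe spam"


spam_filter = NaiveBayesFilter()
for msg in ("Buy viagra now, cheap viagra",
            "Click here to unsubscribe from our offers",
            "Limited offer, click to buy now"):
    spam_filter.train(msg, "spam")
for msg in ("Meeting moved to Tuesday afternoon",
            "Here are the notes from the meeting",
            "Can you review my draft before Friday"):
    spam_filter.train(msg, "ham")

print(spam_filter.classify("cheap viagra offer"))                 # → spam
print(spam_filter.classify("notes from the meeting on Tuesday"))  # → not spam
print(spam_filter.classify("hello there"))                        # → maybe spam
```

A message made only of words the filter has never seen lands near 0.5 and falls into the "maybe spam" band. That is the simplistic three-way split in action: the model has no evidence either way.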
I have only a cursory background in natural language processing (NLP) and fuzzy logic. However, I believe both disciplines are required to fully capture "interestingness," which of course varies from person to person. Rich, subject-specific NLP is quickly becoming a reality in products such as machine translators. The next frontier is the overhyped intelligent agent, which Google and Microsoft are no doubt spending substantial sums to develop in various forms.
Given the advances in information technology, with hardware getting smaller and faster and data storage capacity doubling every year, it's not too far-fetched to predict that, by the time we die of old age, silicon will be at least as smart as children.
One of my algorithms is "If author = Jon Udell then read it." That one cost me nothing.