
Article:
 Working with Bayesian Categorizers
Subject: Structured Data
Date: 2003-11-21 01:50:56
From: dave scotson

I've been looking into this recently and I can't find a good intro to using Bayes with (semi-)structured data. Most examples just break large blocks of text into tokens.


As a hypothetical example, imagine you had the title, author, abstract, and journal name for a large number of published articles, and you wanted to apply a simple binary classification, e.g. interesting or not interesting to cardiology students.


The abstracts will of course hold a lot of key terms, but surely the author and journal name hold vital linking information that will be greatly diminished if you just dump all the text into one big string.


I assume that spam filters already do this, saying *this* is text from the subject, *this* is text from the body, *this* is from header X, but I haven't seen any introductory articles on how to go about it.
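To make the idea concrete, here is a minimal sketch of one way to keep the fields apart: prefix every token with the name of the field it came from, so that "soil" in a title and "soil" in a journal name are counted as different features. This is only an illustration, not a description of how any particular spam filter works. The names `field_tokens` and `FieldNaiveBayes`, the sample records, and the labels are all made up for the example, and the classifier is a plain multinomial naive Bayes with add-one smoothing.

```python
import math
import re
from collections import Counter, defaultdict


def field_tokens(record):
    """Turn {"field": "text"} into field-tagged tokens such as "author:smith"."""
    tokens = []
    for field, text in record.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            tokens.append(f"{field}:{word}")
    return tokens


class FieldNaiveBayes:
    """Hypothetical multinomial naive Bayes over field-tagged tokens."""

    def __init__(self):
        self.token_counts = defaultdict(Counter)  # label -> token -> count
        self.doc_counts = Counter()               # label -> number of records
        self.vocab = set()                        # every tagged token seen in training

    def train(self, record, label):
        tokens = field_tokens(record)
        self.token_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def classify(self, record):
        tokens = field_tokens(record)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            # Log prior plus summed log likelihoods (add-one smoothing).
            score = math.log(n_docs / total_docs)
            denom = sum(self.token_counts[label].values()) + len(self.vocab)
            for token in tokens:
                score += math.log((self.token_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training data, purely for illustration.
clf = FieldNaiveBayes()
clf.train({"title": "Heart failure outcomes",
           "author": "Smith",
           "journal": "Cardiology Review"}, "interesting")
clf.train({"title": "Heart of the matter in soil chemistry",
           "author": "Jones",
           "journal": "Soil Science"}, "not-interesting")

query = {"title": "Chemistry notes",
         "author": "Smith",
         "journal": "Cardiology Review"}
print(clf.classify(query))
```

In this toy data the title word "chemistry" was only seen in the uninteresting record, but the author and journal tokens still pull the query toward the other class, because they are counted as features in their own right rather than lost in one big string. Whether fields should also be weighted differently is a separate question that this sketch does not address.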


Any links would be greatly appreciated.

