Stochastic Language Models (N-Gram) Specification, Working Draft

The N-Gram specification, one of the proposals for the W3C's Speech Interface Framework, defines the mechanism for combining N-Gram stochastic grammars with structured grammars, as well as methods for combining semantic definitions. It also describes markup for representing statistical language models.

In probability theory, a "stochastic process" is a family of random variables indexed by some other set, such as time or position in a sequence. A "Markov process" (named after the Russian mathematician Andrey Markov) is a stochastic process in which the conditional probability of a future event, given the present state, does not depend on the rest of the process's past. An N-Gram model applies this idea to language: the probability of each word is conditioned only on the N-1 words that precede it.

Stochastic N-Gram models have a successful history in the research community and are now used in commercial systems. Since some structured grammars are also stochastic, confusion can be avoided by referring to N-Gram stochastic grammars as N-Gram grammars, or simply N-Grams.

N-Gram grammars are typically constructed from statistics obtained from a large corpus of text, using the co-occurrences of words to determine word sequence probabilities. This statistical method has the advantage of being able to cover a much larger language than would normally be derived directly from the corpus. The primary purpose of specifying a stochastic grammar format is to support large-vocabulary and open-vocabulary applications. In addition, stochastic grammars can be used to represent concepts or semantics.
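
To make the idea of deriving word sequence probabilities from co-occurrence counts concrete, here is a minimal sketch in Python. It is not the markup or the algorithm defined by the W3C draft. The function names (`train_ngram`, `probability`), the `<s>` and `</s>` boundary markers, and the three-sentence toy corpus are all assumptions made for illustration, and the estimate is a plain maximum-likelihood ratio with no smoothing.

```python
from collections import Counter


def train_ngram(sentences, n=2):
    """Count n-grams and their (n-1)-word histories in a tokenized corpus."""
    ngram_counts = Counter()
    history_counts = Counter()
    for words in sentences:
        # Pad with hypothetical boundary markers so that sentence starts
        # and ends are modeled like any other word position.
        padded = ["<s>"] * (n - 1) + list(words) + ["</s>"]
        for i in range(n - 1, len(padded)):
            history = tuple(padded[i - n + 1:i])
            ngram_counts[history + (padded[i],)] += 1
            history_counts[history] += 1
    return ngram_counts, history_counts


def probability(word, history, ngram_counts, history_counts):
    """Maximum-likelihood estimate of P(word | history), without smoothing."""
    history = tuple(history)
    seen = history_counts[history]
    if seen == 0:
        return 0.0  # history never observed in the corpus
    return ngram_counts[history + (word,)] / seen


# Toy corpus, invented purely for this illustration.
corpus = [
    "call home now".split(),
    "call the office".split(),
    "call home later".split(),
]

ngram_counts, history_counts = train_ngram(corpus, n=2)
p_home = probability("home", ["call"], ngram_counts, history_counts)
p_office = probability("office", ["the"], ngram_counts, history_counts)
p_unseen = probability("zebra", ["call"], ngram_counts, history_counts)
```

In this sketch, "call" is followed by "home" in two of its three occurrences, so `p_home` is 2/3. An unseen pair such as "call zebra" gets probability zero, which is why practical N-Gram models usually add some form of smoothing or backoff before being used for open-vocabulary recognition.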