Smoothing Technique for Statistical Language Model Based on Global Discount
Abstract:
Smoothing techniques are mainly used to address data sparsity in statistical language models. Existing smoothing techniques handle the data sparsity problem with different discounting and compensation strategies, and each has its own strengths and weaknesses in complexity and rationality. This paper presents a new smoothing technique for the bigram model based on global discounting. The model parameters, that is, the bigram probabilities, are discounted according to bigram frequency, and unseen events are compensated using the lower-order model; the rationality of this scheme is shown by minimizing perplexity. Experimental results show that the technique outperforms the commonly used Katz smoothing technique.
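The abstract does not give the discount formula, so the following is only a minimal sketch of the general discount-and-compensate idea, not the paper's global discount method. It assumes a single fixed absolute discount (the hypothetical parameter `discount`) subtracted from every seen bigram count, an add-one smoothed unigram model as the lower-order model, and a single `<unk>` slot for out-of-vocabulary words. The class and method names are illustrative.

```python
import math
from collections import Counter, defaultdict


class DiscountedBigramModel:
    """Bigram model with absolute discounting and unigram back-off.

    Assumption: one fixed discount for all seen bigrams. The paper's
    frequency-dependent global discount is not specified in the abstract.
    """

    def __init__(self, tokens, discount=0.5):
        if not 0.0 < discount < 1.0:
            raise ValueError("discount must lie strictly between 0 and 1")
        self.discount = discount
        self.unigram_counts = Counter(tokens)
        self.bigram_counts = Counter(zip(tokens, tokens[1:]))
        self.total = len(tokens)
        self.vocab_size = len(self.unigram_counts)
        # Count how often each word occurs as a history, and record
        # which words were observed to follow it.
        self.history_counts = Counter()
        self.successors = defaultdict(set)
        for (h, w), c in self.bigram_counts.items():
            self.history_counts[h] += c
            self.successors[h].add(w)

    def unigram_prob(self, w):
        # Add-one smoothing with one extra slot shared by all unknown words.
        return (self.unigram_counts[w] + 1) / (self.total + self.vocab_size + 1)

    def prob(self, h, w):
        hist = self.history_counts.get(h, 0)
        if hist == 0:
            # History never seen: fall back entirely to the unigram model.
            return self.unigram_prob(w)
        c = self.bigram_counts.get((h, w), 0)
        if c > 0:
            # Seen bigram: discounted relative frequency.
            return (c - self.discount) / hist
        # Unseen bigram: share the mass freed by discounting among unseen
        # words, in proportion to their unigram probabilities.
        seen = self.successors[h]
        reserved = self.discount * len(seen) / hist
        unseen_mass = 1.0 - sum(self.unigram_prob(v) for v in seen)
        return reserved * self.unigram_prob(w) / unseen_mass

    def perplexity(self, tokens):
        pairs = list(zip(tokens, tokens[1:]))
        log_sum = sum(math.log(self.prob(h, w)) for h, w in pairs)
        return math.exp(-log_sum / len(pairs))


# Toy usage: train on a tiny corpus and score a held-out sequence.
model = DiscountedBigramModel("a b a c a b".split(), discount=0.5)
held_out_perplexity = model.perplexity("a b a a".split())
```

In a setup like this, the discount could be tuned by choosing the value that gives the lowest perplexity on held-out text, which mirrors the perplexity criterion the abstract mentions.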