Author
Teh, Y
Journal title
COLING/ACL 2006 - 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
DOI
10.3115/1220175.1220299
Volume
1
Last updated
2021-10-19T13:20:15.863+01:00
Page
985-992
Abstract
We propose a new hierarchical Bayesian n-gram model of natural languages. Our model makes use of a generalization of the commonly used Dirichlet distributions called Pitman-Yor processes which produce power-law distributions more closely resembling those in natural languages. We show that an approximation to the hierarchical Pitman-Yor language model recovers the exact formulation of interpolated Kneser-Ney, one of the best smoothing methods for n-gram language models. Experiments verify that our model gives cross entropy results superior to interpolated Kneser-Ney and comparable to modified Kneser-Ney. © 2006 Association for Computational Linguistics.
Symplectic ID
353269
Publication type
Journal Article
ISBN-13
9781932432657
ISBN-10
1932432655
Publication date
1 January 2006