Memory-based context-sensitive spelling correction at web scale
2007
https://doi.org/10.1109/ICMLA.2007.50
Abstract
We study the problem of correcting spelling mistakes in text using memory-based learning techniques, with a very large database of token n-gram occurrences in web text as training data. Our approach uses the context in which an error appears to select the most likely candidate from the words that might have been intended in its place. Using a novel correction algorithm and this massive database of training data, we demonstrate higher accuracy on correcting real-word errors than previous work, and very high accuracy on a new task: ranking the corrections that a standard spelling correction package proposes for non-word errors.
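To make the idea of context-based candidate selection concrete, the following is a minimal sketch and not the paper's correction algorithm, which the abstract does not specify. It assumes a tiny in-memory table of trigram counts in place of a web-scale n-gram database; the counts, the scoring rule (summing counts of all trigrams that cover the target position), and the names `NGRAM_COUNTS`, `context_score`, and `pick_candidate` are all hypothetical.

```python
from collections import Counter

# Toy trigram counts standing in for a web-scale n-gram table.
# These values are invented for illustration only.
NGRAM_COUNTS = Counter({
    ("a", "piece", "of"): 900,
    ("a", "peace", "of"): 12,
    ("piece", "of", "cake"): 400,
    ("peace", "of", "mind"): 700,
    ("piece", "of", "mind"): 30,
})


def context_score(tokens, index, candidate, n=3):
    """Sum the counts of every n-gram that covers `index`
    once `candidate` is substituted at that position."""
    trial = list(tokens)
    trial[index] = candidate
    total = 0
    for start in range(index - n + 1, index + 1):
        # Skip windows that fall off either end of the sentence.
        if start < 0 or start + n > len(trial):
            continue
        total += NGRAM_COUNTS[tuple(trial[start:start + n])]
    return total


def pick_candidate(tokens, index, candidates, n=3):
    """Return the candidate whose surrounding n-grams are most frequent."""
    return max(candidates, key=lambda c: context_score(tokens, index, c, n))


if __name__ == "__main__":
    sentence = "a peace of cake".split()
    print(pick_candidate(sentence, 1, ["peace", "piece"]))
```

The same function covers both settings mentioned in the abstract, under the stated assumptions: for real-word errors the candidates would come from a confusion set such as {peace, piece}, and for non-word errors they would be the suggestions returned by a spelling correction package.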