Kipyatkova I.S., Karpov A.A.
DEVELOPMENT AND RESEARCH OF A STATISTICAL RUSSIAN LANGUAGE MODEL
Full text: pdf
URL: http://www.proceedings.spiiras.nw.ru/data/src/2010/12/00/spyproc-2010-12-00-03-en.html
Reference: Kipyatkova I.S., Karpov A.A. Development and Research of a Statistical Russian Language Model // SPIIRAS Proceedings. 2010. Issue 12. Pp. 35-49.
UDC 004.522
Abstract. This paper describes the creation of a statistical Russian language model for continuous speech recognition systems. The characteristics of the collected corpus, which consists of texts from the news websites of several online newspapers, are given, and a statistical analysis of the corpus is carried out. Unigram, bigram, and trigram Russian language models have been built from the collected text corpus. To assess the quality of these models, their entropy and perplexity have been computed. The paper also surveys existing approaches to building statistical language models.
Keywords: statistical text processing, language model.
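As an illustration of the entropy and perplexity measures named in the abstract, the following minimal sketch trains a bigram model on a toy tokenized corpus and evaluates it. It is not the authors' implementation: the function names, the sentence boundary markers `<s>` and `</s>`, and the add-one smoothing are assumptions made for the example, and it assumes every evaluated word occurs in the training vocabulary. Entropy is taken as the average negative base-2 log probability per predicted token, and perplexity as 2 raised to that entropy.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences.

    Each sentence is padded with the (assumed) boundary markers <s> and </s>.
    """
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """P(word | prev) with add-one smoothing (an assumption for this sketch)."""
    # <s> is never predicted, so it is excluded from the vocabulary size.
    vocab_size = len(unigrams) - 1
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def entropy_and_perplexity(sentences, unigrams, bigrams):
    """Return (entropy in bits per token, perplexity) of the model on sentences."""
    log_sum, n_tokens = 0.0, 0
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, word in zip(padded, padded[1:]):
            log_sum += math.log2(bigram_prob(prev, word, unigrams, bigrams))
            n_tokens += 1
    entropy = -log_sum / n_tokens
    return entropy, 2 ** entropy


# Toy corpus; a real model would be trained on a large news text corpus.
corpus = [["a", "b"], ["a", "c"]]
unigrams, bigrams = train_bigram(corpus)
entropy, perplexity = entropy_and_perplexity(corpus, unigrams, bigrams)
print(f"entropy = {entropy:.4f} bits, perplexity = {perplexity:.4f}")
```

Lower perplexity on held-out text indicates that the model predicts the next word more confidently; in practice the evaluation text would be separate from the training text, and a smoothing method better suited to large vocabularies would usually replace add-one smoothing.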
Kipyatkova Irina Sergeevna — junior researcher, Laboratory of Speech and Multimodal Interfaces, Institution of The Russian Academy of Sciences St. Petersburg Institution for Informatics and Automation of RAS.
Research interests: automatic speech recognition, statistical language models. Number of publications: 15.
E-mail: kipyatkova@iias.spb.su
Address: 14th Line V.O., 39, St. Petersburg, 199178, Russia
Office phone: +7(812)328-7081
Fax: +7(812)328-7081
Scientific advisor — A.A. Karpov.
Karpov Alexey Anatolyevich — PhD; senior researcher, Laboratory of Speech and Multimodal Interfaces, Institution of The Russian Academy of Sciences St. Petersburg Institution for Informatics and Automation of RAS.
Research interests: automatic speech recognition, multimodal interfaces, audio-visual speech recognition. Number of publications: 100.
E-mail: karpov@iias.spb.su
Address: 14th Line V.O., 39, St. Petersburg, 199178, Russia
Office phone: +7(812)328-7081
Fax: +7(812)328-7081