A Study of Indexing Units for Japanese Spoken Document Retrieval



Abstract

This paper addresses spoken document retrieval (SDR) of Japanese lectures. A lecture retrieval test collection (an ad-hoc SDR task) has recently been designed in Japan; it consists of 2,702 audio lectures from the Corpus of Spontaneous Japanese and 39 retrieval queries. For an ad-hoc task, appropriate indexing is essential. Index terms are produced by automatic speech recognition (ASR) and therefore inevitably contain ASR errors, so indexing terms that are robust to these errors need to be studied. Moreover, Japanese text puts no spaces between words, so word units are ambiguous, which makes the choice of indexing unit important as well. Against this background, we investigate indexing units for Japanese SDR: morphemes, character N-grams, and combinations of the two. Because morpheme-based indexing cannot cope with misrecognition of parts of words, we examine indexing units based on character N-grams. Although retrieval improved for some queries, we did not achieve an overall improvement, and combining character N-grams with morpheme units did not work well. We confirmed the importance of introducing stop-word criteria in character N-gram-based indexing.
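The abstract's argument for character N-grams (they survive misrecognition of part of a word, whereas a whole-unit match does not) can be illustrated with a small sketch. This is not the authors' implementation: the bigram size, the overlap-count score, the document-frequency rule for stop N-grams (the hypothetical `max_df` parameter), and the example strings (hypothetical ASR output) are all assumptions chosen for illustration. The exact-string check stands in for morpheme-unit matching; no morphological analyzer is run.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 2) -> list[str]:
    """Overlapping character N-grams. Japanese has no spaces between words,
    so no word segmentation (morphological analysis) is required."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def stop_ngrams(docs: list[str], n: int = 2, max_df: float = 0.8) -> frozenset[str]:
    """Hypothetical stop criterion (an assumption, not the paper's rule):
    N-grams occurring in more than max_df of all documents are stop N-grams."""
    df = Counter()
    for doc in docs:
        df.update(set(char_ngrams(doc, n)))  # count each N-gram once per document
    return frozenset(g for g, c in df.items() if c / len(docs) > max_df)


def overlap_score(query: str, doc: str, n: int = 2,
                  stop: frozenset[str] = frozenset()) -> int:
    """Toy score: number of query N-grams (with multiplicity) also found in
    the document, ignoring stop N-grams. Not a full ranking model."""
    q = Counter(g for g in char_ngrams(query, n) if g not in stop)
    d = Counter(char_ngrams(doc, n))
    return sum(min(c, d[g]) for g, c in q.items())


# Hypothetical ASR output in which "ス" of "データベース" was misrecognized as "ズ".
asr_doc = "データベーズを検索する"
print("データベース" in asr_doc)               # exact match on the whole term fails: False
print(overlap_score("データベース", asr_doc))  # 4 of the 5 query bigrams still match

docs = [asr_doc, "音声を認識する", "講演を検索する"]
stop = stop_ngrams(docs)                             # frozenset({'する'})
print(overlap_score("検索する", docs[1]))             # 1: only the common ending "する" matches
print(overlap_score("検索する", docs[1], stop=stop))  # 0 once "する" is a stop N-gram
```

Bigrams are used only to keep the example short; the abstract does not state which N values were tested. The last two lines show one reason stop criteria matter for N-gram indexing: short N-grams from frequent endings such as する match almost every document and add noise to the ranking.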


Bibliographic details

Source: 10th Western Pacific Acoustics Conference, 2009, pp. 1-8 (8 pages)
Conference location: Beijing, China
Authors: Koji SHIGEYASU; Hiroaki NANJO; Takehiko YOSHIMI
Affiliation (all three authors): Graduate School of Science and Technology, Ryukoku University, Japan
Original format: PDF
Language of full text: English
Chinese Library Classification: Acoustics

Similar documents

1. A novel approach to perform context-based automatic spoken document retrieval of political speeches based on wavelet tree indexing [J]. Gupta Anishka, Yadav Divakar. Multimedia Tools and Applications, 2021, no. 14.
2. SpeechFind: Advances in Spoken Document Retrieval for a National Gallery of the Spoken Word [J]. Hansen J.H.L., Huang R., Zhou B. IEEE Transactions on Speech and Audio Processing, 2005, no. 5.
3. Comprehensive Study on Information Retrieval: Arabic Document Indexing [J]. Ismail Hmeidi, Hisham A. Shehadeh, Abdalrhman A. Almodawar. Research journal of science and technolo, 2014, no. 2.
4. A Study of Indexing Units for Japanese Spoken Document Retrieval [C]. Koji SHIGEYASU, Hiroaki NANJO, Takehiko YOSHIMI. Western Pacific Acoustics Conference, 2009.
5. Automatic Indexing of Japanese Documents and its Application to Information Retrieval [D]. 木本, 晴夫. 1993.
6. A New Method for User-centered Indexing of Documents in Information Retrieval [O]. Yuri Kagolovsky, Andre Kushniruk, Stefan Pantazi. 2002.
7. Comparison of Word and Subword Indexing Techniques for Mandarin Chinese Spoken Document Retrieval [O]. Hsin-min Wang, Berlin Chen. 2008.
