Indexing of Textual Databases Based on Lexical Resources: A Case Study for Serbian

Abstract

In this paper we describe an approach to improving information retrieval results for large textual databases by pre-indexing documents using bag-of-words and named entity recognition. The approach was applied to a database of geological projects that the Republic of Serbia has financed for several decades. Each document in this database is described by a summary report consisting of metadata on the geological project, such as title, domain, keywords, abstract, and geographical location. A bag of words was produced from these metadata with the help of morphological dictionaries and transducers, while named entities were recognized using a rule-based system. Both were then used to pre-index documents for information retrieval, where the ranking of retrieved documents was based on several tf-idf-based measures. Ranked retrieval results obtained with pre-indexing were compared, using precision-recall curves, with results obtained by information retrieval without pre-indexing, showing a significant improvement in mean average precision.
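The ranking and evaluation steps named in the abstract can be illustrated with a minimal sketch. The abstract does not say which tf-idf variants the authors used, so the code below assumes one common form (length-normalized term frequency times the natural log of inverse document frequency). The function names, the toy documents, and the relevance judgments are all hypothetical and stand in for the lemmatized bags of words that pre-indexing would produce.

```python
import math
from collections import Counter


def tf_idf_rank(query_terms, docs):
    """Rank documents (lists of index terms) by summed tf-idf of the query terms.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc_id, doc in enumerate(docs):
        tf = Counter(doc)
        score = sum(
            (tf[t] / len(doc)) * math.log(n_docs / df[t])
            for t in query_terms
            if tf[t] > 0
        )
        scores.append((doc_id, score))
    # Highest score first; the stable sort keeps document order for ties.
    return sorted(scores, key=lambda pair: -pair[1])


def average_precision(ranked_ids, relevant_ids):
    """Average of precision values taken at the rank of each relevant document."""
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0


def mean_average_precision(runs):
    """Mean of average precision over (ranking, relevant set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical bags of words, as if already lemmatized by a pre-indexing step.
docs = [
    ["copper", "ore", "deposit", "bor"],
    ["groundwater", "survey", "belgrade"],
    ["copper", "copper", "mining", "survey"],
]
ranking = [doc_id for doc_id, _ in tf_idf_rank(["copper", "survey"], docs)]
# Hypothetical relevance judgments: documents 0 and 2 are relevant to the query.
print(ranking, average_precision(ranking, {0, 2}))
```

Comparing the mean average precision of rankings built from pre-indexed terms with that of rankings built from raw tokens is the kind of comparison the abstract reports, although the paper's actual measures and data may differ from this sketch.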

Bibliographic details

Source: International KEYSTONE Conference, 2015, pp. 167-181 (15 pages)
Authors: Ranka Stankovic; Cvetana Krstev; Ivan Obradovic; Olivera Kitanovic
Original format: PDF

