
Indexing of Textual Databases Based on Lexical Resources: A Case Study for Serbian

Abstract

In this paper we describe an approach to improving information retrieval results for large textual databases by pre-indexing documents using a bag of words and named entity recognition. The approach was applied to a database of geological projects that the Republic of Serbia has been financing for several decades. Each document in this database is described by a summary report consisting of metadata about the geological project, such as its title, domain, keywords, abstract, and geographical location. A bag of words was produced from these metadata with the help of morphological dictionaries and transducers, while named entities were recognized with a rule-based system. Both were then used to pre-index documents for information retrieval, with retrieved documents ranked by several tf-idf-based measures. Ranked retrieval results obtained with pre-indexing were compared with those obtained without pre-indexing using precision-recall curves, which showed a significant improvement in mean average precision (MAP).
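
The abstract names the ingredients (a lemma-based bag of words, recognized named entities, tf-idf ranking, and MAP evaluation) but not their exact form. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. The tiny `LEMMAS` table is a hypothetical stand-in for the Serbian morphological dictionaries and transducers. Named entities are assumed to be already recognized and supplied as strings. Scoring uses one common tf-idf variant (log-scaled tf times log(N/df)), which is not necessarily any of the measures evaluated in the paper. All function names and document IDs are illustrative.

```python
import math
from collections import Counter

# Hypothetical toy lemma table standing in for the Serbian morphological
# dictionaries; a real system maps every inflected form to its lemma(s).
LEMMAS = {
    "bakra": "bakar", "bakru": "bakar", "bakar": "bakar",
    "ležišta": "ležište", "ležište": "ležište",
    "istraživanja": "istraživanje", "istraživanje": "istraživanje",
}


def bag_of_words(text):
    """Lower-case, split on whitespace, and replace known forms by their lemma."""
    return [LEMMAS.get(tok, tok) for tok in text.lower().split()]


def pre_index(docs):
    """Pre-index documents: term frequencies over lemmas plus named entities.

    docs maps doc_id -> (metadata_text, [named_entity, ...]); entities are
    assumed to come from a separate (e.g. rule-based) recognizer.
    """
    index = {}
    for doc_id, (text, entities) in docs.items():
        terms = bag_of_words(text) + [e.lower() for e in entities]
        index[doc_id] = Counter(terms)
    return index


def rank(index, query_terms):
    """Rank documents by sum over query terms of (1 + log tf) * log(N / df)."""
    n_docs = len(index)
    # Iterating a Counter yields its keys, so this counts documents per term.
    df = Counter(t for tf in index.values() for t in tf)
    scores = {}
    for doc_id, tf in index.items():
        s = 0.0
        for t in query_terms:
            if tf[t]:  # Counter returns 0 for absent terms without inserting them
                s += (1 + math.log(tf[t])) * math.log(n_docs / df[t])
        if s > 0:
            scores[doc_id] = s
    # Highest score first; ties broken by doc_id for a deterministic order.
    return sorted(scores, key=lambda d: (-scores[d], d))


def average_precision(ranked, relevant):
    """Mean of precision@k over the ranks k at which a relevant doc appears."""
    hits, total = 0, 0.0
    for k, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (ranked_list, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    docs = {
        "P1": ("istraživanja ležišta bakra", ["Bor"]),
        "P2": ("istraživanje uglja", ["Kolubara"]),
        "P3": ("bakar i zlato", ["Majdanpek"]),
    }
    index = pre_index(docs)
    ranked = rank(index, bag_of_words("ležišta bakra"))
    print(ranked)  # ['P1', 'P3']
    print(average_precision(ranked, {"P1", "P3"}))  # 1.0
```

Because the query passes through the same `bag_of_words` step as the documents, the inflected forms "ležišta bakra" match the lemmas stored in the index, which is the point of lemma-based pre-indexing for a highly inflected language. Multiword named entities would need the same recognizer applied on the query side to be matched as single terms.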

Bibliographic record

Source: International KEYSTONE Conference, 2015, 15 pages
Authors: Ranka Stankovic; Cvetana Krstev; Ivan Obradovic; Olivera Kitanovic
Original format: PDF
Chinese Library Classification: TP3-53

Similar documents

1. Tech Word: Development of a technology lexical database for structuring textual technology information based on natural language processing [J]. Hyejin Jang, Yujin Jeong, Byungun Yoon. Expert Systems with Applications, 2021.
2. Excavating grey literature: A case study on the rich indexing of archaeological documents via natural language-processing techniques and knowledge-based resources [J]. Andreas Vlachidis, Ceri Binding, Douglas Tudhope. Aslib Proceedings, 2010.
3. Treatment patterns, healthcare resource utilization, and costs among patients with idiopathic pulmonary fibrosis treated with antifibrotic medications in US-based commercial and Medicare Supplemental claims databases: a retrospective cohort study [J]. Mitra Corral, Kathryn DeYoung, Amanda M. Kong. BMC Pulmonary Medicine, 2020, no. 1.
4. Indexing of Textual Databases Based on Lexical Resources: A Case Study for Serbian [C]. Ranka Stankovic, Cvetana Krstev, Ivan Obradovic. International KEYSTONE Conference, 2015.
5. Spoken Corpus-based Resources for Undergraduate Initial Interpreter Training and Lexical Knowledge Acquisition: Empirical Case Studies [D]. Richard Bale. 2013.
6. Treatment patterns healthcare resource utilization and costs among patients with idiopathic pulmonary fibrosis treated with antifibrotic medications in US-based commercial and Medicare Supplemental claims databases: a retrospective cohort study [O]. Mitra Corral, Kathryn DeYoung, Amanda M. Kong. 2020.
7. Frequency of publication of asthma studies in nursing journals indexed in Brazilian databases: a literature review [O]. Luisa Helena de Oliveira Lima, Emanuel Moura Gomes, Violante Augusta Batista Braga. 2006.
