International KEYSTONE Conference

Indexing of Textual Databases Based on Lexical Resources: A Case Study for Serbian

Abstract

In this paper we describe an approach to improving information retrieval results for large textual databases by pre-indexing documents using a bag of words and named entity recognition. The approach was applied to a database of geological projects that the Republic of Serbia has financed for several decades. Each document in this database is described by a summary report consisting of metadata on the geological project, such as title, domain, keywords, abstract, and geographical location. A bag of words was produced from these metadata with the help of morphological dictionaries and transducers, while named entities were recognized using a rule-based system. Both were then used to pre-index documents for information retrieval, with retrieved documents ranked by several tf-idf-based measures. Ranked retrieval results obtained with pre-indexing were compared, using precision-recall curves, with results obtained without pre-indexing, and showed a significant improvement in mean average precision.
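As a rough illustration of the ranking and evaluation steps mentioned in the abstract, the following sketch scores documents with a plain tf-idf weighting and computes mean average precision. It is not the authors' system: the abstract does not specify which tf-idf-based measures were used, so the weighting here (raw term frequency times log inverse document frequency) is an assumption. All function names are hypothetical, and each document's terms are assumed to be already normalized (for example, lemmas and recognized named entities produced by the pre-indexing step).

```python
import math
from collections import Counter


def tf_idf_index(docs):
    """Build tf-idf weights. docs maps doc_id -> list of index terms.

    Assumption: terms are already normalized (e.g. lemmas, named entities).
    """
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for terms in docs.values() for term in set(terms))
    idf = {term: math.log(n / count) for term, count in df.items()}
    return {
        doc_id: {term: tf * idf[term] for term, tf in Counter(terms).items()}
        for doc_id, terms in docs.items()
    }


def rank(index, query_terms):
    """Return doc ids with a positive score, best first."""
    scores = {
        doc_id: sum(weights.get(term, 0.0) for term in query_terms)
        for doc_id, weights in index.items()
    }
    matched = [doc_id for doc_id, score in scores.items() if score > 0]
    return sorted(matched, key=lambda doc_id: -scores[doc_id])


def average_precision(ranking, relevant):
    """Average of precision values at the rank of each relevant document."""
    hits, total = 0, 0.0
    for position, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs is a list of (ranking, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

Under these assumptions, comparing retrieval with and without pre-indexing amounts to building two indexes (one from raw tokens, one from the pre-indexed terms), running the same queries against both, and comparing the resulting mean average precision values.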