Conference: International KEYSTONE Conference

Indexing of Textual Databases Based on Lexical Resources: A Case Study for Serbian

Abstract

In this paper we describe an approach to improving information retrieval results for large textual databases by pre-indexing documents using a bag of words and named entity recognition. The approach was applied to a database of geological projects that the Republic of Serbia has financed for several decades. Each document in this database is described by a summary report consisting of metadata on the geological project, such as title, domain, keywords, abstract, and geographical location. A bag of words was produced from these metadata with the help of morphological dictionaries and transducers, while named entities were recognized using a rule-based system. Both were then used to pre-index documents for information retrieval, where the ranking of retrieved documents was based on several tf-idf based measures. Ranked retrieval results obtained with pre-indexing were compared, using a precision-recall curve, to results obtained by information retrieval without pre-indexing, and showed a significant improvement in mean average precision.
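The abstract does not say which tf-idf variants or evaluation setup the authors used, so the following is only a minimal sketch of the general idea: score documents by a plain tf-idf weight over pre-indexed terms, then evaluate a ranking with mean average precision. The function names, the specific weighting (relative term frequency times log inverse document frequency), and the toy terms are all assumptions made for illustration, not the paper's method or data.

```python
import math
from collections import Counter


def tf_idf_index(docs):
    """Build tf-idf weights from pre-indexed documents.

    docs maps a document id to its list of index terms (for example,
    lemmas from a bag of words plus recognized entity names).
    Assumed weighting: (count / doc length) * log(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for terms in docs.values():
        doc_freq.update(set(terms))
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}
    index = {}
    for doc_id, terms in docs.items():
        counts = Counter(terms)
        index[doc_id] = {t: (c / len(terms)) * idf[t] for t, c in counts.items()}
    return index


def rank(index, query_terms):
    """Return ids of documents with a positive score, best first."""
    scores = {
        doc_id: sum(weights.get(t, 0.0) for t in query_terms)
        for doc_id, weights in index.items()
    }
    matching = [d for d in scores if scores[d] > 0]
    # Ties are broken by document id so the ordering is deterministic.
    return sorted(matching, key=lambda d: (-scores[d], d))


def average_precision(ranked, relevant):
    """Average of precision values at each rank where a relevant doc appears."""
    hits, total = 0, 0.0
    for position, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs is a list of (ranked ids, set of relevant ids), one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Invented toy collection; the terms are placeholders, not the paper's data.
docs = {
    "d1": ["copper", "ore", "bor"],
    "d2": ["coal", "ore"],
    "d3": ["water", "belgrade"],
}
index = tf_idf_index(docs)
ranking = rank(index, ["ore", "bor"])
print(ranking)
print(average_precision(ranking, {"d1"}))
```

In a comparison like the one the abstract describes, the same queries would be run against an index built with pre-indexing and one built without it, and the two mean average precision values compared.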