Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics

An IR-Based Approach Utilizing Query Expansion for Plagiarism Detection in MEDLINE


Abstract

The identification of duplicated and plagiarized passages of text has become an increasingly active area of research. In this paper, we investigate methods for plagiarism detection that aim to identify potential sources of plagiarism from MEDLINE, particularly when the original text has been modified through the replacement of words or phrases. A scalable approach based on Information Retrieval (IR) is used to perform candidate document selection from MEDLINE, that is, the identification of a subset of potential source documents given a suspicious text. Query expansion is performed using the UMLS Metathesaurus to deal with situations in which original documents are obfuscated. Various approaches to Word Sense Disambiguation are investigated to deal with cases where there are multiple Concept Unique Identifiers (CUIs) for a given term. Results using the proposed IR-based approach outperform a state-of-the-art baseline based on Kullback-Leibler Distance.
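To make the idea of candidate document selection with query expansion concrete, here is a minimal sketch in Python. It is not the authors' system: the abstract does not name a retrieval model, so TF-IDF with cosine similarity is swapped in as a common choice. The `SYNONYMS` table is a hypothetical stand-in for a UMLS Metathesaurus lookup, the three-document corpus is invented for illustration, and Word Sense Disambiguation is left out entirely.

```python
import math
import re
from collections import Counter

# Hypothetical synonym table standing in for a UMLS Metathesaurus lookup.
# A real system would map terms to CUIs and pull the synonyms of each concept.
SYNONYMS = {
    "kidney": ["renal"],
    "renal": ["kidney"],
    "tumor": ["neoplasm"],
    "neoplasm": ["tumor"],
}

# Invented toy corpus standing in for MEDLINE abstracts.
CORPUS = [
    "Renal neoplasm growth was measured in adult patients.",
    "Dietary fiber intake and blood pressure in adolescents.",
    "Sleep duration among hospital nurses.",
]


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def expand(tokens):
    """Append the listed synonyms of every query token (query expansion)."""
    expanded = list(tokens)
    for token in tokens:
        expanded.extend(SYNONYMS.get(token, []))
    return expanded


def build_index(corpus):
    """Return TF-IDF vectors for the corpus plus the IDF table used."""
    docs = [tokenize(doc) for doc in corpus]
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(suspicious, corpus, use_expansion=True, top_k=2):
    """Return indices of up to top_k corpus documents with a nonzero score."""
    vectors, idf = build_index(corpus)
    query_tokens = tokenize(suspicious)
    if use_expansion:
        query_tokens = expand(query_tokens)
    # Terms that never occur in the corpus carry no weight in the query vector.
    query = {t: tf * idf[t] for t, tf in Counter(query_tokens).items() if t in idf}
    scored = [(cosine(query, vec), i) for i, vec in enumerate(vectors)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in scored[:top_k] if score > 0]


if __name__ == "__main__":
    print(rank_candidates("kidney tumor", CORPUS, use_expansion=False))
    print(rank_candidates("kidney tumor", CORPUS, use_expansion=True))
```

In this toy setup, the obfuscated query "kidney tumor" shares no words with the source document, so plain term matching retrieves nothing. Once the query is expanded with the listed synonyms, the first document surfaces as a candidate. This is the kind of word-replacement obfuscation the abstract says query expansion is meant to handle.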
