The identification of duplicated and plagiarised passages of text has become an increasingly active area of research. In this paper we investigate methods for plagiarism detection that aim to identify potential sources of plagiarism from MEDLINE, particularly when the original text has been modified through the replacement of words or phrases. A scalable approach based on Information Retrieval (IR) is used to perform candidate document selection (the identification of a subset of potential source documents given a suspicious text) from MEDLINE. Query expansion is performed using the UMLS Metathesaurus to deal with situations in which original documents are obfuscated. Various approaches to Word Sense Disambiguation are investigated to deal with cases where there are multiple Concept Unique Identifiers (CUIs) for a given term. Results using the proposed IR-based approach outperform a state-of-the-art baseline based on Kullback-Leibler Distance.
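As an illustration only, candidate document selection with query expansion could be sketched along the following lines. This is a minimal sketch under stated assumptions, not the system evaluated in the paper: the `SYNONYMS` table is a hypothetical toy stand-in for a Metathesaurus lookup, the TF-IDF cosine ranking is a generic IR scoring scheme chosen for brevity, and the function names `expand_query` and `rank_candidates` are invented for this example.

```python
import math
import re
from collections import Counter

# Hypothetical toy synonym table. It stands in for a thesaurus lookup and
# is NOT the UMLS Metathesaurus or any real API.
SYNONYMS = {
    "heart attack": ["myocardial infarction"],
    "cancer": ["neoplasm", "tumour"],
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def expand_query(text, synonyms=SYNONYMS):
    """Append the listed alternatives for every known phrase found in the text."""
    lowered = text.lower()
    extra = [alt for phrase, alts in synonyms.items()
             if phrase in lowered for alt in alts]
    return " ".join([text] + extra)


def rank_candidates(query, docs, top_k=10):
    """Rank documents by TF-IDF cosine similarity to the query (assumed scoring)."""
    n_docs = len(docs)
    doc_tokens = {doc_id: tokenize(body) for doc_id, body in docs.items()}

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in doc_tokens.values():
        df.update(set(tokens))

    def weights(tokens):
        # Smoothed IDF, so terms unseen in the collection still get a weight.
        tf = Counter(tokens)
        return {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
                for t, c in tf.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = (math.sqrt(sum(w * w for w in a.values()))
                * math.sqrt(sum(w * w for w in b.values())))
        return dot / norm if norm else 0.0

    q = weights(tokenize(query))
    scored = [(doc_id, cosine(q, weights(tokens)))
              for doc_id, tokens in doc_tokens.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Invented example documents, not MEDLINE records.
    collection = {
        "d1": "Myocardial infarction is treated with aspirin.",
        "d2": "Neoplasm growth in lung tissue.",
        "d3": "Aspirin dosage guidelines for adults.",
    }
    suspicious = "The patient had a heart attack and was given aspirin."
    print(rank_candidates(expand_query(suspicious), collection, top_k=2))
```

In this toy setting the suspicious text shares only the word "aspirin" with the collection, so expansion is what lets the obfuscated phrase "heart attack" match a source that says "myocardial infarction". A real system would additionally need to choose among multiple candidate concepts for an ambiguous term, which is the role the abstract assigns to Word Sense Disambiguation.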