Journal: IEEE Transactions on Neural Networks

Multilayer SOM With Tree-Structured Data for Efficient Document Retrieval and Plagiarism Detection

Abstract

This paper proposes a new document retrieval (DR) and plagiarism detection (PD) system using a multilayer self-organizing map (MLSOM). A document is modeled by a rich tree-structured representation, and a SOM-based system is used as a computationally efficient solution. Instead of relying on keywords or individual lines, the proposed scheme uses a full document as the query for retrieval and PD. The tree-structured representation organizes document features hierarchically at the document, page, and paragraph levels. Thus, it can reflect underlying context that is difficult to acquire from the word-frequency information currently in use. We show that the tree-structured data is effective for DR and PD. To handle the tree-structured representation efficiently, we use an MLSOM algorithm, which was previously developed by the authors for image retrieval. In this study, it serves as an effective clustering algorithm. Using the MLSOM, local matching techniques are developed for comparing text documents. Two novel MLSOM-based PD methods are proposed. Detailed simulations are conducted, and the experimental results corroborate that the proposed approach is computationally efficient and accurate for DR and PD.
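To make the document/page/paragraph hierarchy concrete, the following is a minimal Python sketch under stated assumptions. The abstract does not specify node features or the MLSOM training procedure, so the sketch assumes normalized word-frequency vectors at every node and substitutes a generic single-layer, one-dimensional SOM for the authors' multilayer algorithm. All names (`Node`, `term_frequencies`, `build_tree`, `train_som`, `best_matching_unit`) and all parameter values are hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
import math
import random


@dataclass
class Node:
    """One node of the document tree: a feature vector plus child nodes."""
    level: str                      # "document", "page", or "paragraph"
    features: list
    children: list = field(default_factory=list)


def term_frequencies(text, vocabulary):
    """Normalized word counts over a fixed vocabulary (assumed feature choice)."""
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocabulary) or 1
    return [counts[w] / total for w in vocabulary]


def build_tree(pages, vocabulary):
    """Build a three-level tree; `pages` is a list of lists of paragraph strings."""
    page_nodes = []
    for paragraphs in pages:
        para_nodes = [Node("paragraph", term_frequencies(p, vocabulary))
                      for p in paragraphs]
        page_text = " ".join(paragraphs)
        page_nodes.append(
            Node("page", term_frequencies(page_text, vocabulary), para_nodes))
    doc_text = " ".join(" ".join(p) for p in pages)
    return Node("document", term_frequencies(doc_text, vocabulary), page_nodes)


def best_matching_unit(weights, x):
    """Index of the SOM unit whose weight vector is closest to x (Euclidean)."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))


def train_som(samples, n_units, epochs=50, lr=0.5, seed=0):
    """Train a generic 1-D SOM (not the authors' MLSOM); neighbors are adjacent indices."""
    rng = random.Random(seed)
    dim = len(samples[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        decay = 1 - epoch / epochs          # shrinks learning rate and radius
        rate = lr * decay
        radius = max(1, round((n_units / 2) * decay))
        for x in samples:
            bmu = best_matching_unit(weights, x)
            lo, hi = max(0, bmu - radius), min(n_units, bmu + radius + 1)
            for i in range(lo, hi):
                influence = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    weights[i][d] += rate * influence * (x[d] - weights[i][d])
    return weights


# Usage: a two-page toy document over a four-word vocabulary.
vocabulary = ["som", "tree", "document", "image"]
doc = build_tree(
    [["som tree document", "tree tree som"], ["image image document"]],
    vocabulary,
)
paragraph_vectors = [para.features for page in doc.children
                     for para in page.children]
som = train_som(paragraph_vectors, n_units=4)
# Each paragraph is summarized by the index of its best-matching unit.
paragraph_units = [best_matching_unit(som, v) for v in paragraph_vectors]
```

In this sketch, only the paragraph level is clustered. A multilayer scheme in the spirit of the abstract would presumably train one map per tree level and pass lower-level results upward, but those details are not given in the abstract and are left out here.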