Using word semantic concepts for plagiarism detection in text documents

Chang Chia-Yang; Lee Shie-Jue; Wu Chih-Hung; Liu Chih-Feng; Liu Ching-Kuan

Abstract

Plagiarism is a common problem in the modern age. With the advance of the Internet, it has become increasingly convenient to access other people's writings and publications. When someone uses the content of a text in an undesirable way, plagiarism may occur. Because plagiarism infringes intellectual property rights, it is a serious problem today. However, detecting plagiarism effectively is a challenging task. Traditional methods, such as the vector space model or bag-of-words, fall short because they cannot handle the semantics of words satisfactorily. In this paper, we propose a new method for plagiarism detection. We use Word2vec to transform words into word vectors that reveal the semantic relationships among different words. Using these word vectors, words are clustered into concepts. Documents and their paragraphs are then represented in terms of concepts, which allows plagiarism detection to be performed more effectively. A number of experiments are conducted to demonstrate the good performance of the proposed method.
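As a rough illustration of the pipeline the abstract describes (word vectors, clustering into concepts, concept-based paragraph comparison), the Python sketch below contrasts a bag-of-words comparison with a concept-based one. It is not the authors' implementation. Several pieces are assumptions: the toy 2-D vectors stand in for trained Word2vec output, k-means with farthest-point seeding is an assumed clustering algorithm, and cosine similarity over concept-count vectors is an assumed similarity measure. All names (`WORD_VECTORS`, `kmeans`, `concept_vector`, `bow_cosine`) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy 2-D "word vectors" standing in for trained Word2vec output.
# Real Word2vec vectors usually have hundreds of dimensions learned from a corpus.
WORD_VECTORS = {
    "copy": (1.0, 0.0), "plagiarize": (0.9, 0.1), "duplicate": (0.95, -0.05),
    "text": (0.0, 1.0), "document": (0.1, 0.9), "paper": (-0.05, 0.95),
    "detect": (-1.0, -1.0), "find": (-0.9, -1.1), "identify": (-1.05, -0.95),
}

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iters=20):
    """Plain k-means with deterministic farthest-point initialization.
    Returns a dict mapping each word to a concept id in range(k)."""
    words = sorted(vectors)
    centroids = [vectors[words[0]]]
    while len(centroids) < k:
        # Next seed: the word farthest from its nearest existing centroid.
        far = max(words, key=lambda w: min(sq_dist(vectors[w], c) for c in centroids))
        centroids.append(vectors[far])
    labels = {}
    for _ in range(iters):
        # Assignment step: each word joins its nearest centroid.
        labels = {w: min(range(k), key=lambda j: sq_dist(vectors[w], centroids[j]))
                  for w in words}
        # Update step: each centroid moves to the mean of its member vectors.
        new = []
        for j in range(k):
            members = [vectors[w] for w in words if labels[w] == j]
            if members:
                new.append(tuple(sum(d) / len(members) for d in zip(*members)))
            else:
                new.append(centroids[j])  # leave an empty cluster's centroid in place
        if new == centroids:  # converged: centroids stopped moving
            break
        centroids = new
    return labels

def concept_vector(paragraph, concept_of, k):
    """Count concept occurrences in a paragraph; out-of-vocabulary words are skipped."""
    counts = [0] * k
    for token in paragraph.lower().split():
        if token in concept_of:
            counts[concept_of[token]] += 1
    return counts

def cosine(a, b):
    """Cosine similarity of two equal-length count lists (0.0 if either is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sum(x * x for x in a), sum(y * y for y in b)
    return dot / math.sqrt(na * nb) if na and nb else 0.0

def bow_cosine(p, q):
    """Bag-of-words baseline: cosine over raw word counts."""
    cp, cq = Counter(p.lower().split()), Counter(q.lower().split())
    vocab = sorted(set(cp) | set(cq))
    return cosine([cp[w] for w in vocab], [cq[w] for w in vocab])

concept_of = kmeans(WORD_VECTORS, k=3)
p1 = "detect copy text"
p2 = "identify duplicate document"  # paraphrase of p1 using synonyms only
p3 = "paper text document"          # shares only the "text" concept with p1

print(bow_cosine(p1, p2))  # 0.0
print(cosine(concept_vector(p1, concept_of, 3), concept_vector(p2, concept_of, 3)))  # 1.0
print(round(cosine(concept_vector(p1, concept_of, 3), concept_vector(p3, concept_of, 3)), 3))  # 0.577
```

The paraphrased pair shares no words, so the bag-of-words score is zero, yet both paragraphs map to the same three concepts and score 1.0 on the concept representation. This is the kind of semantic match the abstract says word-level models miss; a real system would train Word2vec on a corpus and choose the number of concepts empirically.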

Bibliographic Record

Source
Information Retrieval | 2021, No. 5 | pp. 298-321 | 24 pages
Authors
Chang Chia-Yang; Lee Shie-Jue; Wu Chih-Hung; Liu Chih-Feng; Liu Ching-Kuan
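Affiliations

Department of Electrical Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan

Department of Electrical Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan | Intelligent Electronic Commerce Research Center, National Sun Yat-sen University, Kaohsiung, Taiwan

Department of Electrical Engineering, National University of Kaohsiung, Kaohsiung, Taiwan

Incubation Center, Cheng Shiu University, Kaohsiung, Taiwan

Department of Neurology, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
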
Original format: PDF
Language: English
Keywords
Plagiarism; Vector space model; Bag-of-words; Word2vec; Word vector; Clustering;

Similar Documents

1. Gabriel Oberreuter, Juan D. Velasquez. Text mining applied to plagiarism detection: The use of words for detecting deviations in the writing style [J]. Expert Systems with Applications, 2013, no. 9.
2. Jirapond Muangprathub, Siriwan Kajornkasirat, Apirat Wanichsombat. Document Plagiarism Detection Using a New Concept Similarity in Formal Concept Analysis [J]. Journal of Applied Mathematics, 2021.
3. M. P. Kogalovskii. Semantic Annotating of Text Documents: Basic Concepts and Taxonomic Approach [J]. Automatic Documentation and Mathematical Linguistics, 2018, no. 3.
4. Agung Sediyono, Ku Ruhana Ku-Mahamud. Algorithm of the longest commonly consecutive word for Plagiarism detection in text based document [C]. International Conference on Digital Information Management, 2008.
5. Candadai Vasu, Madhavun. ANSWER: A Cognitively-Inspired System for the Unsupervised Detection of Semantically Salient Words in Texts [D]. 2015.
6. Marcel Harpaintner, Natalie M. Trumpp, Markus Kiefer. The Semantic Content of Abstract Concepts: A Property Listing Study of 296 Abstract Words [O].
7. Figure 1: (A) Example of a text-based forma mentis network. A TFMN can be represented either as an edge-coloured graph or as a multiplex network. Positive (negative) words are highlighted in cyan (red). Neutral words are in black. Syntactic links between positive (negative) words are highlighted in cyan (red) too. Syntactic links between positive and negative concepts are in purple. All semantic links of meaning overlap are highlighted in green. (B) Infographics about how a TFMN is assembled. Individuals organise their knowledge and emotional perception of the real world in their mental lexicon (comic clouds). [O].