Exploring Alternative Clustering for PIY Source Code Detection

Abstract

In this paper, we compare different clustering algorithms on a specific type of data set. Clustering is a powerful tool because it partitions data into meaningful groups based on the information found in the data set. Given a set of data points, each defined by a set of attributes, we find clusters such that points in one cluster are more similar to one another and less similar to points in other clusters. These clusters are crucial to how data is analyzed: they help us identify and give meaning to data according to its traits. Because clustering makes a data set more useful to work with, the study of techniques for finding the most representative cluster model is vital in knowledge extraction. Previous work by Anthony Ohmann and Professor Imad Rahal proposes a scalable system called PIY (Program It Yourself) that can detect source code plagiarism over a large repository of submissions, where new submissions are compared to existing ones. By using clusters, one can compare a new submission to a subset of the data rather than to the whole repository. Accuracy and time are both important factors for PIY; therefore, we evaluate the efficiency of clustering in terms of accuracy and time. In this paper, we analyze K-Harmonic Means (KHM) against one of PIY's current clustering algorithms, K-Medoid. Developed by Dr. Bin Zhang, the KHM algorithm is derived from K-Means and the harmonic average. It is known to be more "robust" than the K-Means algorithm. Our goal is to find which algorithm gives the most favorable results.
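
To make the KHM idea concrete, the following is a minimal sketch of the commonly published K-Harmonic Means center update, written in plain Python. It is not the implementation evaluated in the paper or used in PIY. The function names `k_harmonic_means` and `assign_clusters`, the default exponent `p=3.5`, the fixed iteration count, and the `eps` distance floor are all assumptions chosen for illustration. Each point contributes to every center with a weight based on the harmonic average of its distances, which is what reduces the dependence on initialization compared with K-Means.

```python
import math
import random


def _distance(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def k_harmonic_means(points, k, p=3.5, iters=50, init_centers=None,
                     eps=1e-9, seed=0):
    """Illustrative KHM sketch; parameter names and defaults are assumptions.

    points: list of equal-length numeric tuples.
    Returns a list of k centers (tuples).
    """
    if init_centers is None:
        # Assumed initialization: pick k distinct input points at random.
        init_centers = random.Random(seed).sample(list(points), k)
    centers = [tuple(c) for c in init_centers]
    dim = len(points[0])

    for _ in range(iters):
        numer = [[0.0] * dim for _ in range(k)]
        denom = [0.0] * k
        for x in points:
            # Floor distances at eps so a point sitting on a center
            # does not cause a division by zero.
            d = [max(_distance(x, c), eps) for c in centers]
            s = sum(dj ** -p for dj in d)
            for j, dj in enumerate(d):
                # Soft weight of point x for center j:
                # d_j^-(p+2) / (sum_l d_l^-p)^2
                w = dj ** -(p + 2) / (s * s)
                denom[j] += w
                for t in range(dim):
                    numer[j][t] += w * x[t]
        centers = [tuple(numer[j][t] / denom[j] for t in range(dim))
                   for j in range(k)]
    return centers


def assign_clusters(points, centers):
    """Hard assignment: index of the nearest center for each point."""
    return [min(range(len(centers)), key=lambda j: _distance(x, centers[j]))
            for x in points]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
    found = k_harmonic_means(data, 2, init_centers=[(0.0, 0.0), (10.0, 10.0)])
    print(found)
    print(assign_clusters(data, found))
```

In a PIY-like setting, the hard assignment step is what would let a new submission be compared only against members of its nearest cluster. How submissions are turned into numeric attribute vectors is outside this sketch.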

Bibliographic details

Source
Annual midwest instruction and computing symposium | 2014 | 14 pages
Authors
Pa Woua Vang; James Schnepf
Original format: PDF
Chinese Library Classification: computing technology, computer technology
Date indexed: 2022-08-20 20:04:01

