Extracting Irregular Datasets in University Admission Statistics using Text Mining and Benford's Law

Abstract

Benford's law states that the first digits of naturally occurring numerical datasets follow a specific distribution. Deviation from the Benford distribution indicates that a dataset is irregular, but it gives no clue as to the cause of the irregularity. This paper constructs a search engine over the cells that appear in tables by associating each cell with the words in its row or column title or in the table's description. By enumerating search conditions, we generate an exhaustive collection of cell datasets to test for irregularity. We applied the method to the number of applicants, the number of candidates, and the number of successful applicants in each department of 565 private universities in Japan, and confirmed its effectiveness by extracting the characteristics of the irregular datasets.
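
To make the first-digit test concrete, here is a minimal sketch of how deviation from Benford's law could be measured. The abstract does not say which deviation measure the authors use, so the Pearson chi-square statistic, the function names (`first_digit`, `benford_chi_square`), and the sample data are all assumptions made for illustration only.

```python
import math
from collections import Counter

# Benford's law: P(d) = log10(1 + 1/d) for leading digit d = 1..9.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(n: int) -> int:
    """Return the leading decimal digit of a non-zero integer."""
    return int(str(abs(n))[0])


def benford_chi_square(values) -> float:
    """Pearson chi-square statistic of observed first digits against Benford.

    Zeros are skipped because they have no leading digit.
    The choice of chi-square is an assumption, not the paper's stated method.
    """
    digits = [first_digit(v) for v in values if v != 0]
    total = len(digits)
    if total == 0:
        raise ValueError("no non-zero values to test")
    observed = Counter(digits)
    return sum(
        (observed.get(d, 0) - total * p) ** 2 / (total * p)
        for d, p in BENFORD.items()
    )


# Hypothetical data, not taken from the paper.
# Every leading digit 1..9 occurs equally often here, which is far from Benford.
uniform_like = list(range(100, 1000))

# With 8 degrees of freedom, 15.507 is the 5% critical value of chi-square.
CRITICAL_5_PERCENT = 15.507
print(benford_chi_square(uniform_like) > CRITICAL_5_PERCENT)
```

In a setting like the one the abstract describes, such a function would be applied to each dataset of cells returned by one search condition, and the conditions whose datasets exceed the threshold would be the candidates for closer inspection.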

Bibliographic record

Source: International Congress on Advanced Applied Informatics, 2019, pp. 1023-1024 (2 pages)
Authors: Yusuke Tozaki; Takahiko Suzuki; Tsunenori Mine; Sachio Hirokawa
Original format: PDF
Keywords: data mining; educational administrative data processing; educational institutions; feature extraction; search engines; statistical distributions; text analysis
