Conference: International Congress on Advanced Applied Informatics

Extracting Irregular Datasets in University Admission Statistics using Text Mining and Benford's Law


Abstract

Benford's law states that the first digits of naturally occurring numerical datasets follow a specific distribution. Deviation from the Benford distribution indicates that a dataset is irregular, but it gives no clue as to the cause of the irregularity. This paper constructs a search engine for cells that appear in tables by associating each cell with the words in its row or column title or in the explanation of the table. By enumerating the search conditions, we generate an exhaustive collection of cell datasets to test for irregularity. We applied the method to the number of applicants, the number of candidates, and the number of successful applicants in each department of 565 private universities in Japan. We confirmed the effectiveness of the proposed method by extracting the characteristics of the irregular datasets.
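As a rough illustration of the irregularity test described above: under Benford's law the first digit d (1 to 9) occurs with probability log10(1 + 1/d). The abstract does not say which deviation measure the authors use, so the sketch below assumes a Pearson chi-square statistic; the function names `benford_expected`, `first_digit`, and `chi_square_vs_benford` are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def benford_expected():
    """Benford probabilities for first digits 1..9: log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(n):
    """Return the leading decimal digit of an integer, or None for zero."""
    n = abs(int(n))
    if n == 0:
        return None
    while n >= 10:
        n //= 10
    return n


def chi_square_vs_benford(values):
    """Pearson chi-square statistic of observed first digits against Benford.

    Assumption: chi-square is used here only as one common way to
    quantify deviation; the paper's actual measure is not stated.
    """
    digits = [d for d in map(first_digit, values) if d is not None]
    total = len(digits)
    if total == 0:
        raise ValueError("no non-zero values to test")
    counts = Counter(digits)
    return sum(
        (counts.get(d, 0) - total * p) ** 2 / (total * p)
        for d, p in benford_expected().items()
    )


# Powers of two are a classic Benford-like sequence; the three-digit
# integers have uniformly distributed first digits and deviate strongly.
benford_like = [2 ** k for k in range(100)]
uniform_like = list(range(100, 1000))
print(chi_square_vs_benford(benford_like))
print(chi_square_vs_benford(uniform_like))
```

With eight degrees of freedom, a statistic above about 15.507 rejects conformity with Benford's law at the 5% level. In the setting of the abstract, such a test would be run on each dataset of cells returned by one search condition.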
