Conference: International Conference on Statistical Language and Speech Processing

Can Statistical Tests Be Used for Feature Selection in Diachronic Text Classification?

Abstract

Despite the large number of diachronic studies in various languages, the methodology for investigating language change has not evolved much in the last fifty years. Following trends in other fields, we argue in this paper for adopting a machine learning approach in diachronic studies, which could offer more efficient analysis of a large number of features and easier comparison of results across genres, languages and language varieties. We suggest using statistical tests as an initial feature selection step in an approach that uses the F-measure of the classification algorithms as a measure of the extent of diachronic change. Furthermore, we compare classification performance after feature selection by statistical tests with performance after feature selection by the CfsSubsetEval attribute selection algorithm. The experiments were conducted on the British part of the largest existing diachronic corpora of 20th-century written English, the 'Brown family' of corpora, using 23 different stylistic features. The results demonstrated that using statistical tests for feature selection can significantly increase the accuracy of the classification algorithms.
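
The two-step idea in the abstract (filter features with a statistical test, then judge diachronic change by the F-measure of a classifier) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not name the tests used, so the Mann-Whitney U test (normal approximation, no tie correction) is an assumed stand-in; the feature names, the toy values, and the labels `early` and `late` are invented for the example and are not data from the paper.

```python
from statistics import NormalDist


def mann_whitney_p(a, b):
    """Two-sided Mann-Whitney U p-value (normal approximation, no tie correction)."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    n1, n2 = len(a), len(b)
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values and give each the average rank of the run.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_a - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u - mean_u) / sd_u
    return 2 * (1 - NormalDist().cdf(abs(z)))


def select_features(data, alpha=0.05):
    """Keep features whose values differ significantly between the two periods."""
    return [name for name, (early, late) in data.items()
            if mann_whitney_p(early, late) < alpha]


def f_measure(gold, pred, positive):
    """Balanced F-measure (F1) for one class, from gold and predicted labels."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical per-text feature values for two time periods (invented numbers).
data = {
    "avg_sentence_length": ([20, 21, 22, 23, 24, 25, 26, 27],
                            [10, 11, 12, 13, 14, 15, 16, 17]),
    "noise_feature": ([1, 3, 5, 7, 9, 11, 13, 15],
                      [2, 4, 6, 8, 10, 12, 14, 16]),
}
selected = select_features(data)
print("selected features:", selected)

# Hypothetical classifier output on ten texts, scored for the "late" class.
gold = ["late"] * 5 + ["early"] * 5
pred = ["late", "late", "late", "late", "early",
        "late", "early", "early", "early", "early"]
print("F-measure:", f_measure(gold, pred, "late"))
```

In this toy setup only the feature whose distribution shifts between the periods passes the filter. A real experiment would train a classifier on the selected features and report its cross-validated F-measure; the classifier step is left out here to keep the sketch short.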
