Conference: SLSP 2013

Can Statistical Tests Be Used for Feature Selection in Diachronic Text Classification?

Abstract

Despite the large number of diachronic studies across languages, the methodology for investigating language change has not evolved much in the last fifty years. Following trends in other fields, this paper argues for adopting a machine learning approach in diachronic studies, which could offer more efficient analysis of a large number of features and easier comparison of results across genres, languages and language varieties. We suggest using statistical tests as an initial feature selection step in an approach that uses the F-measure of the classification algorithms as a measure of the extent of diachronic change. Furthermore, we compare classification performance after feature selection by statistical tests with performance after feature selection by the CfsSubsetEval attribute selection algorithm. The experiments were conducted on the British part of the 'Brown family' of corpora, the largest existing diachronic corpora of 20th-century written English, using 23 different stylistic features. The results demonstrated that using statistical tests for feature selection can significantly increase the accuracy of the classification algorithms.
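
The pipeline described in the abstract (filter features with a statistical test, classify texts by period, read the F-measure as a proxy for how distinguishable the periods are) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the choice of Welch's t-statistic with a fixed cut-off of 2.0, the nearest-centroid classifier, the two-class setup, and the synthetic five-feature data (the helper `make_toy_data` is hypothetical). The abstract does not say which tests or classifiers the authors used.

```python
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples (assumed test, not the paper's)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def select_features(X, y, threshold=2.0):
    """Return indices of features whose |t| between classes 0 and 1 exceeds threshold."""
    selected = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        if abs(welch_t(a, b)) > threshold:
            selected.append(j)
    return selected


def nearest_centroid_predict(X_train, y_train, X_test, features):
    """Assign each test row to the class with the closest centroid on the given features."""
    centroids = {}
    for label in sorted(set(y_train)):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        centroids[label] = [mean([r[j] for r in rows]) for j in features]
    preds = []
    for row in X_test:
        point = [row[j] for j in features]
        preds.append(min(
            centroids,
            key=lambda l: sum((p - c) ** 2 for p, c in zip(point, centroids[l])),
        ))
    return preds


def f_measure(y_true, y_pred, positive=1):
    """F1 score (harmonic mean of precision and recall) for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def make_toy_data(n_per_class, seed):
    """Hypothetical data: 2 features shift between 'periods' 0 and 1, 3 are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            informative = [rng.gauss(label * 2.0, 1.0) for _ in range(2)]
            noise = [rng.gauss(0.0, 1.0) for _ in range(3)]
            X.append(informative + noise)
            y.append(label)
    return X, y


# Compare classification with all features against the statistically selected subset.
X_train, y_train = make_toy_data(100, seed=0)
X_test, y_test = make_toy_data(50, seed=1)
selected = select_features(X_train, y_train)
for name, feats in (("all features", list(range(5))), ("selected", selected)):
    preds = nearest_centroid_predict(X_train, y_train, X_test, feats)
    print(name, feats, round(f_measure(y_test, preds), 3))
```

In this reading, a higher F-measure means the two periods are easier to tell apart on the chosen features. A real replication would replace the toy data with the 23 stylistic features extracted from the corpora and use the tests and classifiers reported in the paper.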