Computer Speech and Language

Feature selection methods and their combinations in high-dimensional classification of speaker likability, intelligibility and personality traits



Abstract

This study focuses on feature selection in paralinguistic analysis and presents recently developed supervised and unsupervised methods for feature subset selection and feature ranking. Using the standard k-nearest-neighbors (kNN) rule as the classification algorithm, the feature selection methods are evaluated individually and in different combinations in seven paralinguistic speaker trait classification tasks. In each analyzed data set, the overall number of features greatly exceeds the number of data points available for training and evaluation, making a well-generalizing feature selection process extremely difficult. The performance of feature sets on the feature selection data is observed to be a poor indicator of their performance on unseen data. The studied feature selection methods clearly outperform a standard greedy hill-climbing selection algorithm by being more robust against overfitting. When the selection methods are suitably combined with each other, the performance in the classification task can be further improved. In general, it is shown that automatic feature selection in paralinguistic analysis can reduce the overall number of features to a fraction of the original feature set size while still achieving performance comparable to, or even better than, baseline support vector machine or random forest classifiers using the full feature set. The most typically selected features for recognition of speaker likability, intelligibility and five personality traits are also reported.
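As a rough illustration of the setting described above (far more features than samples, a kNN classifier, and filter-style feature selection), the following Python sketch uses scikit-learn with ANOVA F-score ranking as an assumed stand-in, since the abstract does not name the paper's actual selection methods. Selection is placed inside a cross-validation pipeline so that the selected subset is not evaluated on the same data it was chosen from:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic stand-in for a paralinguistic data set: the number of
# features (1000) far exceeds the number of samples (100), as in the study.
X, y = make_classification(n_samples=100, n_features=1000,
                           n_informative=10, random_state=0)

# Baseline: kNN on the full feature set.
knn = KNeighborsClassifier(n_neighbors=5)
acc_full = cross_val_score(knn, X, y, cv=5).mean()

# Filter-style selection (ANOVA F-score, keep 50 of 1000 features)
# wrapped in a Pipeline, so the ranking is recomputed inside each CV
# fold. Selecting features on the whole data set first and then
# cross-validating would leak label information into the subset, the
# kind of overfitting the abstract warns about.
pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=50)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
])
acc_selected = cross_val_score(pipe, X, y, cv=5).mean()
```

The Pipeline wrapping is the key design choice here: it keeps the feature-selection step inside each training fold, which mirrors the abstract's observation that performance measured on the feature selection data is a poor indicator of performance on unseen data.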
