Journal: Computing and Informatics

APPROACHES TO SAMPLES SELECTION FOR MACHINE LEARNING BASED CLASSIFICATION OF TEXTUAL DATA



Abstract

The paper focuses on the process of selecting representative sample documents, written in natural language, that can serve as the basis for the automatic selection or classification of textual documents. A method for selecting examples from a larger set of candidate examples, called automatic biased sample selection, is compared with random and manual selection. The methods are evaluated in experiments on real-world data consisting of customer reviews, using different document representations and similarity measures. The presented approach provided satisfactory results and can be used as an alternative to the manual selection and evaluation of textual samples, although it faces problems related to processing user-created content and to its high computational complexity.
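The abstract does not describe the selection algorithm itself, so the following is only an illustrative sketch and not the paper's automatic biased sample selection. It assumes one possible document representation (a bag-of-words term-frequency vector) and one possible similarity measure (cosine similarity), and uses them to rank candidate reviews by their mean similarity to a few seed documents, alongside a random-selection baseline. All function names are hypothetical.

```python
import math
import random
import re
from collections import Counter


def bag_of_words(text):
    """Term-frequency vector (assumed representation): lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_by_similarity(candidates, seeds, k):
    """Hypothetical similarity-driven selection: keep the k candidates with the
    highest mean cosine similarity to the seed documents."""
    seed_vecs = [bag_of_words(s) for s in seeds]

    def score(doc):
        vec = bag_of_words(doc)
        return sum(cosine(vec, s) for s in seed_vecs) / len(seed_vecs)

    return sorted(candidates, key=score, reverse=True)[:k]


def select_randomly(candidates, k, seed=0):
    """Random-selection baseline, seeded for reproducibility."""
    return random.Random(seed).sample(candidates, k)


if __name__ == "__main__":
    reviews = [
        "great battery life and fast delivery",
        "terrible battery, died after a week",
        "the delivery was fast",
        "nice colour",
    ]
    print(select_by_similarity(reviews, ["battery life is great"], 2))
    print(select_randomly(reviews, 2))
```

Swapping `bag_of_words` or `cosine` for another representation or similarity measure changes which candidates are preferred, which is the kind of variation the experiments in the abstract compare. Scoring every candidate against every seed also hints at why computational cost grows quickly with larger candidate sets.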
