Journal: Neural Computing & Applications

A subjectivity classification framework for sports articles using improved cortical algorithms



Abstract

The enormous number of articles published daily on the Internet by a diverse array of authors often offers misleading or unwanted information, rendering activities such as sports betting riskier. As a result, extracting meaningful and reliable information from these sources becomes a time-consuming and nearly impossible task. In this context, labeling articles as objective or subjective is not a simple natural language processing task, because subjectivity can take several forms. With the rise of online sports betting due to the revolution in Internet and mobile technology, an automated system capable of sifting through all these data and finding relevant sources in a reasonable amount of time presents itself as a desirable and marketable product. In this work, we present a framework for the classification of sports articles composed of three stages: the first stage extracts articles from web pages using text extraction libraries, parses the text, and then tags words using Stanford's part-of-speech tagger; the second stage extracts unique syntactic and semantic features and reduces them using our modified cortical algorithm (CA), hereafter CA*; and the third stage classifies these texts as objective or subjective. Our framework was tested on a database containing 1000 articles, manually labeled using Amazon's crowdsourcing tool, Mechanical Turk, and results using CA, CA*, support vector machines, and one of their soft computing variants (LMSVM) as classifiers were reported. A testing accuracy of 85.6% was achieved under fourfold cross-validation with a 40% reduction in features using CA*, trained with an entropy weight update rule and a cross-entropy cost function.
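As a rough illustration of two evaluation ingredients named in the abstract, fourfold cross-validation and a cross-entropy cost, the sketch below shows one way they might be written in plain Python. The function names `k_fold_indices` and `binary_cross_entropy` are hypothetical and do not come from the paper. The folds here are contiguous and unshuffled, which is a simplifying assumption, and the CA* model and its entropy weight update rule are not reproduced.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clamp probabilities so that log(0) is never evaluated.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def k_fold_indices(n_samples, k=4):
    """Yield (train, test) index lists for k contiguous, unshuffled folds."""
    # Distribute any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size
```

With 1000 labeled articles and `k=4`, each fold would hold out 250 articles for testing and train on the remaining 750; the reported accuracy would then be an average over the four held-out folds.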
