Procedia Computer Science

Mining multiple informational text structure from text data

Abstract

This study aimed to distinguish the types of informational text structure present in text data. Classifying the informational text structure of a given text is an essential research area for discovering the knowledge in its content. Several previous studies defined a set of categories of informational text structure that can be identified by their respective signal words. The paper proposed a methodology for automatically extracting these informational text structures from school textbook data. The task was to classify a text into one or more of the predefined categories. Human annotators with sufficient domain knowledge of the textbook subjects performed the manual categorization. For automatic classification, the occurrence frequencies of the signal words were used as the feature vector. A Naïve Bayes classifier was trained on 120 manually annotated texts and tested on 40 texts. The classifier achieved a precision of 92% and an F1 score of 95.6%.
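The approach described in the abstract (signal-word occurrence counts fed to a Naïve Bayes classifier) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration: the category names, the signal-word lists in `SIGNAL_WORDS`, the toy training sentences, and the helper names `featurize`, `train`, and `predict` are hypothetical and do not come from the paper. The sketch also assigns a single label per text, whereas the paper's task allows one or more categories.

```python
import math
import re
from collections import Counter

# Hypothetical signal words per category (not the paper's actual lists).
SIGNAL_WORDS = {
    "cause_effect": ["because", "therefore", "consequently"],
    "sequence": ["first", "then", "finally"],
    "compare_contrast": ["however", "similarly", "whereas"],
}
VOCAB = sorted({w for words in SIGNAL_WORDS.values() for w in words})


def featurize(text):
    """Return the occurrence count of each signal word, in VOCAB order."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [counts[w] for w in VOCAB]


def train(samples):
    """Fit a multinomial Naive Bayes model with add-one smoothing.

    samples is a list of (text, label) pairs. The model maps each label
    to (log prior, list of per-word log likelihoods).
    """
    label_counts = Counter(label for _, label in samples)
    word_totals = {label: [0] * len(VOCAB) for label in label_counts}
    for text, label in samples:
        for i, count in enumerate(featurize(text)):
            word_totals[label][i] += count
    model = {}
    for label, totals in word_totals.items():
        denom = sum(totals) + len(VOCAB)  # add-one smoothing denominator
        model[label] = (
            math.log(label_counts[label] / len(samples)),
            [math.log((t + 1) / denom) for t in totals],
        )
    return model


def predict(model, text):
    """Return the label with the highest posterior log score."""
    x = featurize(text)

    def score(label):
        log_prior, log_likelihoods = model[label]
        return log_prior + sum(c * ll for c, ll in zip(x, log_likelihoods))

    return max(model, key=score)


# Toy annotated data, invented for this sketch.
train_data = [
    ("The river flooded because it rained; therefore crops failed.", "cause_effect"),
    ("First mix the flour, then add water, and finally bake.", "sequence"),
    ("Frogs live near water; however, toads prefer dry land.", "compare_contrast"),
]
model = train(train_data)
print(predict(model, "First heat the pan, then add oil."))
```

A multi-label variant could train one such binary classifier per category, but the abstract does not say how the authors handled texts with more than one structure.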