Mining multiple informational text structure from text data

Syaamantak Das; Shyamal Kumar Das Mandal; Anupam Basu

Abstract

This study aimed to distinguish the types of informational text structure present in text data. Classifying the informational text structure of a given text is an important research area for discovering the knowledge in its content. Several previous studies defined a set of categories of informational text structure that can be identified by their respective signal words. The paper proposed a methodology for automatically extracting those informational text structures from school textbook data. The task was to classify a text into one or more of the predefined categories. Human annotators with sufficient domain knowledge of the subjects of the book performed the manual categorization. For automatic classification, the occurrence frequency of the signal words was used as the feature vector. A Naïve Bayes classifier was trained on 120 manually annotated text samples and tested on 40 text samples. The classifier achieved a precision of 92% and an F1 score of 95.6%.
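
The pipeline described in the abstract (signal-word occurrence counts as features, a Naïve Bayes classifier, one or more categories per text) might be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the category names, the signal-word lists, the toy training sentences, the one-binary-classifier-per-category decomposition, and the add-one smoothing are all assumptions, since the abstract does not specify them.

```python
import math
import re
from collections import Counter

# Hypothetical categories and signal words; the abstract does not list the real ones.
SIGNAL_WORDS = {
    "sequence": ["first", "next", "then", "finally"],
    "cause_effect": ["because", "therefore", "consequently"],
    "compare_contrast": ["however", "whereas", "similarly"],
}
VOCAB = sorted(w for words in SIGNAL_WORDS.values() for w in words)


def featurize(text):
    """Return the occurrence count of each signal word, in VOCAB order."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [counts[w] for w in VOCAB]


class BinaryNB:
    """Multinomial Naive Bayes for one category (member vs. non-member)."""

    def fit(self, X, y, alpha=1.0):
        self.log_prior, self.log_lik = {}, {}
        for c in (0, 1):
            rows = [x for x, label in zip(X, y) if label == c]
            # Smoothed prior, so an empty class never produces log(0).
            self.log_prior[c] = math.log((len(rows) + alpha) / (len(X) + 2 * alpha))
            # Add-alpha smoothed count of each signal word within this class.
            totals = [sum(col) + alpha for col in zip(*rows)] if rows else [alpha] * len(VOCAB)
            denom = sum(totals)
            self.log_lik[c] = [math.log(t / denom) for t in totals]
        return self

    def predict(self, x):
        score = {
            c: self.log_prior[c] + sum(n * ll for n, ll in zip(x, self.log_lik[c]))
            for c in (0, 1)
        }
        return int(score[1] > score[0])


def train(texts, label_sets):
    """Fit one binary classifier per category (assumed multi-label strategy)."""
    X = [featurize(t) for t in texts]
    return {
        cat: BinaryNB().fit(X, [int(cat in labels) for labels in label_sets])
        for cat in SIGNAL_WORDS
    }


def classify(models, text):
    """Return the set of categories whose classifier accepts the text."""
    x = featurize(text)
    return {cat for cat, model in models.items() if model.predict(x)}


# Toy annotated data, invented for illustration only.
TRAIN_TEXTS = [
    "First mix the flour, then add water, and finally bake.",
    "The river flooded because of heavy rain; therefore the crops failed.",
    "Frogs live near water, whereas toads prefer dry land; however, both are amphibians.",
    "First the ice melts because it is heated; consequently it turns to water.",
]
TRAIN_LABELS = [
    {"sequence"},
    {"cause_effect"},
    {"compare_contrast"},
    {"sequence", "cause_effect"},
]
models = train(TRAIN_TEXTS, TRAIN_LABELS)
```

With this toy data, `classify(models, "Next, stir the mixture, then let it cool.")` returns `{"sequence"}`. Training a separate binary classifier per category is one simple way to let a text receive more than one label; the paper may have handled the multi-label case differently.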

Bibliographic details

Source: Procedia Computer Science, 2020, Issue 5, 10 pages
Authors: Syaamantak Das; Shyamal Kumar Das Mandal; Anupam Basu
Original format: PDF
Keywords: text structure; signal word; automatic classification; cognitive level
