An Unsupervised Approach for Precise Context Identification from Unstructured Text Documents

Abstract

The majority of documents produced and exchanged through media and social networks are unstructured. Given the volume of such documents on the Web, exploiting them is a tedious or even impossible task for humans without the help of dedicated algorithms and specialized computer systems for document classification or information extraction. To be efficient and relevant, such systems must understand the content of these unstructured documents. The context (or topic) of a document is one of the basic pieces of information needed to understand its content, and the more precisely a document's context is identified, the more relevant that understanding will be. This paper presents an approach for precise context identification, evaluated quantitatively and qualitatively on several reference corpora and compared with other context identification systems. The contexts identified by our model are much more precise than those identified by these other systems.

Bibliographic record

Source: IEEE International Conference on Tools with Artificial Intelligence, 2020, pp. 821-826 (6 pages)
Authors: Maha Mallek; Sébastien Fournier; Ramzi Guetari; Bernard Espinasse; Wided Lejouad Chaari
Original format: PDF
Keywords: Social networking (online); Encyclopedias; Tools; Media; Classification algorithms; Internet; Task analysis

Similar documents

1. Correlative study and analysis for hidden patterns in text analytics unstructured data using supervised and unsupervised learning techniques [J]. E. Laxmi Lydia, S. Kannan, S. SumanRajest. International Journal of Cloud Computing, 2020, issue 2a3.
2. A language-independent authorship attribution approach for author identification of text documents [J]. Ramezani Reza. Expert Systems with Applications, 2021, issue Octa.
3. Building a Construction Project Key-Phrase Network from Unstructured Text Documents [J]. Nedeljkovic Dorde, Kovacevic Milos. Journal of Computing in Civil Engineering, 2017, issue 6.
4. Recommendation-Based De-identification: A Practical Systems Approach towards De-identification of Unstructured Text in Healthcare [C]. Bhagwan Varun, Grandison Tyrone, Maltzahn Carlos. 2012 IEEE Eighth World Congress on Services, 2012.
5. Automatic Detection of Section Title and Prose Text in HTML Documents Using Unsupervised and Supervised Learning [D]. Mysore Gopinath, Abhijith Athreya. 2018.
6. Text Snippets to Corroborate Medical Relations: An Unsupervised Approach using a Knowledge Graph and Embeddings [O]. Maulik R. Kamdar, Craig E. Stanley, Michael Carroll. 2020.
7. Unsupervised cross-modal audio representation learning from unstructured multilingual text [O]. Alexander Schindler, Sergiu Gordea, Peter Knees. 2020.
