Ontology Learning Based on Word Embeddings for Text Big Data Extraction


Abstract

The term Big Data describes data that exists everywhere in enormous volumes, raw forms, and heterogeneous types. Unstructured and uncategorized data makes up 95% of big data. For text big data, there is no efficient way to extract domain-relevant data in a reasonable time. Text big data therefore remains a barrier to big data integration and, subsequently, to big data analytics, because big data integration cannot take text big data into account when preparing data for analytics. Ontology, on the other hand, represents information and knowledge in a graph schema that provides shareable, reusable, and domain-specific data. Ontology thus fits the need of text big data for extracting domain-relevant data. This paper proposes an ontology learning (OL) methodology for text big data extraction. OL aims to provide algorithms, techniques, and tools for automatic ontology construction from text. The proposed OL method exploits a deep learning approach, i.e., word embeddings, and an advanced hierarchical clustering algorithm, i.e., BIRCH. Using word embeddings together with advanced hierarchical clustering improves OL quality in text big data extraction and reduces the processing time. In addition, deep learning learns in an unsupervised manner from massive amounts of unlabeled and uncategorized raw data, which addresses the analytical challenge of text big data. In the evaluation, precision, recall, and F-value are measured for the quality of the work, and running time is measured for performance. Quality is evaluated by comparing the results with those of gold standard datasets. Experimental results and evaluation demonstrate that the proposed OL methodology is efficient and suitable for text big data extraction.
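
The evaluation described in the abstract relies on precision, recall, and F-value computed against a gold standard. The sketch below shows one common way to compute these three measures, assuming that extracted terms and gold-standard terms are compared as sets with exact string matching. The abstract does not state the matching scheme the authors used, so that choice, the function name `precision_recall_f`, and the toy term lists are all hypothetical.

```python
def precision_recall_f(extracted, gold):
    """Set-based precision, recall, and F-value (harmonic mean of the two).

    Assumption: a term counts as correct only if it matches a gold term exactly.
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_value = 2 * precision * recall / (precision + recall)
    return precision, recall, f_value


# Hypothetical toy data for illustration only; not taken from the paper.
extracted_terms = ["ontology", "cluster", "embedding", "river"]
gold_terms = ["ontology", "cluster", "embedding", "taxonomy", "concept"]

p, r, f = precision_recall_f(extracted_terms, gold_terms)
print(f"precision={p:.2f} recall={r:.2f} f-value={f:.2f}")
```

Here three of the four extracted terms appear in the gold list, and three of the five gold terms are recovered, so precision is higher than recall. Set-based matching is the simplest option; a real ontology evaluation might instead compare concepts or taxonomic relations.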


Bibliographic record

Source
International Computer Engineering Conference | 2018 | pp. 183-188 | 6 pages
Authors
Nesma Mahmoud; Heba Elbeh; Hatem M. Abdlkader;
Original format: PDF
Keywords
Big Data; Ontologies; Data mining; Clustering algorithms; Task analysis; Buildings; Deep learning;


Similar literature


1. Clinical Information Extraction Using Small Data: An Active Learning Approach Based on Sequence Representations and Word Embeddings [J]. Mahnoosh Kholghi, Lance De Vine, Laurianne Sitbon. Journal of the American Society for Information Science and Technology. 2017, No. 11

2. Reproducibility dataset for a large experimental survey on word embeddings and ontology-based methods for word similarity [J]. Juan J. Lastra-Díaz, Josu Goikoetxea, Mohamed Ali Hadj Taieb. Data in Brief. 2019, No. 1

3. Using Word Embedding and Ensemble Learning for Highly Imbalanced Data Sentiment Analysis in Short Arabic Text [J]. Sadam Al-Azani, El-Sayed M. El-Alfy. Procedia Computer Science. 2017, No. 1

4. Ontology Learning Based on Word Embeddings for Text Big Data Extraction [C]. Nesma Mahmoud, Heba Elbeh, Hatem M. Abdlkader. International Computer Engineering Conference. 2018

5. Scalable Detection and Extraction of Data in Lists in OCRed Text for Ontology Population Using Semi-Supervised and Unsupervised Active Wrapper Induction [D]. Packer, Thomas L. 2014

6. Reproducibility dataset for a large experimental survey on word embeddings and ontology-based methods for word similarity [O]. Juan J. Lastra-Díaz, Josu Goikoetxea, Mohamed Ali Hadj Taieb. 2019

7. Ontology-Based Information Extraction from Free-Form Text [R]. Braun, R. 2000
