Ontology Learning Based on Word Embeddings for Text Big Data Extraction

Abstract

The term big data describes data that exist everywhere in huge volumes, raw forms, and heterogeneous types. Unstructured and uncategorized data make up 95% of big data. Text big data lacks methods for efficiently extracting domain-relevant data in a reasonable time. Thus, text big data remains a barrier to big data integration and, subsequently, to big data analytics, because big data integration cannot take text big data into account when preparing data for analytics. On the other hand, an ontology represents information and knowledge in a graph schema that provides shareable, reusable, and domain-specific data. Thus, ontologies fit the need of text big data for extracting domain-relevant data. This paper therefore proposes an ontology learning (OL) methodology for text big data extraction. OL aims to provide algorithms, techniques, and tools for automatic ontology construction from text. The proposed OL method exploits a deep learning approach, i.e., word embeddings, and an advanced hierarchical clustering algorithm, i.e., BIRCH. Using word embeddings and advanced hierarchical clustering improves OL quality in text big data extraction and reduces processing time. In addition, deep learning learns in an unsupervised manner from massive amounts of unlabeled and uncategorized raw data. This benefit addresses the analytical challenge of text big data. In the evaluation, precision, recall, and F-value are measured for quality, and running time is measured for performance. Quality is evaluated by comparing the results with those of gold standard datasets. Experimental results and evaluation demonstrate that the proposed OL methodology is efficient and suitable for text big data extraction.
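
The abstract names precision, recall, and F-value against a gold standard as the quality measures but does not say how they are computed. The following is a minimal sketch, assuming a set-based comparison of extracted concept labels with gold standard labels and the balanced F-value (F1). The function name `precision_recall_f` and the toy concept sets are hypothetical and are not taken from the paper.

```python
def precision_recall_f(extracted, gold):
    """Return (precision, recall, F-value) for two collections of labels.

    Assumption: items are compared by exact match, which may differ from
    the matching scheme used in the paper.
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)  # labels found in both sets
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-value: harmonic mean of precision and recall
    f_value = 2 * precision * recall / (precision + recall)
    return precision, recall, f_value


# Hypothetical toy data, for illustration only
extracted_concepts = {"ontology", "cluster", "embedding", "table"}
gold_concepts = {"ontology", "cluster", "embedding", "taxonomy", "concept"}

p, r, f = precision_recall_f(extracted_concepts, gold_concepts)
print(f"precision={p:.2f} recall={r:.2f} f={f:.2f}")
```

Here three of the four extracted labels appear in the gold standard, and three of the five gold labels are recovered, so precision is higher than recall.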

Bibliographic record

Source: International Computer Engineering Conference | 2018 | vii 265 p. | 6 pages
Authors: Nesma Mahmoud; Heba Elbeh; Hatem M. Abdlkader
Original format: PDF
Chinese Library Classification: Computing technology, computer technology
Keywords: Big Data; Ontologies; Data mining; Clustering algorithms; Task analysis; Buildings; Deep learning


