Conference: International Computer Engineering Conference

Ontology Learning Based on Word Embeddings for Text Big Data Extraction


Abstract

The term Big Data describes data that exists everywhere in enormous volumes, raw forms, and heterogeneous types. Unstructured and uncategorized data makes up 95% of big data. For text big data, efficient extraction of domain-relevant data within a reasonable time is still lacking. Text big data therefore remains a barrier to big data integration and, subsequently, to big data analytics, because big data integration cannot include text big data when preparing data for analytics. Ontology, on the other hand, represents information and knowledge in a graph schema that provides shareable, reusable, and domain-specific data. Ontology thus fits the need of text big data for extracting domain-relevant data. This paper proposes an ontology learning (OL) methodology for text big data extraction. OL aims to provide algorithms, techniques, and tools for automatic ontology construction from text. The proposed OL method exploits a deep learning approach, namely word embeddings, and an advanced hierarchical clustering algorithm, namely BIRCH. Using word embeddings together with advanced hierarchical clustering improves OL quality in text big data extraction and reduces the processing time. In addition, deep learning learns in an unsupervised manner from massive amounts of unlabeled and uncategorized raw data, which addresses the analytical challenge of text big data. In the evaluation, precision, recall, and F-measure are measured for quality, and running time is measured for performance. Quality is evaluated by comparing the results against gold standard datasets. Experimental results and evaluation demonstrate that the proposed OL methodology is efficient and suitable for text big data extraction.
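The abstract names the building blocks (word embeddings, BIRCH clustering, and precision/recall/F-measure against a gold standard) but gives no implementation details. The following is a minimal sketch of how such pieces could fit together, not the authors' method. It assumes scikit-learn's `Birch` as the clustering implementation, uses small hand-written 2-D vectors as hypothetical stand-ins for learned word embeddings, and scores clusters with a pairwise comparison against a hypothetical gold grouping. The names `TERMS`, `VECTORS`, `GOLD`, `cluster_terms`, `same_cluster_pairs`, and `precision_recall_f1` are all illustrative.

```python
from itertools import combinations

import numpy as np
from sklearn.cluster import Birch

# Hypothetical toy data: in a real pipeline these vectors would come from a
# word-embedding model trained on the corpus, not be written by hand.
TERMS = ["cat", "dog", "horse", "car", "truck", "bus"]
VECTORS = np.array([
    [0.90, 0.10],
    [1.00, 0.00],
    [0.95, 0.05],
    [0.10, 0.90],
    [0.00, 1.00],
    [0.05, 0.95],
])

# Hypothetical gold-standard grouping of the same terms into concepts.
GOLD = [{"cat", "dog", "horse"}, {"car", "truck", "bus"}]


def cluster_terms(terms, vectors, threshold=0.3):
    """Group terms by the BIRCH subcluster their vector falls into."""
    # n_clusters=None keeps the CF-tree subclusters as the final clusters.
    model = Birch(threshold=threshold, n_clusters=None)
    labels = model.fit_predict(vectors)
    groups = {}
    for term, label in zip(terms, labels):
        groups.setdefault(int(label), set()).add(term)
    return list(groups.values())


def same_cluster_pairs(clusters):
    """Return every unordered pair of terms that share a cluster."""
    return {
        frozenset(pair)
        for cluster in clusters
        for pair in combinations(sorted(cluster), 2)
    }


def precision_recall_f1(predicted, gold):
    """Set-based precision, recall, and F1 of predicted items against gold."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    clusters = cluster_terms(TERMS, VECTORS)
    scores = precision_recall_f1(
        same_cluster_pairs(clusters), same_cluster_pairs(GOLD)
    )
    print(clusters)
    print("precision=%.2f recall=%.2f f1=%.2f" % scores)
```

The `threshold` value here is tuned to the toy vectors. On real embeddings it would need to be chosen for the vector scale and dimensionality, and the pairwise scoring is only one of several ways to compare clusters with a gold standard.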
