Building a knowledge graph by using cross-lingual transfer method and distributed MinIE algorithm on apache spark

Do Phuc; Phan Trung; Le Hung; Gupta Brij B.

Abstract

For centuries, text has been the simplest and most effective way to store human knowledge. As technology has advanced, the volume of text has grown ever larger, and extracting useful information from it has become an exceptionally complex task. To address this problem, we present a pipeline that uses distributed computing to extract core knowledge from large quantities of text. The components of our pipeline are systems known to yield good results, and the outputs are stored in a knowledge graph, that is, a graph that stores knowledge in the form of triples (head, relation, tail). Existing knowledge graphs include the Google Knowledge Graph, YAGO, DBLP, and DBpedia. These knowledge graphs have one thing in common: they are in English. English is studied by many researchers worldwide and has become a resource-rich language, with many natural language processing tools and data sets. Vietnamese, on the other hand, is a low-resource language. We therefore use a cross-lingual transfer method to build a Vietnamese knowledge graph. First, we collect text about Vietnam tourism, written mostly in Vietnamese, using Google search and Wikipedia. Next, we translate the text into English with Google Translate and use English natural language processing tools such as Stanford Parser, coreference resolution, ClausIE, and MinIE to extract useful triples from it. Finally, the triples are translated back into Vietnamese to build a Vietnam tourism knowledge graph. Because we are working with massive text, we develop a distributed algorithm for extracting triples from its sentences. This is a distributed version of MinIE, which was originally developed for a single-machine model. In the Apache Spark framework, we divide the massive text into many smaller parts and move them to the worker nodes, each of which runs the distributed MinIE function. On each worker node, Spark distributed MinIE extracts the triples from the sentences in that node's local text, in parallel with the other nodes. The results from the worker nodes are then sent back to the master node to build the knowledge graph. We conduct experiments with distributed MinIE on a Spark cluster to demonstrate the performance advantage of our proposed algorithm.
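
The partition, extract, and collect flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: a standard-library thread pool stands in for the Spark worker nodes, and `toy_extract` is a hypothetical whitespace-based placeholder for MinIE, which in reality performs full open information extraction. The function names `split_into_parts`, `extract_part`, `build_graph`, and `run_pipeline` are likewise assumptions made for illustration only.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def toy_extract(sentence: str) -> List[Triple]:
    """Hypothetical stand-in for MinIE: treats the first word as the head,
    the second as the relation, and the rest as the tail."""
    words = sentence.strip().rstrip(".").split()
    if len(words) < 3:
        return []
    return [(words[0], words[1], " ".join(words[2:]))]


def split_into_parts(sentences: List[str], num_parts: int) -> List[List[str]]:
    """Divide the corpus into num_parts smaller parts, one per worker."""
    return [sentences[i::num_parts] for i in range(num_parts)]


def extract_part(part: List[str]) -> List[Triple]:
    """Work done on one worker: extract triples from its local sentences."""
    triples: List[Triple] = []
    for sentence in part:
        triples.extend(toy_extract(sentence))
    return triples


def build_graph(triples: Iterable[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Master-side step: index the collected triples by head entity."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return dict(graph)


def run_pipeline(sentences: List[str], num_parts: int = 2) -> Dict[str, List[Tuple[str, str]]]:
    """Split the text, extract triples from each part in parallel,
    then gather the results and build the graph."""
    parts = split_into_parts(sentences, num_parts)
    with ThreadPoolExecutor(max_workers=num_parts) as pool:
        results = list(pool.map(extract_part, parts))
    all_triples = [triple for part in results for triple in part]
    return build_graph(all_triples)
```

Calling `run_pipeline` on a list of already translated English sentences returns a dictionary that maps each head entity to its (relation, tail) pairs. On a real Spark cluster, the per-part step would run on the worker nodes and only the extracted triples would travel back to the master node.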

Bibliographic record

Source
Neural Computing & Applications | 2022, Issue 11 | pp. 8393-8409 | 17 pages
Authors
Do Phuc; Phan Trung; Le Hung; Gupta Brij B.
Author affiliations

Vietnam Natl Univ;

Natl Inst Technol;

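Original format: PDF
Language of text: English
Chinese Library Classification: Artificial neural network computers; Artificial intelligence theory
Keywords
Knowledge graph; Cross-lingual transfer method; Distributed MinIE; Natural language processing; Triples extraction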
