An unsupervised approach for identifying the Infobox template of wikipedia article


Abstract

Wikipedia infoboxes serve as an important source of structured information on the web. To author an infobox for a particular article, volunteers must spend a considerable amount of manual effort identifying the appropriate infobox template. An automatic process for identifying the infobox template would therefore benefit Wikipedia contributors. In this paper, we present a Natural Language Processing (NLP)-based automated approach that identifies the infobox template in an unsupervised fashion. The proposed approach uses semantic relations (hyponym and holonym) and word features of Wikipedia articles. It works in three steps: first, it processes the raw text of the article to generate sets of words; next, it applies the proposed algorithm to identify the infobox type; finally, it selects the infobox template from a large pool of templates. A data-driven experiment demonstrates the effectiveness of the proposed approach in terms of autonomy and accuracy.
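The abstract gives only the outline of the three steps, so the following Python sketch is an illustration under stated assumptions, not the authors' algorithm. The lexicon `TOY_RELATIONS`, the list `TEMPLATE_POOL`, the stopword list, and all function names are hypothetical; the lexicon stands in for a real lexical resource that supplies hyponym and holonym relations, and the vote-counting rule is a simple substitute for the paper's unspecified type-identification algorithm.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: word -> broader concept it is a kind of or part of.
# A stand-in for real hyponym/holonym lookups; no entry comes from the paper.
TOY_RELATIONS = {
    "singer": "musical artist",
    "guitarist": "musical artist",
    "album": "musical artist",
    "band": "musical artist",
    "striker": "football biography",
    "goalkeeper": "football biography",
    "club": "football biography",
    "tributary": "river",
    "estuary": "river",
}

# Hypothetical, tiny pool of template names.
TEMPLATE_POOL = [
    "Infobox musical artist",
    "Infobox football biography",
    "Infobox river",
    "Infobox film",
]

STOPWORDS = {"the", "a", "an", "in", "of", "and", "her", "his", "to", "was"}


def word_counts(text):
    """Step 1: turn raw article text into a bag of lowercase content words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def infer_type(words, relations):
    """Step 2: let each known word vote for its broader concept."""
    votes = Counter()
    for word, count in words.items():
        concept = relations.get(word)
        if concept is not None:
            votes[concept] += count
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def pick_template(infobox_type, pool):
    """Step 3: choose the template whose name shares the most words with the type."""
    if infobox_type is None:
        return None
    target = set(infobox_type.split())
    best, best_score = None, 0
    for template in pool:
        name_words = set(template.lower().replace("infobox", "").split())
        score = len(target & name_words)
        if score > best_score:
            best, best_score = template, score
    return best


def identify_template(text):
    """Run the three steps end to end; returns None when nothing matches."""
    return pick_template(infer_type(word_counts(text), TOY_RELATIONS), TEMPLATE_POOL)


article = (
    "The singer released her first album in 2001. "
    "The band toured Europe and the guitarist left the club in 2005."
)
print(identify_template(article))
```

No labelled training data is involved: the decision rests entirely on the lexicon and the template names, which is one plausible reading of "unsupervised" here.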


Bibliographic record

Source
IEEE International Conference on Computational Science and Engineering | 2015 | 5 pages
Authors
Hanif Bhuiyan; Kyeong-Jin Oh; Myung-Duk Hong; Geun-Sik Jo
Original format
PDF
Chinese Library Classification
TP3-53
Keywords
Wikipedia; Semantic relation; Identification; Infobox template

Similar literature

1. Developing an automated mechanism to identify medical articles from wikipedia for knowledge extraction [J]. Yu Lishan, Yu Sheng. International journal of medical informatics. 2020, Issue Sepa
2. Identifying Controversial Wikipedia Articles Using Editor Collaboration Networks [J]. Sepehri-Rad Hoda, Barbosa Denilson. ACM transactions on intelligent systems. 2015, Issue 1
3. THE THREE 'I'S APPROACH. ARTICLE1:IDENTIFY ARTICLE1:IDENTIFY [J]. VICTORIA STONE. Journal of Cost Management. 2020, Issue 2
4. An unsupervised approach for identifying the Infobox template of wikipedia article [C]. Hanif Bhuiyan, Kyeong-Jin Oh, Myung-Duk Hong, IEEE International Conference on Computational Science and Engineering. 2015
5. A theoretical approach to legitimizing collaboratively constructed knowledge: A content analysis of Wikipedia science articles based on accidental collaboration. [D]. Hutchinson, James Patrick. 2011
6. Unsupervised acquisition of idiomatic units of symbolic natural language: An n-gram frequency-based approach for the chunking of news articles and tweets [O]. Dario Borrelli, Gabriela Gongora Svartzman, Carlo Lipizzi. 2020
7. Semantic Data Extraction from Infobox Wikipedia Template [O]. Amira Abd El-atey, Faculty Of Computers 2013
