Conference: IEEE International Conference on Computational Science and Engineering

An unsupervised approach for identifying the Infobox template of wikipedia article

Abstract

Wikipedia infoboxes serve as an important source of structured information on the web. To author an infobox for a particular article, volunteers need a considerable amount of manual effort to identify the appropriate infobox template. An automatic process for identifying the infobox template could therefore be useful to Wikipedia contributors. In this paper, we present a Natural Language Processing (NLP)-based automated approach that identifies the infobox template in an unsupervised fashion. The approach uses semantic relations (hyponymy and holonymy) and word features of Wikipedia articles. It works in three steps: first, it processes the raw text of the article to generate sets of words; next, it applies the proposed algorithm to identify the infobox type; finally, it selects the infobox template from the large pool of available templates. A data-driven experiment demonstrates the effectiveness of the approach in terms of autonomy and accuracy.
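The abstract does not give the algorithm itself, so the following is only a minimal sketch of how a three-step pipeline of this shape might look, not the authors' method. The stop-word list, the two toy relation maps (hand-written stand-ins for a lexical resource), the template pool, and all function names are hypothetical and chosen purely for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = frozenset({"the", "a", "an", "of", "in", "is", "was", "and", "to"})

# Hypothetical toy relations standing in for a real lexical database.
# TOY_IS_A: the key is a hyponym (a kind of) the value.
TOY_IS_A = {"tributary": "river", "novelist": "writer",
            "writer": "person", "singer": "musician", "musician": "person"}
# TOY_PART_OF: the value is a holonym (the whole) of the key.
TOY_PART_OF = {"delta": "river", "estuary": "river", "chorus": "song"}

# Hypothetical pool of template names.
TEMPLATE_POOL = ["Infobox river", "Infobox person", "Infobox song", "Infobox writer"]


def extract_words(text):
    """Step 1: turn raw article text into a list of content words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def related_concepts(word):
    """Return the word plus every concept reachable via is-a / part-of links."""
    seen, frontier = set(), [word]
    while frontier:
        current = frontier.pop()
        if current in seen:
            continue
        seen.add(current)
        for relation in (TOY_IS_A, TOY_PART_OF):
            if current in relation:
                frontier.append(relation[current])
    return seen


def identify_type(words, known_types):
    """Step 2: each word votes for the known types it is related to."""
    votes = Counter()
    for word in words:
        for concept in related_concepts(word):
            if concept in known_types:
                votes[concept] += 1
    return votes.most_common(1)[0][0] if votes else None


def pick_template(infobox_type, pool):
    """Step 3: select the template whose name matches the identified type."""
    for name in pool:
        if infobox_type is not None and name.lower() == f"infobox {infobox_type}":
            return name
    return None


def identify_template(text, pool=TEMPLATE_POOL):
    """Run the three steps in order; returns None when nothing matches."""
    known_types = {name.lower().split(" ", 1)[1] for name in pool}
    return pick_template(identify_type(extract_words(text), known_types), pool)


print(identify_template("The Danube is a river. Its delta and a tributary feed the estuary."))
```

No labelled training data is involved: the type is chosen by counting how many article words point to it through the relation maps, which is one simple way to read "unsupervised" here. A realistic version would replace the toy maps with a full lexical database and would need a tie-breaking rule when several types receive the same number of votes.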
