Concept vector extraction from Wikipedia category network


Abstract

The usefulness of machine-readable taxonomies has been demonstrated in applications such as document classification and information retrieval. A main line of research on automated taxonomy extraction is statistical natural language processing (NLP) based on Web mining, and a significant number of studies have been conducted. However, existing work on automatic dictionary building suffers from accuracy problems caused by the technical limitations of statistical NLP and by noisy data on the WWW. To address these problems, this work focuses on mining Wikipedia, a large-scale Web encyclopedia. Wikipedia offers a large number of high-quality articles and a category system, because users around the world edit and refine both daily. Using Wikipedia avoids the loss of accuracy that stems from NLP. However, affiliation relations cannot be extracted automatically by simply descending the category system, because the Wikipedia category system is a network rather than a tree. We propose concept vectorization methods that are applicable to the category network of Wikipedia.
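To illustrate why a network-shaped category system needs more care than a tree, the following is a minimal sketch and not the method proposed in the paper, whose details the abstract does not give. It assumes a hypothetical `parents` mapping from each node to its parent categories, and it assumes a simple weighting in which a category's weight decays with its shortest hop distance from the starting concept. The function name `concept_vector`, the `decay` and `max_depth` parameters, and the toy data are all illustrative assumptions.

```python
from collections import deque


def concept_vector(start, parents, decay=0.5, max_depth=3):
    """Weight each category reachable from `start` by decay ** hop_count.

    Breadth-first search guarantees that the first visit to a category
    uses its shortest path, so a category reached via several paths
    (or via a cycle) is counted only once.
    """
    vector = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for parent in parents.get(node, []):
            if parent in seen:
                continue  # already reached by a path at least as short
            seen.add(parent)
            vector[parent] = decay ** (depth + 1)
            queue.append((parent, depth + 1))
    return vector


# Toy category network (hypothetical data). "Scientists" is reachable by
# two paths, and "Scientists" <-> "Physicists" forms a cycle.
parents = {
    "Albert Einstein": ["Physicists", "German scientists"],
    "Physicists": ["Physics", "Scientists"],
    "German scientists": ["Scientists"],
    "Physics": ["Natural sciences"],
    "Scientists": ["People", "Physicists"],
}

vec = concept_vector("Albert Einstein", parents)
for category, weight in sorted(vec.items(), key=lambda kv: -kv[1]):
    print(f"{category}: {weight}")
```

The `seen` set is what makes the traversal safe on a network: without it, the cycle between "Scientists" and "Physicists" would be revisited until the depth limit, and categories with several parents would be weighted more than once.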


Bibliographic details

Source
3rd International Conference on Ubiquitous Information Management and Communication 2009 | 2009 | pp. 71-79 | 9 pages
Authors
Masumi Shirakawa; Kotaro Nakayama; Takahiro Hara; Shojiro Nishio
Original format: PDF
Chinese Library Classification: Communications
Keywords
Wikipedia; categorization; concept vector; web mining


