Building Semantic Kernels for Text Classification using Wikipedia

Abstract

Document classification presents difficult challenges due to the sparsity and high dimensionality of text data, and to the complex semantics of natural language. The traditional document representation is a word-based vector (Bag of Words, or BOW), where each dimension is associated with a term of the dictionary containing all the words that appear in the corpus. Although simple and commonly used, this representation has several limitations. It is essential to embed semantic information and conceptual patterns in order to enhance the prediction capabilities of classification algorithms. In this paper, we overcome the shortcomings of the BOW approach by embedding background knowledge derived from Wikipedia into a semantic kernel, which is then used to enrich the representation of documents. Our empirical evaluation with real data sets demonstrates that our approach achieves improved classification accuracy with respect to the BOW technique and to other recently developed methods.
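As an illustration of the contrast the abstract draws, here is a minimal Python sketch, not the authors' implementation. It builds raw-count BOW vectors and compares a plain dot-product kernel with a semantic kernel of the general form k(d1, d2) = φ(d1) S Sᵀ φ(d2)ᵀ, where φ(d) is the BOW vector and S is a term-relatedness matrix. The vocabulary, the two documents, and the values in S are hypothetical; in the paper this background knowledge is derived from Wikipedia, and that construction is not reproduced here.

```python
from collections import Counter


def bow_vector(text, vocab):
    """Bag of Words: one dimension per vocabulary term, value = raw term count."""
    counts = Counter(text.lower().split())
    return [counts[t] for t in vocab]


def linear_kernel(x, y):
    """Plain BOW kernel: dot product of the two term vectors."""
    return sum(a * b for a, b in zip(x, y))


def semantic_kernel(x, y, S):
    """k(x, y) = (x S)(y S)^T, where S[i][j] is the assumed relatedness of term i to term j."""
    n = len(S)
    # Map each BOW vector through S so related terms share weight.
    xs = [sum(x[i] * S[i][j] for i in range(n)) for j in range(n)]
    ys = [sum(y[i] * S[i][j] for i in range(n)) for j in range(n)]
    return sum(a * b for a, b in zip(xs, ys))


vocab = ["car", "automobile", "engine", "banana"]

# Hypothetical term-relatedness matrix (symmetric, 1.0 on the diagonal);
# the paper derives this kind of knowledge from Wikipedia.
S = [
    [1.0, 0.9, 0.5, 0.0],
    [0.9, 1.0, 0.5, 0.0],
    [0.5, 0.5, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

d1 = bow_vector("car engine", vocab)
d2 = bow_vector("automobile", vocab)
print(linear_kernel(d1, d2))       # 0: the documents share no words
print(semantic_kernel(d1, d2, S))  # positive: related terms now contribute
```

The two documents share no words, so the BOW kernel scores them 0, while the semantic kernel returns a positive value because S relates "car" to "automobile" and "engine". Computing the kernel as the dot product of the mapped vectors φ(d)S keeps it positive semidefinite, which is what kernel methods such as SVMs require.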

Bibliographic record

Source
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2008), 2008, pp. 695-703 (9 pages)
Authors
Pu Wang; Carlotta Domeniconi;
Original format: PDF
CLC classification: Information and knowledge dissemination
Keywords
text classification; wikipedia; kernel methods; semantic kernels;

Similar documents

1. Building semantic networks from plain text and Wikipedia with application to semantic relatedness and noun compound paraphrasing [J]. Pia-Ramona Wojtinnek, Stephen Pulman, Johanna Volker. International Journal of Semantic Computing, 2012, issue 1.

2. Towards perfect text classification with Wikipedia-based semantic Naive Bayes learning [J]. Kim Han-joon, Kim Jiyun, Kim Jinseog. Neurocomputing, 2018, issue of Nov. 13.

3. Building semantic kernels for cross-document knowledge discovery using Wikipedia [J]. Yan Peng, Jin Wei. Knowledge and Information Systems, 2017, issue 1.

4. Building semantic kernels for text classification using Wikipedia [C]. Pu Wang, Carlotta Domeniconi. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008.

5. Kernel methods and semantic techniques for clinical text classification [D]. Vijay Garla. 2012.

6. Building a biomedical semantic network in Wikipedia with Semantic Wiki Links [O]. Benjamin M. Good, Erik L. Clarke, Salvatore Loguercio. 2012.

7. Formation of a semantic kernel in veterinary medicine with the automated system-cognitive analysis of passports of scientific specialties of the Higher Attestation Commission of the Russian Federation and the automatic classification of texts according to the areas of science [O]. Y. V. Lutsenko. 2018.

