An Extension of the VSM Documents Representation

Lucian Nicolae Vintan; Daniel Ionel Morariu; Radu George Cretulescu; Maria Vintan


Abstract

In this paper we present a new approach to document representation for use in classification and/or clustering algorithms. Our representation starts from the classical "bag-of-words" model but augments each word with its corresponding part of speech. We thus introduce a new concept called hyper-vectors, in which each document is represented in a hyper-space where each dimension is a different part-of-speech component. Within each dimension the document is represented using the Vector Space Model (VSM). In this work we use only five parts of speech: noun, verb, adverb, adjective, and others. In the hyper-space each dimension has a different weight. To compute the similarity between two documents we developed a new hyper-cosine formula. Some classification experiments are presented as validation cases.
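The abstract does not state the hyper-cosine formula itself, so the following Python sketch only illustrates the general idea under explicit assumptions. It assumes documents arrive as (word, part-of-speech) pairs from some external tagger, that each of the five part-of-speech dimensions holds a raw term-frequency vector, and that the hyper-cosine is a weighted mean of the per-dimension VSM cosines, renormalized over the dimensions that at least one of the two documents actually uses. The names `hyper_vector`, `hyper_cosine`, and the example weights are hypothetical and are not the authors' definitions.

```python
import math
from collections import Counter

# The five part-of-speech dimensions listed in the abstract.
POS_DIMENSIONS = ("noun", "verb", "adverb", "adjective", "other")


def hyper_vector(tagged_words):
    """Build a hyper-vector: one term-frequency vector (Counter) per POS dimension.

    Assumed input: an iterable of (word, pos) pairs produced by an external
    tagger. Any tag outside the four named classes falls into "other".
    """
    hv = {pos: Counter() for pos in POS_DIMENSIONS}
    for word, pos in tagged_words:
        key = pos if pos in hv else "other"
        hv[key][word.lower()] += 1
    return hv


def cosine(a, b):
    """Classical VSM cosine between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    sq_a = sum(v * v for v in a.values())
    sq_b = sum(v * v for v in b.values())
    if sq_a == 0 or sq_b == 0:
        return 0.0
    return dot / math.sqrt(sq_a * sq_b)


def hyper_cosine(hv1, hv2, weights):
    """ASSUMED hyper-cosine: weighted mean of per-dimension cosines,
    renormalized over the dimensions that at least one document uses."""
    active = [pos for pos in POS_DIMENSIONS if hv1[pos] or hv2[pos]]
    total = sum(weights[pos] for pos in active)
    if total == 0:
        return 0.0
    return sum(weights[pos] * cosine(hv1[pos], hv2[pos]) for pos in active) / total


# Hypothetical example: pre-tagged documents and illustrative weights.
doc1 = [("Cats", "noun"), ("chase", "verb"), ("mice", "noun"), ("quickly", "adverb")]
doc2 = [("cats", "noun"), ("eat", "verb"), ("mice", "noun"), ("the", "determiner")]
weights = {"noun": 0.4, "verb": 0.3, "adverb": 0.1, "adjective": 0.1, "other": 0.1}

print(round(hyper_cosine(hyper_vector(doc1), hyper_vector(doc2), weights), 3))  # prints 0.444
```

Renormalizing over active dimensions is a choice made in this sketch so that a document compared with itself scores 1.0 even when it uses only some parts of speech; without it, dimensions empty in both documents would pull every score toward zero. The paper's actual weighting and normalization may differ.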


Bibliographic record

Source: International Journal of Computers, Communications and Control, 2017, No. 3 (13 pages)
Authors: Lucian Nicolae Vintan; Daniel Ionel Morariu; Radu George Cretulescu; Maria Vintan
Original format: PDF
CLC classification: Automation technology; computer technology
Keywords: documents representation; vector space model; hyper-vectors; documents similarity; classification; clustering


Similar documents


1. An Extension of the VSM Documents Representation [J]. Lucian Vintan, Daniel Morariu, Radu Cretulescu. International Journal of Computers, Communications & Control. 2017, No. 3
2. An Extension of the VSM Documents Representation [J]. Lucian Nicolae Vintan, Daniel Ionel Morariu, Radu George Cretulescu. IAENG International Journal of Computer Science. 2017, No. 3
3. Representations of the necklace braid group $\mathcal{NB}_n$ of dimension 4 ($n = 2, 3, 4$) [J]. Taher I. Mayassi, Mohammad N. Abdulrahim. Arabian Journal of Mathematics. 2021, No. 2
4. E-VSM: Novel text representation model to capture contex-based closeness between two text documents [C]. Bhakkad Ankit, Dharmadhikari S.C., Emmanuel M. International Conference on Intelligent Systems and Control. 2013
5. A comparative study on ontology generation and text clustering using VSM, LSI, and document ontology models [D]. Taylor, William P., II. 2007
6. Explanation and Elaboration Document for the STROBE-Vet Statement: Strengthening the Reporting of Observational Studies in Epidemiology - Veterinary Extension [O]. A.M. O'Connor, J.M. Sargeant, I.R. Dohoo. 2016
7. An Extension of the VSM Documents Representation [O]. Lucian Nicolae Vintan, Daniel Ionel Morariu, Radu George Cretulescu. 2017
