Ontology-based similarity for clustering in text space.

Abstract

With the advent of the World Wide Web and the growing popularity of web search engines, interest in information retrieval systems has been renewed. In this research, we introduce a system that combines category-based and keyword-based concepts to improve information retrieval. To improve document clustering, we propose a document similarity measure that is based on keyword frequency in documents but also uses an input ontology. This ontology is domain-specific and includes a list of keywords organized by their degree of importance to the categories of the ontology. We evaluate the performance of this similarity measure and compare it with the standard cosine vector similarity measure, using both document data with predetermined structure and actual web documents. We design a framework that generates synthetic data to model documents, and we analyze the statistical attributes of documents in high-dimensional space. For the synthetic data analysis, we design a controllable structure that uses various distributions of angle to specify cluster compactness and angle-based inter-cluster overlap to specify cluster isolation. We also address the issue of modeling text documents and propose a graph data model based on the concept of semantic groups, together with a mechanism by which semantic groups can be used in document processing.
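
The abstract does not give the formula for the ontology-based measure, so the following is only a minimal sketch of one possible reading: keyword frequencies are projected onto ontology categories using per-keyword importance weights, and the resulting category profiles are compared by cosine. The toy ontology, its weights, and the function names `category_profile` and `ontology_similarity` are hypothetical and are not taken from the dissertation.

```python
import math
from collections import Counter


def cosine(u, v):
    """Standard cosine similarity between two sparse vectors (dict-like)."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def category_profile(tf, ontology):
    """Project keyword frequencies onto ontology categories.

    Assumption: each category score is the importance-weighted sum of the
    frequencies of that category's keywords.
    """
    return {
        category: sum(tf.get(keyword, 0) * weight
                      for keyword, weight in weights.items())
        for category, weights in ontology.items()
    }


def ontology_similarity(tf_a, tf_b, ontology):
    """Cosine similarity computed in category space instead of keyword space."""
    return cosine(category_profile(tf_a, ontology),
                  category_profile(tf_b, ontology))


# Hypothetical domain ontology: category -> {keyword: degree of importance}.
ONTOLOGY = {
    "databases": {"sql": 1.0, "index": 0.6, "query": 0.8},
    "networking": {"tcp": 1.0, "router": 0.9, "packet": 0.7},
}

doc_a = Counter("sql sql".split())
doc_b = Counter("query index".split())
doc_c = Counter("tcp packet router".split())

print(cosine(doc_a, doc_b))
print(ontology_similarity(doc_a, doc_b, ONTOLOGY))
print(ontology_similarity(doc_a, doc_c, ONTOLOGY))
```

In this toy case, `doc_a` and `doc_b` share no keywords, so their keyword-level cosine is zero, while their category profiles both point only at "databases" and the category-level similarity is close to 1. Documents from different categories, such as `doc_a` and `doc_c`, score zero under both measures.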

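Bibliographic record

Author: Assem, Nasser
Author affiliation: Michigan State University
Degree-granting institution: Michigan State University
Subject: Computer Science
Degree: Ph.D.
Year: 2002
Pages: 115 p.
Total pages: 115
Original format: PDF
Language of text: English
Chinese Library Classification: Automation technology, computer technology
Date added to database: 2022-08-17 11:46:26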
