
A fuzzy document clustering approach based on domain-specified ontology

Abstract

Applications of document clustering include automatic document organization, topic extraction, and fast information retrieval or filtering. Numerous methods have been developed for document clustering. Despite these advances, document clustering still presents challenges, such as optimizing feature selection for a low-dimensional document representation and incorporating mutual information between documents into the clustering algorithm. This paper focuses on these two questions. First, we construct a domain-specific ontology that provides a controlled vocabulary describing the hazards related to dairy products. Synonyms of the controlled vocabulary in the document set are considered to be relatively prevalent and fundamentally important for feature selection. Second, in combination with the vector space model (VSM), we perform singular value decomposition (SVD) to map all of the term-document vectors into a concept space. We then obtain the mutual information between documents by calculating the similarity of every pair of document vectors in the orthogonal matrix of right singular vectors. As the mutual information matrix is also a fuzzy compatible relation, a fuzzy equivalence relation can be derived by calculating its max-min transitive closure. Finally, based on the fuzzy equivalence relation, all of the data sequences are allocated to clusters under the guidance of a cluster validation index. Our method both reduces the dimensionality of the original data and considers the correlation between terms. The experimental results show that encoding the ontologies in the aggregation process can provide better clustering results. Moreover, the proposed work has been applied to food safety supervision, which benefits government and society. (C) 2015 Elsevier B.V. All rights reserved.
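The fuzzy steps described in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes the concept-space document vectors are already available (for example, rows taken from the right singular vectors of an SVD computed elsewhere), uses cosine similarity clamped to [0, 1] as the fuzzy compatible relation, and takes a hand-picked threshold `lam` where the paper uses a cluster validation index. All function names and the toy data are hypothetical.

```python
from math import sqrt


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity, clamped to [0, 1].

    Clamping is an assumption made here so the result is a valid fuzzy
    relation; vectors are assumed to be non-zero.
    """
    n = len(vectors)
    norms = [sqrt(sum(x * x for x in v)) for v in vectors]
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sim[i][i] = 1.0  # reflexive by construction
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            s = max(0.0, min(1.0, dot / (norms[i] * norms[j])))
            sim[i][j] = sim[j][i] = s  # symmetric by construction
    return sim


def max_min_compose(r, s):
    """Max-min composition: (r o s)[i][j] = max_k min(r[i][k], s[k][j])."""
    n = len(r)
    return [
        [max(min(r[i][k], s[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def transitive_closure(r):
    """Square a reflexive, symmetric fuzzy relation until it stops changing."""
    while True:
        squared = max_min_compose(r, r)
        if squared == r:  # exact: composition only selects existing values
            return r
        r = squared


def lambda_cut_clusters(equivalence, lam):
    """Group items whose equivalence degree is at least lam."""
    n = len(equivalence)
    assigned = [False] * n
    clusters = []
    for i in range(n):
        if assigned[i]:
            continue
        members = [j for j in range(n) if equivalence[i][j] >= lam]
        for j in members:
            assigned[j] = True
        clusters.append(members)
    return clusters


# Toy concept-space vectors (hypothetical data, two documents per topic).
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
closure = transitive_closure(cosine_similarity_matrix(docs))
print(lambda_cut_clusters(closure, 0.5))
```

Because the closure is a fuzzy equivalence relation, every threshold yields a proper partition of the documents, and lowering `lam` merges clusters rather than reshuffling them.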

Bibliographic details

  • Source
    Data & Knowledge Engineering | 2015, issue novaptaa | pp. 148-166 | 19 pages
  • Author affiliations

    Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Jilin, Peoples R China|Minist Educ, Key Lab Symbol Computat & Knowledge Engn, Changchun 130012, Jilin, Peoples R China|Univ Queensland, Sch Informat Technol & Elect Engn, Brisbane, Qld 4072, Australia;

    Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Jilin, Peoples R China|Minist Educ, Key Lab Symbol Computat & Knowledge Engn, Changchun 130012, Jilin, Peoples R China;

    Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Jilin, Peoples R China|Minist Educ, Key Lab Symbol Computat & Knowledge Engn, Changchun 130012, Jilin, Peoples R China;

    Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Jilin, Peoples R China|Minist Educ, Key Lab Symbol Computat & Knowledge Engn, Changchun 130012, Jilin, Peoples R China;

    Changchun Univ Technol, Sch Comp Sci & Engn, Changchun 130012, Jilin, Peoples R China;

  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification
  • Keywords

    Domain-specified ontology; Document clustering; Feature selection; Singular value decomposition (SVD); Fuzzy equivalence relation;
