Conference: IEEE International Conference on Networking, Sensing and Control

Large-scale documents reduction based on domain ontology and E2LSH



Abstract

Large-scale document reduction plays a critical role in document management, organization, and document mining. Research has concentrated on two aspects: the construction of a document representation model, and index optimization of the feature space for similarity search. However, the semantic gap and the curse of dimensionality remain two open and difficult issues. Motivated by this, we propose a novel method based on domain ontology and E2LSH (Exact Euclidean Locality-Sensitive Hashing). First, we build an improved model based on domain ontology, called the Semantic Vector Space Model (SVSM), to reveal the latent semantic relationships among document feature terms in addition to syntactic information. The SVSM narrows the semantic gap of the traditional VSM and reduces the feature dimension. Then, in view of the complexity of the search space for similarity computation among document pairs, we introduce E2LSH to build indexes of the feature space, optimizing the search space and overcoming the curse of dimensionality. Experimental validation has been conducted using real documents, and the results indicate the rationality and effectiveness of our method.
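
The indexing step described in the abstract can be illustrated with a minimal sketch of p-stable locality-sensitive hashing for Euclidean distance, the family on which E2LSH is based. This is not the authors' implementation: the class name `E2LSHIndex`, the parameters `k`, `num_tables`, `w`, and `seed`, and their default values are assumptions chosen for illustration, and the document vectors are assumed to have been produced already (for example, by a model such as SVSM). Each hash function has the standard form h(v) = floor((a · v + b) / w), with a drawn from a Gaussian distribution and b drawn uniformly from [0, w).

```python
import math
import random
from collections import defaultdict


class E2LSHIndex:
    """Toy p-stable LSH index for Euclidean distance (illustrative only)."""

    def __init__(self, dim, k=4, num_tables=8, w=4.0, seed=0):
        # k, num_tables and w are hypothetical defaults, not values from the paper.
        rng = random.Random(seed)
        self.w = w
        self.tables = []
        for _ in range(num_tables):
            # Each table concatenates k hash functions h(v) = floor((a . v + b) / w).
            funcs = [
                ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
                for _ in range(k)
            ]
            self.tables.append((funcs, defaultdict(list)))

    def _key(self, funcs, vec):
        # Bucket key: the tuple of k quantized random projections of vec.
        return tuple(
            math.floor((sum(a_i * v_i for a_i, v_i in zip(a, vec)) + b) / self.w)
            for a, b in funcs
        )

    def add(self, doc_id, vec):
        # Store the document id in one bucket per table.
        for funcs, buckets in self.tables:
            buckets[self._key(funcs, vec)].append(doc_id)

    def candidates(self, vec):
        # Union of the buckets that vec falls into across all tables.
        found = set()
        for funcs, buckets in self.tables:
            found.update(buckets.get(self._key(funcs, vec), []))
        return found
```

Only the documents returned by `candidates` would then need an exact similarity computation, which is how an index of this kind reduces the search space compared with comparing every document pair.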
