Large-scale documents reduction based on domain ontology and E2LSH

Abstract

Large-scale document reduction plays a critical role in document management, organization, and mining. Research concentrates on two aspects: constructing the document representation model and optimizing the index of the feature space for similarity search. However, the semantic gap and the curse of dimensionality remain two open and difficult issues. Motivated by this, we propose a novel method based on domain ontology and E2LSH (Exact Euclidean Locality-Sensitive Hashing). First, we build an improved model based on domain ontology, called the Semantic Vector Space Model (SVSM), which reveals the latent semantic relationships among document feature terms in addition to syntactic information. The SVSM narrows the semantic gap of the traditional VSM and reduces the feature dimension. Then, given the complexity of the search space for similarity computation among document pairs, we introduce E2LSH to build indexes of the feature space, reducing the search space and overcoming the curse of dimensionality. Experiments on real-world documents indicate that the method is sound and effective.
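As a rough illustration of the indexing step described in the abstract, the sketch below implements the p-stable hash family commonly associated with E2LSH, h(v) = floor((a·v + b) / w), with a drawn from a Gaussian and b drawn uniformly from [0, w]. It is not the authors' implementation. The class name E2LSHIndex, the parameters num_tables, hashes_per_table and bucket_width, and the toy vectors are all assumptions made for illustration, and document vectors (for example SVSM vectors) are assumed to be plain lists of floats.

```python
import math
import random
from collections import defaultdict


class E2LSHIndex:
    """Toy p-stable LSH index for Euclidean distance (hypothetical helper)."""

    def __init__(self, dim, num_tables=8, hashes_per_table=4,
                 bucket_width=4.0, seed=0):
        rng = random.Random(seed)
        self.w = bucket_width
        # Each table concatenates several hash functions; each function is a
        # pair (a, b) with a ~ N(0, I) and b uniform in [0, w].
        self.funcs = [
            [([rng.gauss(0.0, 1.0) for _ in range(dim)],
              rng.uniform(0.0, bucket_width))
             for _ in range(hashes_per_table)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.vectors = {}

    def _keys(self, v):
        # One bucket key per table: h(v) = floor((a . v + b) / w) per function.
        return [
            tuple(math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
                  for a, b in table_funcs)
            for table_funcs in self.funcs
        ]

    def add(self, doc_id, v):
        self.vectors[doc_id] = list(v)
        for table, key in zip(self.tables, self._keys(v)):
            table[key].append(doc_id)

    def candidates(self, v):
        # Union of the buckets the query falls into, one bucket per table.
        found = set()
        for table, key in zip(self.tables, self._keys(v)):
            found.update(table.get(key, ()))
        return found

    def near_duplicates(self, v, radius):
        # Exact distance check on the candidates only, not the whole corpus.
        return sorted(d for d in self.candidates(v)
                      if math.dist(self.vectors[d], v) <= radius)


index = E2LSHIndex(dim=3)
index.add("doc1", [1.0, 0.0, 2.0])
index.add("doc2", [50.0, 50.0, 50.0])
print(index.near_duplicates([1.0, 0.0, 2.0], radius=0.5))  # ['doc1']
```

The point of the design is that the exact Euclidean distance is computed only for documents sharing at least one bucket with the query, so the pairwise comparison cost depends on bucket occupancy rather than on the size of the whole collection.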


Bibliographic details

Source
IEEE International Conference on Networking, Sensing and Control, 2014, pp. 24-29 (6 pages)
Authors
Li Hongmei; Hao Wenning; Chen Gang; Liao Xianglin
Original format: PDF
Keywords
E2LSH; VSM; documents reduction; domain ontology; semantic relationship; similarity computation

Similar literature

1. A Framework for Semi Semantic Ontology Based Document Clustering in Geospatial Domain [J]. Jian Shen, Dapeng Man, Wu Yang. International Journal of Computational Intelligence Research, 2018, No. 9
2. Semantic Indexing of Web Documents Based on Domain Ontology [J]. Abdeslem DENNAI, Sidi Mohammed BENSLIMANE. International Journal of Information Technology and Computer Science, 2015, No. 2
3. A fuzzy document clustering approach based on domain-specified ontology [J]. Yue Lin, Zuo Wanli, Peng Tao. Data & Knowledge Engineering, 2015, Issue NOVaPTaA
4. Large-scale documents reduction based on domain ontology and E2LSH [C]. Li Hongmei, Hao Wenning, Chen Gang. IEEE International Conference on Networking, Sensing and Control, 2014
5. Time-domain finite-element reduction-recovery methods for large-scale electromagnetics-based analysis and design of next-generation integrated circuits [D]. Gan, Houle. 2010
6. iSMART: Ontology-based Semantic Query of CDA Documents [O]. Shengping Liu, Yuan Ni, Jing Mei. 2009
7. Knowledge Acquisition of Domain Ontology Based on the Documents [O]. Zhaoqin Hu A 2016
