Published in: Proceedings of the Third IASTED International Conference on Advances in Computer Science and Technology

CALCULATION OF DOCUMENT SIMILARITY USING CELLULAR STRUCTURED SPACE TEMPLATE

Abstract

Calculating the similarity between documents has become a major task in information retrieval from textual databases (e.g., electronic books or electronic dictionaries). Documents can be compared by constructing associative feature vectors or sets of terms and computing the distance between the corresponding vectors or sets. Boolean distance seems impractical, and set similarity cannot handle the case in which some terms are more effective in retrieval than others, so term statistics are recognized as a good basis for computing document relevance. However, the efficiency of that calculation depends only on the size of the statistical data, while the document's discourse and the additional meaning carried by the structure of the text are not considered. In this research, cellular structured space templates are used for building input documents. The cellular structured space template, which specifies the basic layout and semantics of a document, is a reasonable compromise between the time-consuming manual document retyping process and a fully automated document recognition process, which is unavailable. Semantics-based similarity between documents is computed from cellular structured vectors, which are n-dimensional context vectors of the documents. The [...] between relevant documents compared with the normal retrieval methods.
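
The contrast the abstract draws between set similarity and term statistics can be illustrated with a small sketch. It assumes Jaccard similarity for the set case and a smoothed TF-IDF weighting with cosine similarity for the statistical case. Neither choice is named in the abstract, the function names and toy corpus are hypothetical, and the sketch does not implement the cellular structured space template itself.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Set similarity: every shared term counts equally."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumed variant): rarer terms receive larger weights.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus: "the" is common, "cat" is rarer.
corpus = [
    ["the", "cat"],
    ["the", "dog"],
    ["cat", "fish"],
    ["the", "bird"],
]
vectors = tfidf_vectors(corpus)

# Set similarity cannot tell the two pairs apart: each shares exactly one term.
same_by_sets = jaccard(corpus[0], corpus[1]) == jaccard(corpus[0], corpus[2])

# Term statistics rank the pair sharing the rarer term "cat" higher.
sim_common = cosine(vectors[0], vectors[1])  # pair shares "the"
sim_rare = cosine(vectors[0], vectors[2])    # pair shares "cat"

print(same_by_sets, sim_rare > sim_common)
```

In this toy corpus both pairs share exactly one term, so the set measure scores them the same, while the weighted vectors score the pair sharing the rarer term higher. That is the limitation of set similarity the abstract points to.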