
Domain Relevance on Term Weighting

Abstract

The TFxIDF term weighting scheme is the standard approach to vectorizing textual data. For a data set in which textual data stemming from web document structure is to be vectorized [2], the need for an enhanced term weighting scheme arose. In this publication we introduce a term weighting scheme that improves on the traditional TFxIDF scheme by adding a component based on the linguistically inspired notion of domain relevance. Domain relevance measures the degree to which a term is more relevant within a data set than within a reference data set. By means of this external component, a potential weakness of TFxIDF on data sets with a non-standard distribution is overcome. The weighting scheme favours domain-relevant terms, which can be regarded as more useful in settings where the clustering is consumed by a human supervisor, e.g. for semi-automatic ontology learning.
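The abstract does not state the formula, so the following is only a minimal sketch of how a domain relevance factor might be combined with TFxIDF. Everything specific here is an assumption made for illustration: the relevance ratio `p_dom / (p_dom + p_ref)`, the add-one smoothing, the smoothed IDF variant, the multiplicative combination, and the function names `term_probabilities`, `domain_relevance` and `weight_documents`. Documents are assumed to be pre-tokenized lists of terms.

```python
import math
from collections import Counter


def term_probabilities(docs, vocabulary):
    """Add-one smoothed relative frequency of every vocabulary term in docs."""
    counts = Counter(term for doc in docs for term in doc)
    total = sum(counts.values()) + len(vocabulary)
    return {term: (counts[term] + 1) / total for term in vocabulary}


def domain_relevance(domain_docs, reference_docs):
    """Assumed relevance score in (0, 1): share of a term's probability mass
    that comes from the domain data set rather than the reference data set."""
    vocabulary = {term for doc in domain_docs + reference_docs for term in doc}
    p_dom = term_probabilities(domain_docs, vocabulary)
    p_ref = term_probabilities(reference_docs, vocabulary)
    return {t: p_dom[t] / (p_dom[t] + p_ref[t]) for t in vocabulary}


def weight_documents(domain_docs, reference_docs):
    """Return one {term: weight} dict per domain document.

    weight = tf * smoothed idf * domain relevance (an assumed combination,
    not necessarily the one used in the paper).
    """
    n_docs = len(domain_docs)
    doc_freq = Counter(term for doc in domain_docs for term in set(doc))
    relevance = domain_relevance(domain_docs, reference_docs)
    weighted = []
    for doc in domain_docs:
        tf = Counter(doc)
        weighted.append({
            t: tf[t]
            * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0)
            * relevance[t]
            for t in tf
        })
    return weighted


# Toy data: "ontology" and "the" both occur in every domain document,
# so plain TFxIDF cannot tell them apart; only the reference set can.
domain_docs = [["ontology", "learning", "the"], ["ontology", "the", "cluster"]]
reference_docs = [["the", "weather", "the"], ["the", "news"]]

for doc_weights in weight_documents(domain_docs, reference_docs):
    print(sorted(doc_weights.items(), key=lambda kv: -kv[1]))
```

In this toy setting the two terms receive the same TFxIDF value, and the relevance factor is what ranks the domain-specific term above the general one, which is the kind of behavior the abstract describes.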
