Expert Systems with Applications

A feature-free search query classification approach using semantic distance

Abstract

When classifying search queries into a set of target categories, conventional machine-learning-based approaches usually draw on external sources of information to obtain additional features for search queries and training data for target categories. Unfortunately, these approaches rely on a large amount of training data to achieve high classification precision. Moreover, they are known to adapt poorly to different target categories, a need that may arise from the dynamic changes observed in both Web topic taxonomy and Web content. In this paper, we propose a feature-free classification approach using semantic distance. We analyze the queries and categories themselves and utilize the number of Web pages containing both a query and a category as a semantic distance to determine their similarity. The most attractive feature of our approach is that it uses only the Web page counts estimated by a search engine to classify search queries with respectable accuracy. In addition, it can be easily adapted to changes in the target categories, whereas machine-learning-based approaches require an extensive updating process (e.g., re-labeling outdated training data and re-training classifiers) that is time-consuming and costly. We conduct an experimental study of the effectiveness of our approach using a set of rank measures and show that it performs competitively with some popular state-of-the-art solutions, which, however, frequently use external sources and are inherently inflexible.
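The abstract does not state the exact distance formula, so the following is only a minimal sketch of the general idea: rank target categories for a query using nothing but page counts. It substitutes the well-known Normalized Google Distance (NGD) for the paper's own measure, which may differ. The names `ngd`, `rank_categories`, `toy_count`, and `ASSUMED_INDEX_SIZE`, as well as every count in `TOY_COUNTS`, are hypothetical and invented for illustration; a real system would obtain the counts from a search engine.

```python
import math

def ngd(fx, fy, fxy, n):
    """Normalized Google Distance computed from page counts.

    fx, fy: pages containing each term alone; fxy: pages containing both;
    n: assumed total number of indexed pages.
    NOTE: NGD is a stand-in here, not necessarily the paper's measure.
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return math.inf  # no co-occurrence evidence: treat as maximally distant
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

def rank_categories(query, categories, count, n):
    """Return (category, distance) pairs sorted from nearest to farthest."""
    scored = [
        (c, ngd(count(query), count(c), count(query, c), n))
        for c in categories
    ]
    return sorted(scored, key=lambda pair: pair[1])

# Hypothetical page counts, made up for this example only.
TOY_COUNTS = {
    ("jaguar price",): 2_000_000,
    ("Autos",): 900_000_000,
    ("Animals",): 1_200_000_000,
    ("Autos", "jaguar price"): 1_500_000,
    ("Animals", "jaguar price"): 300_000,
}
ASSUMED_INDEX_SIZE = 50_000_000_000  # hypothetical total page count

def toy_count(*terms):
    """Stand-in for a search-engine hit-count lookup (hypothetical)."""
    return TOY_COUNTS.get(tuple(sorted(terms)), 0)

ranking = rank_categories(
    "jaguar price", ["Animals", "Autos"], toy_count, ASSUMED_INDEX_SIZE
)
for category, distance in ranking:
    print(f"{category}: {distance:.3f}")
```

Because the categories enter only as strings passed to the count lookup, changing the target taxonomy in this sketch means changing the `categories` list; no labeled data or retraining step is involved, which is the adaptability property the abstract emphasizes.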
