Journal: Knowledge and Information Systems

A novel page clipping search engine based on page discussion topics

Abstract

In this paper, we propose a page clipping search engine based on page discussion topics. Unlike other search engines, ours presents page discussion topics, rather than a search engine results page, as the main result. After the user selects a topic of interest, the search engine clips the relevant pages according to that topic and produces a single integrated page. The advantage of this topic-based integrated result is that the user spends less time deciding whether a page's content is relevant. Our results consist of two parts: the query-related discussion topics and the clipping results for the relevant pages. We first use an adjusted N-gram language model and a hash method to produce discussion topics. At the same time, we use ideas from binary coding and mathematical sets to organize related topics into a hierarchical topic tree with parent-child relationships. Next, we use a cost-effective genetic algorithm to produce the relevant page clipping results. This study has three advantages. First, it can find multiple clustering relationships; that is, a child topic can appear under several parent topics at once. Second, it proposes a topic generation method that not only produces higher-quality topics but also builds the topic tree in linear time. Third, it proposes a clipping generation method that not only produces higher-quality clippings but also yields a cost-effective solution.
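The abstract names the building blocks of topic generation (N-grams, hashing, binary coding, sets) but gives no details, so the following is only a minimal sketch of how such pieces could fit together, not the authors' method. It assumes that a candidate topic is a word N-gram, that a Python dict stands in for the hash method, that each topic is encoded as a bitmask of the pages containing it, and that topic A is a parent of topic B when B's page set is a proper subset of A's. The function names and the `min_pages` threshold are hypothetical. The pairwise comparison here is quadratic in the number of topics, so it does not reproduce the linear-time construction the paper claims.

```python
from collections import defaultdict


def ngrams(tokens, n):
    """Yield every contiguous n-token phrase as a tuple."""
    return (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def topic_page_masks(pages, max_n=2, min_pages=2):
    """Map each candidate topic to a bitmask of the pages that contain it.

    Bit i of a mask is set when page i contains the phrase. The dict plays
    the role of the hash table; min_pages is a hypothetical threshold that
    drops phrases seen on too few pages.
    """
    masks = defaultdict(int)
    for page_index, text in enumerate(pages):
        tokens = text.lower().split()
        for n in range(1, max_n + 1):
            for gram in ngrams(tokens, n):
                masks[" ".join(gram)] |= 1 << page_index
    return {t: m for t, m in masks.items() if bin(m).count("1") >= min_pages}


def parent_links(masks):
    """Return {child: [parents]} using proper-subset tests on page bitmasks.

    Assumption: a topic is a child of every topic whose page set strictly
    contains its own, so one child may have several parents.
    """
    links = {}
    for child, c_mask in masks.items():
        parents = [
            parent
            for parent, p_mask in masks.items()
            if parent != child and c_mask != p_mask and c_mask & p_mask == c_mask
        ]
        if parents:
            links[child] = sorted(parents)
    return links
```

With bitmasks, the subset test is a single AND and comparison, and a child topic whose pages fall inside two different parents' page sets is linked to both. That is one way to read the "multiple clustering relationships" described above.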
