
Topic identification and use thereof in information retrieval systems


Abstract

A technique to determine topics associated with, or classifications for, a data corpus uses an initial domain-specific word list to identify word combinations (one or more words) that appear in the data corpus significantly more often than expected. Word combinations so identified are selected as topics and associated with a user-specified level of granularity. For example, topics may be associated with each table entry, each image, each sentence, each paragraph, or an entire file. Topics may be used to guide information retrieval and/or the display of topic classifications during user query operations.
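The selection step described above can be sketched roughly as follows. This is a minimal illustration, not the patented method: the abstract does not say how "significantly more often than expected" is measured, so the observed-to-expected ratio, the `min_ratio` threshold, the `expected_rate` table, and the `default_rate` fallback are all assumptions made for this example. Each element of `segments` stands for one unit at the user-chosen granularity (a sentence, a paragraph, or a whole file).

```python
import re
from collections import Counter


def count_combinations(tokens, max_n=2):
    """Count every word combination of 1..max_n consecutive words."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def identify_topics(segments, domain_words, expected_rate,
                    default_rate=0.01, min_ratio=8.0, max_n=2):
    """Map each segment index to the topics found in that segment.

    A combination becomes a topic when it contains at least one word
    from the domain list and its corpus-wide count is at least
    min_ratio times the expected count (an assumed test, see lead-in).
    """
    domain = {w.lower() for w in domain_words}
    tokenized = [re.findall(r"[a-z0-9]+", s.lower()) for s in segments]
    per_segment = [count_combinations(t, max_n) for t in tokenized]

    # Corpus-wide observed counts and total corpus size in tokens.
    observed = Counter()
    for counts in per_segment:
        observed.update(counts)
    n_tokens = sum(len(t) for t in tokenized)

    topics = set()
    for combo, count in observed.items():
        if not any(word in domain for word in combo.split()):
            continue  # only combinations seeded by the domain list
        # Hypothetical expectation: a background rate times corpus size.
        expected = expected_rate.get(combo, default_rate) * n_tokens
        if expected > 0 and count / expected >= min_ratio:
            topics.add(combo)

    # Associate topics with each segment at the chosen granularity.
    return {i: sorted(t for t in topics if t in counts)
            for i, counts in enumerate(per_segment)}


if __name__ == "__main__":
    corpus = [
        "The drill bit hit the salt dome.",
        "Salt dome imaging guides the drill bit.",
        "The meeting is on Monday.",
    ]
    for index, found in identify_topics(
            corpus, ["drill", "salt", "dome"], expected_rate={}).items():
        print(index, found)
```

Passing sentences, paragraphs, or whole files as `segments` changes the granularity at which topics are attached without changing the selection logic. A real system would likely replace the simple ratio with a proper statistical test and a background frequency table drawn from a reference corpus.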
