
Method of analysing a text corpus and information analysis system


Abstract

An information analysis system (2) for retrieving related character sequences is programmed to select combinations of a first character sequence and a second character sequence, occurring in a text corpus (5) and differing in composition by an amount smaller than a threshold distance value. The system comprises a database (7) comprising a list (8) of character sequences in the text corpus and counts (9) of the number of occurrences of each sequence in the corpus (5), and a database (10; 28) comprising a list of combinations of character sequences in the corpus and sequence distance values, providing a measure of difference in composition between character sequences in a combination. The system (2) is programmed to compute a value indicative of the difference in number of occurrences between the first character sequence and the second character sequence and return the combination as output if the occurrence difference value is larger than a threshold value.
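The selection rule described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: the abstract does not name a distance measure, so Levenshtein edit distance is assumed here; the whitespace/word tokenisation, the function names `edit_distance` and `related_sequences`, and the parameters `max_distance` and `min_difference` are all hypothetical. The sketch also computes distances on the fly, whereas the abstract describes a database (10; 28) that already holds combinations and their sequence distance values.

```python
import re
from collections import Counter
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (assumed measure of 'difference in composition')."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def related_sequences(corpus: str, max_distance: int = 2, min_difference: int = 5):
    """Return (frequent, rare, distance, count_difference) tuples.

    A pair is kept when its distance is smaller than max_distance and the
    difference in occurrence counts is larger than min_difference.
    """
    # Hypothetical tokenisation: lower-cased runs of word characters.
    counts = Counter(re.findall(r"\w+", corpus.lower()))
    results = []
    for a, b in combinations(sorted(counts), 2):
        distance = edit_distance(a, b)
        if distance >= max_distance:
            continue
        frequent, rare = (a, b) if counts[a] >= counts[b] else (b, a)
        difference = counts[frequent] - counts[rare]
        if difference > min_difference:
            results.append((frequent, rare, distance, difference))
    return results


if __name__ == "__main__":
    sample = "analysis " * 10 + "analysys " + "system " * 3
    print(related_sequences(sample))
    # [('analysis', 'analysys', 1, 9)]
```

In this toy corpus the rare sequence "analysys" is paired with the frequent, near-identical "analysis", which is the kind of combination the system is described as returning. Comparing every pair is quadratic in the vocabulary size, so a precomputed list of combinations, as in the abstract, would be the more plausible choice for a real corpus.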

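Bibliographic data

  • Publication number: EP1288790A1

    Patent type

  • Publication date: 2003-03-05

    Original format: PDF

  • Applicant/patentee: TARCHON BV

    Application number: EP20010203239

  • Filing date: 2001-08-29

  • Classification: G06F17/27; G06F17/30

  • Country: EP

  • Date added to database: 2022-08-21 23:50:28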
