Method of analysing a text corpus and information analysis system
Abstract
An information analysis system (2) for retrieving related character sequences is programmed to select combinations of a first character sequence and a second character sequence, occurring in a text corpus (5) and differing in composition by an amount smaller than a threshold distance value. The system comprises a database (7) comprising a list (8) of character sequences in the text corpus and counts (9) of the number of occurrences of each sequence in the corpus (5), and a database (10; 28) comprising a list of combinations of character sequences in the corpus and sequence distance values, providing a measure of difference in composition between character sequences in a combination. The system (2) is programmed to compute a value indicative of the difference in number of occurrences between the first character sequence and the second character sequence and return the combination as output if the occurrence difference value is larger than a threshold value.
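The selection rule described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes that Levenshtein edit distance serves as the "sequence distance", that the "occurrence difference value" is the absolute difference of the two counts, and that the corpus is already split into tokens. The names `edit_distance`, `related_sequences`, `max_distance` and `min_count_gap` are hypothetical, and the pairwise distances are computed on the fly rather than read from a precomputed database.

```python
from collections import Counter
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, assumed here as the measure of difference in composition."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def related_sequences(tokens, max_distance=2, min_count_gap=3):
    """Return (first, second, distance, count_gap) for pairs of distinct
    sequences whose distance is smaller than max_distance and whose
    occurrence counts differ by more than min_count_gap."""
    counts = Counter(tokens)  # list of sequences plus occurrence counts
    results = []
    for first, second in combinations(sorted(counts), 2):
        distance = edit_distance(first, second)
        if distance >= max_distance:
            continue  # not close enough in composition
        gap = abs(counts[first] - counts[second])  # assumed occurrence difference value
        if gap > min_count_gap:
            results.append((first, second, distance, gap))
    return results
```

Under these assumptions, a frequent sequence paired with a rare near-duplicate (for example a common word and an occasional misspelling of it) is returned, while two similar sequences that occur about equally often are not. Comparing all pairs is quadratic in the number of distinct sequences, which is presumably why the abstract describes storing combinations and their distance values in a database.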