Automatically generating a topic description for text and searching and sorting text by topic using the same

Abstract

A method of automatically generating a topical description of text by receiving the text containing input words; stemming each input word to its root form; assigning a user-definable part-of-speech score to each input word; assigning a language salience score to each input word; assigning an input-word score to each input word; creating a tree structure under each input word, where each tree structure contains the definition of the corresponding input word; assigning a definition-word score to each definition word; collapsing each tree structure to a corresponding tree-word list; assigning a tree-word-list score to each entry in each tree-word list; combining the tree-word lists into a final word list; assigning each word in the final word list a final-word-list score; and choosing the top N scoring words in the final word list as the topic description of the input text. Document searching and sorting may be accomplished by performing the method described above on each document in a database and then comparing the similarity of the resulting topical descriptions.
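The steps in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under loud assumptions, not the patented implementation: the toy lexicon (`DEFINITIONS`, `POS_TAG`, `SALIENCE`), the placeholder `stem` function, the `DECAY` weight, the rule that the input-word score is the product of the part-of-speech and salience scores, the one-level definition tree, and the Jaccard-style `similarity` are all hypothetical choices. The abstract names the scores but does not say how they are computed or combined.

```python
from collections import Counter

# Hypothetical toy resources; the abstract does not specify any of these.
DEFINITIONS = {
    "dog": ["animal", "pet"],
    "cat": ["animal", "pet"],
    "bark": ["sound", "dog"],
}
POS_TAG = {"dog": "noun", "cat": "noun", "bark": "verb"}
POS_SCORE = {"noun": 1.0, "verb": 0.5}             # "user-definable" per the abstract
SALIENCE = {"dog": 1.0, "cat": 1.0, "bark": 0.8}   # assumed language salience values
DECAY = 0.5  # assumed weight of a definition word relative to its input word


def stem(word):
    """Crude placeholder stemmer: lowercase and strip a trailing 's'."""
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def topic_description(text, n=3):
    """Return the top-n scoring words as a topic description of the text."""
    final = Counter()  # final word list with accumulated scores
    for raw in text.split():
        word = stem(raw.strip(".,;:!?"))
        if word not in POS_TAG:
            continue  # skip words the toy lexicon does not know
        # Input-word score: assumed to be POS score times salience score.
        input_score = POS_SCORE[POS_TAG[word]] * SALIENCE[word]
        # One-level "tree" collapsed to a tree-word list: the input word
        # plus its definition words, each with a down-weighted score.
        tree_words = {word: input_score}
        for d in DEFINITIONS.get(word, []):
            tree_words[d] = tree_words.get(d, 0.0) + DECAY * input_score
        # Combine this tree-word list into the final word list.
        final.update(tree_words)
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(final.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:n]]


def similarity(topics_a, topics_b):
    """Assumed comparison: Jaccard overlap of two topic descriptions."""
    a, b = set(topics_a), set(topics_b)
    return len(a & b) / len(a | b) if a | b else 0.0
```

Searching or sorting a document collection would then amount to calling `topic_description` on each document and ranking the documents by `similarity` to the topic description of the query text.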
