
METHOD OF COMPUTERIZED SEMANTIC INDEXING OF NATURAL LANGUAGE TEXT, METHOD OF COMPUTERIZED SEMANTIC INDEXING OF COLLECTION OF NATURAL LANGUAGE TEXTS, AND MACHINE-READABLE MEDIA


Abstract

The present invention relates to the field of information technology, namely to methods of computerized semantic indexing of natural language texts. The invention extends the set of methods for indexing natural language texts by applying computerized linguistic analysis to them and using the results to build indices. These indices support semantic navigation through documents and document collections, as well as precise and fast search for facts and documents relevant to the user's information needs, particularly for texts in highly inflected languages.

The method of computerized semantic indexing of a natural language text comprises the steps of: segmenting the text in electronic form into tokens; identifying stable phrases; forming sentences; identifying, by consulting linguistic and heuristic rules held in a database for a predetermined linguistic environment, the semantically meaningful objects (named entities) and the semantically meaningful relations between them (named relations); forming, for every named relation, a set of triples, in which a single triple of the first type corresponds to the relation that the named relation establishes between two named entities, each triple of the second type corresponds to the value of a particular attribute of one of those entities, and each triple of the third type corresponds to the value of a particular attribute of the named relation itself; indexing, over the set of formed triples, separately all named entities linked by named relations, all pairs of the kind "named entity - named relation", and all triples of the kind "named entity - named relation - named entity", while taking into account the attributes of the respective named entities and/or named relations; and storing in the database the formed triples and the obtained indices together with a reference to the source text from which those triples were formed.
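
The triple-forming and indexing steps of the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes the named entities and named relations have already been recognized (tokenization, stable-phrase detection, and the rule database are out of scope), and every name in it (`Triple`, `NamedRelation`, `form_triples`, `build_indices`, the sample data) is hypothetical. The in-memory dictionaries stand in for the database, and for brevity the index keys ignore attributes, which the abstract says should be taken into account.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Triple:
    kind: int       # 1 = entity-relation-entity, 2 = entity attribute, 3 = relation attribute
    subject: str
    predicate: str
    obj: str


@dataclass
class NamedRelation:
    rel_id: str                                          # hypothetical identifier of this relation instance
    name: str
    left: str                                            # first named entity
    right: str                                           # second named entity
    entity_attrs: dict = field(default_factory=dict)     # entity -> {attribute: value}
    relation_attrs: dict = field(default_factory=dict)   # attribute -> value


def form_triples(rel):
    """Form one first-type triple plus second- and third-type attribute triples."""
    triples = [Triple(1, rel.left, rel.name, rel.right)]
    for entity, attrs in rel.entity_attrs.items():
        for attr, value in attrs.items():
            triples.append(Triple(2, entity, attr, value))
    for attr, value in rel.relation_attrs.items():
        triples.append(Triple(3, rel.rel_id, attr, value))
    return triples


def build_indices(relations, doc_ref):
    """Index entities, entity-relation pairs, and full triples, each mapped to the source text."""
    store = []                      # (triple, reference to source text)
    by_entity = defaultdict(set)
    by_pair = defaultdict(set)
    by_triple = defaultdict(set)
    for rel in relations:
        store.extend((t, doc_ref) for t in form_triples(rel))
        for entity in (rel.left, rel.right):
            by_entity[entity].add(doc_ref)
            by_pair[(entity, rel.name)].add(doc_ref)
        by_triple[(rel.left, rel.name, rel.right)].add(doc_ref)
    return store, by_entity, by_pair, by_triple


# Invented sample data, for illustration only.
rel = NamedRelation(
    rel_id="r1", name="works_for", left="Anna Petrova", right="Acme",
    entity_attrs={"Anna Petrova": {"type": "person"}, "Acme": {"type": "organization"}},
    relation_attrs={"position": "engineer"},
)
store, by_entity, by_pair, by_triple = build_indices([rel], "doc-1")
```

Keeping three separate indices mirrors the three lookup granularities the abstract lists: a query can start from an entity alone, from an entity together with a relation, or from a complete entity-relation-entity fact, and each lookup returns the references to the texts that produced it.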
