
CONSTRUCTING CORPUS OF COMPARABLE DOCUMENTS BASED ON UNIVERSAL MEASURE OF SIMILARITY


Abstract

FIELD: data processing.

SUBSTANCE: the invention relates to a method, a computer-readable data medium, and a system for creating a corpus of comparable documents. The method involves: obtaining, by a computing device, an initial set of documents containing text; performing, by the computing device, semantic-syntactic analysis of the text to construct language-independent semantic structures for the sentences of the text of said documents; calculating values of a universal measure of similarity for groups of documents by comparing the language-independent semantic structures constructed for the texts of said documents; detecting, by the computing device, groups of similar documents based on the calculated values of the universal measure of similarity; and forming, by the computing device, a corpus of comparable documents from the detected similar documents.

EFFECT: the technical result is the ability to generate a corpus of comparable documents automatically.

15 cl, 15 dwg
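The grouping steps of the method can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not specify how the universal measure of similarity is computed, so the sketch assumes each document has already been reduced to a set of language-independent semantic labels (the hypothetical names `ACQUIRE`, `COMPANY`, and so on stand in for the output of the semantic-syntactic analysis), and it swaps in Jaccard overlap as a stand-in measure. The functions `jaccard` and `group_similar` and the `threshold` parameter are illustrative names, not taken from the source.

```python
from itertools import combinations


def jaccard(a, b):
    """Stand-in similarity: share of semantic labels two documents have in common."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_similar(docs, threshold=0.5):
    """Group documents whose pairwise similarity reaches the threshold.

    docs maps a document name to a frozenset of semantic labels
    (an assumed representation, see the lead-in above).
    """
    parent = {name: name for name in docs}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Link every pair of documents that is similar enough.
    for a, b in combinations(docs, 2):
        if jaccard(docs[a], docs[b]) >= threshold:
            parent[find(a)] = find(b)

    # Collect linked documents; singletons have no comparable partner.
    groups = {}
    for name in docs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical input: the same news item in two languages plus an unrelated text.
docs = {
    "report_en": frozenset({"ACQUIRE", "COMPANY", "PRICE", "ANNOUNCE"}),
    "report_ru": frozenset({"ACQUIRE", "COMPANY", "PRICE", "DATE"}),
    "recipe_en": frozenset({"COOK", "INGREDIENT", "TIME"}),
}
corpus = group_similar(docs, threshold=0.5)
print(corpus)
```

Because the labels are assumed to be language-independent, the English and Russian reports share three of five labels and land in the same group, while the unrelated text is left out of the corpus. A real system would compare full semantic structures rather than flat label sets.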
