FIELD: data processing.;SUBSTANCE: the invention relates to a method, a computer-readable data medium and a system for creating a corpus of comparable documents. The method involves obtaining, by a computing device, an initial set of documents containing text, performing, by the computing device, semantic-syntactic analysis of the text to construct language-independent semantic structures of the sentences of the text of said documents, calculating values of a universal measure of similarity for groups of documents by comparing the constructed language-independent semantic structures for the texts of said documents, detecting, by the computing device, groups of similar documents based on the calculated values of the universal measure of similarity, and forming, by the computing device, a corpus of comparable documents based on the detected similar documents.;EFFECT: the technical result is the ability to generate a corpus of comparable documents automatically.;15 cl, 15 dwg
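The sequence of steps in the abstract (analyze, compare, group, form a corpus) can be illustrated with a minimal Python sketch. This is not the claimed implementation. The function names, the `threshold` parameter and the grouping strategy are all hypothetical. In particular, `semantic_structure` is only a stand-in that reduces a text to a set of lowercased words, so unlike the semantic-syntactic analysis described above it is not language-independent, and Jaccard overlap is swapped in for the universal measure of similarity, which the abstract does not define.

```python
from itertools import combinations


def semantic_structure(text):
    """Stand-in for semantic-syntactic analysis (assumption, not the patent's method).

    Reduces a text to a set of lowercased words with punctuation stripped.
    """
    words = (word.strip(".,;:!?").lower() for word in text.split())
    return frozenset(word for word in words if word)


def similarity(a, b):
    """Jaccard overlap, used here in place of the undefined universal measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_comparable_corpus(documents, threshold=0.5):
    """Return groups of document indices whose pairwise similarity links them.

    Documents are linked when their similarity reaches `threshold` (hypothetical
    parameter); linked documents are merged into groups with union-find.
    """
    structures = [semantic_structure(doc) for doc in documents]
    parent = list(range(len(documents)))

    def find(i):
        # Follow parent links to the group root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(documents)), 2):
        if similarity(structures[i], structures[j]) >= threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i in range(len(documents)):
        groups.setdefault(find(i), []).append(i)

    # Only groups with at least two documents contribute to the corpus.
    return [sorted(group) for group in groups.values() if len(group) > 1]


docs = [
    "The cat sat on the mat.",
    "The cat sat on a mat.",
    "Stock prices fell sharply today.",
]
corpus = build_comparable_corpus(docs)
```

With these three sample texts, the first two share five of six distinct words and fall into one group, while the third shares none and is left out of the corpus.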