METHOD AND SYSTEM FOR DETECTING DUPLICATED DOCUMENT USING DOCUMENT SIMILARITY MEASURING MODEL BASED ON DEEP LEARNING
Abstract
Disclosed are a method and a system for detecting duplicate documents using a deep learning-based document similarity measurement model. A duplicate document detection method according to an embodiment may include: extracting, from a document database, a similar document pair set including a plurality of similar document pairs having the same attribute and a dissimilar document pair set including a plurality of randomly extracted dissimilar document pairs; calculating a mathematical similarity, using a mathematical measure, for each of the similar document pairs and each of the dissimilar document pairs; calculating a semantic similarity for each pair by increasing the mathematical similarity calculated for each similar document pair and decreasing the mathematical similarity calculated for each dissimilar document pair; training a similarity model using the similar document pairs, the dissimilar document pairs, and the semantic similarity; and detecting, by at least one processor, a duplicate document using the similarity model.
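The label-construction steps described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption rather than something the abstract states: token-set Jaccard similarity stands in for the unspecified mathematical measure, the fixed margin `delta` stands in for the unspecified amount by which scores are raised or lowered, dissimilar pairs are sampled only from documents with differing attributes, and the names `jaccard`, `semantic_label`, and `build_training_set` are hypothetical. Training the deep learning model on the resulting triples is omitted.

```python
import random
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity (assumed stand-in for the 'mathematical measure')."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def semantic_label(score: float, similar: bool, delta: float = 0.2) -> float:
    """Raise the score for a similar pair, lower it for a dissimilar pair, clamp to [0, 1].

    The additive margin `delta` is an assumption; the abstract only says the
    score is increased or decreased.
    """
    adjusted = score + delta if similar else score - delta
    return min(1.0, max(0.0, adjusted))


def build_training_set(docs, num_dissimilar, seed=0, delta=0.2):
    """Build (text_a, text_b, label) triples from a list of (text, attribute) documents."""
    rng = random.Random(seed)
    pairs = list(combinations(docs, 2))
    # Similar pairs: documents that share the same attribute.
    similar = [(a, b) for (a, attr_a), (b, attr_b) in pairs if attr_a == attr_b]
    # Dissimilar pairs: a random sample, here restricted to differing attributes (assumption).
    candidates = [(a, b) for (a, attr_a), (b, attr_b) in pairs if attr_a != attr_b]
    dissimilar = rng.sample(candidates, min(num_dissimilar, len(candidates)))

    examples = [(a, b, semantic_label(jaccard(a, b), True, delta)) for a, b in similar]
    examples += [(a, b, semantic_label(jaccard(a, b), False, delta)) for a, b in dissimilar]
    return examples


if __name__ == "__main__":
    # Hypothetical documents; the attribute values are placeholders.
    docs = [
        ("cheap pizza downtown", "place-1"),
        ("cheap pizza near downtown", "place-1"),
        ("quiet library hours", "place-2"),
    ]
    for text_a, text_b, label in build_training_set(docs, num_dissimilar=2):
        print(f"{label:.2f}  {text_a!r} / {text_b!r}")
```

The triples would then serve as regression targets for a similarity model, so that the model learns scores pushed apart from the raw lexical measure: higher for pairs known to share an attribute, lower for randomly drawn pairs.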