METHOD TO IDENTIFY AND EXTRACT FRAGMENTS AMONG LARGE COLLECTIONS OF DIGITAL DOCUMENTS USING REPEATABILITY AND SEMANTIC INFORMATION
Abstract
Techniques are disclosed for processing digital documents using, for example, algorithms including deep learning and deep neural networks (“DNN”), to extract fragments across a corpus of documents. The extracted fragments can then be edited individually and referenced by a plurality of documents, so that changes to a fragment are efficiently reflected across the entire corpus. In one example case, a computer-implemented method is provided for extracting fragments in a digital document. The method includes indexing said document to generate a document element ID sequence; processing said document element ID sequence to generate at least one fragment candidate; processing said at least one fragment candidate to generate at least one respective fragment; and utilizing said at least one fragment to perform a reconstruction of said document.
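The four steps named in the abstract (index, generate candidates, select fragments, reconstruct) can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not say how candidates are found or scored, so this sketch assumes that a candidate is any fixed-length run of element IDs repeated in at least two documents, and it replaces the semantic (DNN-based) selection with a trivial placeholder predicate. All function names, parameters, and sample documents here are hypothetical.

```python
from collections import defaultdict


def index_documents(docs):
    """Assign an integer ID to each distinct element; return ID sequences and the ID table."""
    ids, table, sequences = {}, [], []
    for doc in docs:
        seq = []
        for element in doc:
            if element not in ids:
                ids[element] = len(table)
                table.append(element)
            seq.append(ids[element])
        sequences.append(seq)
    return sequences, table


def fragment_candidates(sequences, length=2, min_docs=2):
    """Assumption: a candidate is a run of `length` IDs found in at least `min_docs` documents."""
    seen = defaultdict(set)
    for doc_no, seq in enumerate(sequences):
        for i in range(len(seq) - length + 1):
            seen[tuple(seq[i:i + length])].add(doc_no)
    return [run for run, doc_nos in seen.items() if len(doc_nos) >= min_docs]


def select_fragments(candidates, keep=lambda run: True):
    """Placeholder for semantic scoring (e.g. a DNN); by default every candidate is kept."""
    return [run for run in candidates if keep(run)]


def reconstruct(seq, fragments):
    """Rewrite an ID sequence so that each fragment occurrence becomes a ("ref", index) entry."""
    out, i = [], 0
    while i < len(seq):
        for k, frag in enumerate(fragments):
            if tuple(seq[i:i + len(frag)]) == frag:
                out.append(("ref", k))
                i += len(frag)
                break
        else:  # no fragment starts at position i
            out.append(("id", seq[i]))
            i += 1
    return out


def render(compact, store, table):
    """Expand a reconstructed document back into its elements."""
    out = []
    for kind, n in compact:
        if kind == "ref":
            out.extend(store[n])
        else:
            out.append(table[n])
    return out


# Hypothetical two-document corpus; each document is a list of elements.
docs = [
    ["logo", "title A", "body A", "legal", "contact"],
    ["logo", "title B", "legal", "contact"],
]
sequences, table = index_documents(docs)
candidates = fragment_candidates(sequences)
fragments = select_fragments(candidates)
# Shared fragment store: one editable copy of each fragment's elements.
store = [[table[i] for i in frag] for frag in fragments]
compact = [reconstruct(seq, fragments) for seq in sequences]
restored = [render(c, store, table) for c in compact]

# Editing the stored fragment once changes every document that references it.
store[0][0] = "legal (revised)"
updated = [render(c, store, table) for c in compact]
```

Because both reconstructed documents hold a reference to the same stored fragment rather than a copy, the single edit to `store` appears in each rendered document, which is the behavior the abstract describes for changes propagating across a corpus.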