Techniques are provided for extracting view data from documents, where the data corresponds to an application-specific view and includes a plurality of components. Component data is identified within the view data and a view signature is generated for the view data that includes component signatures generated for each of the components on which the view data is comprised. Each component signature is generated based on the component data that corresponds to each component. The signatures generated are used to detect duplicates among the documents.
展开▼