System and method for automatic document classification in eDiscovery, compliance and legacy information clean-up
Abstract
A system, method and computer program product for automatic document classification, including an extraction module configured to extract structural, syntactical and/or semantic information from a document and normalize the extracted information; a machine learning module configured to generate a model representation for automatic document classification based on feature vectors built from the normalized and extracted semantic information for supervised and/or unsupervised clustering or machine learning; and a classification module configured to select a non-classified document from a document collection, and via the extraction module extract normalized structural, syntactical and/or semantic information from the selected document, and generate via the machine learning module a model representation of the selected document based on feature vectors, and match the model representation of the selected document against the machine learning model representation to generate a document category and/or classification for display to a user.
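The three modules named in the abstract (extraction, machine learning, classification) can be illustrated with a small sketch. This is not the patented implementation: it assumes a plain bag-of-words feature vector, a supervised nearest-centroid model, and cosine similarity as the matching step, and every function name, category label, and training document below is hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def extract_features(text):
    """Extraction step (assumed): lowercase, tokenize, count terms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tokens)


def train_centroids(labeled_docs):
    """Learning step (assumed): sum term counts per category into a centroid."""
    centroids = defaultdict(Counter)
    for text, category in labeled_docs:
        centroids[category].update(extract_features(text))
    return dict(centroids)


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, centroids):
    """Classification step (assumed): pick the category with the closest centroid."""
    vector = extract_features(text)
    return max(centroids, key=lambda category: cosine(vector, centroids[category]))


# Hypothetical training collection with two illustrative categories.
training = [
    ("Invoice for consulting services, payment due in 30 days", "finance"),
    ("Quarterly budget report and payment schedule", "finance"),
    ("Employment contract termination notice", "legal"),
    ("Non-disclosure agreement between the parties", "legal"),
]
model = train_centroids(training)
predicted = classify("Please send the payment for the attached invoice", model)
print(predicted)
```

A nearest-centroid model is used here only because it keeps the example dependency-free. The abstract also covers unsupervised clustering and richer structural and semantic features, which this sketch does not attempt.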