Document categorization by word length distribution analysis
Abstract
A system and method for efficient document categorization are disclosed. In one embodiment, word length distribution information is used as a basis for categorization. Greater than 90% accuracy in classification may be achieved in, e.g., distinguishing newspaper articles from scientific journal articles. Word length distribution information may be developed without optical character recognition (OCR), permitting use of degraded document images.
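As a rough illustration of categorizing by word length distribution, the following minimal Python sketch builds a normalized histogram of word lengths and assigns a document to the category whose reference histogram is closest. It is not the patented method: the function names, the 15-bin cap, and the L1 distance are all assumptions made for this example, and it measures word lengths from plain text, whereas the abstract notes that the lengths can be derived from document images without OCR.

```python
import re
from collections import Counter

MAX_LEN = 15  # assumption: words longer than this share the last bin


def word_length_distribution(text, max_len=MAX_LEN):
    """Return a normalized histogram of word lengths 1..max_len."""
    # Assumption: a "word" is a run of ASCII letters in plain text.
    lengths = [min(len(w), max_len) for w in re.findall(r"[A-Za-z]+", text)]
    if not lengths:
        return [0.0] * max_len
    counts = Counter(lengths)
    total = len(lengths)
    return [counts.get(n, 0) / total for n in range(1, max_len + 1)]


def l1_distance(p, q):
    """Sum of absolute differences between two histograms (assumed metric)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def categorize(text, references):
    """Return the label whose reference histogram is nearest to the text's."""
    dist = word_length_distribution(text)
    return min(references, key=lambda label: l1_distance(dist, references[label]))


# Hypothetical reference histograms built from tiny sample texts.
references = {
    "short_words": word_length_distribution("the cat sat on the mat"),
    "long_words": word_length_distribution(
        "thermodynamic equilibrium characterization"
    ),
}
label = categorize("a dog ran to the hut", references)
```

In practice the reference histograms would be averaged over many labeled documents per category, and a different distance or a trained classifier could replace the nearest-histogram rule.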