METHOD OF IDENTIFYING DOCUMENTS WITH SIMILAR PROPERTIES UTILIZING PRINCIPAL COMPONENT ANALYSIS
Abstract
The present invention generally provides methods and systems for characterizing texts, for example, for identifying textual documents by language, topic, author, or other attributes. In some embodiments, a method of the invention can include creating an n-gram frequency spectrum for a document under analysis, preferably selecting a subset of the n-gram frequency spectrum, transforming the n-gram frequency spectrum into principal component space, and identifying one or more attributes of the document according to its similarity to (or distinction from) reference documents in the principal component space.
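The steps in the abstract (n-gram spectrum, subset selection, projection into principal component space, comparison with reference documents) can be illustrated with a small Python sketch. It is not the patented implementation. Several choices are assumptions made for illustration: the use of character trigrams, the rule for picking the n-gram subset (the k most frequent n-grams across the reference documents), a power-iteration PCA, and a nearest-neighbour decision in principal component space. Every function name (ngram_spectrum, select_features, principal_components, identify) is hypothetical. Only the standard library is used.

```python
import math
import random
from collections import Counter


def ngram_spectrum(text, n=3):
    """Relative frequency of every overlapping character n-gram in `text`."""
    text = " ".join(text.lower().split())  # normalise case and runs of whitespace
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams) or 1  # avoid dividing by zero for very short texts
    return {g: c / total for g, c in Counter(grams).items()}


def select_features(spectra, k):
    """Assumed subset rule: keep the k n-grams with the largest summed frequency."""
    summed = Counter()
    for spectrum in spectra:
        summed.update(spectrum)  # Counter.update adds the (float) frequencies
    return [g for g, _ in summed.most_common(k)]


def to_vector(spectrum, features):
    return [spectrum.get(g, 0.0) for g in features]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def principal_components(rows, n_components, iters=500, seed=0):
    """Mean and leading covariance eigenvectors via power iteration with deflation.

    Adequate for the tiny matrices in this sketch; a real system would use an SVD.
    """
    m, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / m for j in range(d)]
    centred = [[x - mu for x, mu in zip(r, mean)] for r in rows]
    cov = [[sum(r[a] * r[b] for r in centred) / m for b in range(d)] for a in range(d)]
    rng = random.Random(seed)  # fixed seed keeps results reproducible
    comps = []
    for _ in range(n_components):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(dot(v, v))
        v = [x / norm for x in v]
        lam = 0.0
        for _ in range(iters):
            w = [dot(row, v) for row in cov]  # w = C v
            lam = math.sqrt(dot(w, w))  # approaches the eigenvalue as v converges
            if lam < 1e-12:
                break
            v = [x / lam for x in w]
        if lam < 1e-12:
            break  # no variance left: stop extracting components
        comps.append(v)
        # Deflate: remove the component just found so the next one is orthogonal.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return mean, comps


def project(vec, mean, comps):
    """Coordinates of `vec` in principal component space."""
    centred = [x - mu for x, mu in zip(vec, mean)]
    return [dot(centred, c) for c in comps]


def identify(query, references, k=50, n_components=2, n=3):
    """Label of the nearest reference document in PC space (assumed decision rule)."""
    spectra = [ngram_spectrum(text, n) for text, _ in references]
    features = select_features(spectra, k)
    rows = [to_vector(s, features) for s in spectra]
    mean, comps = principal_components(rows, n_components)
    q = project(to_vector(ngram_spectrum(query, n), features), mean, comps)
    points = [project(r, mean, comps) for r in rows]
    best = min(range(len(points)), key=lambda i: math.dist(q, points[i]))
    return references[best][1]


if __name__ == "__main__":
    refs = [
        ("the quick brown fox jumps over the lazy dog", "English"),
        ("she sells sea shells on the sea shore", "English"),
        ("el rápido zorro marrón salta sobre el perro perezoso", "Spanish"),
        ("la casa de mi madre está cerca del río", "Spanish"),
    ]
    print(identify("the dog sleeps in the house", refs))
```

With only a handful of reference texts the result is fragile; a practical system would use many reference documents per attribute and a library SVD for the PCA step. Nearest-neighbour matching is just one way to turn similarity in principal component space into an attribute label; comparing against per-class centroids would be an equally plausible reading of the abstract.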