Feature reweighting in text classifier generation using unlabeled data
Abstract
A text classifier training augmentation mechanism is provided for incorporating unlabeled data into the generation of a text classifier. For each term of a plurality of terms in each document of a plurality of documents in a set of unlabeled data, a term frequency value is determined. The term frequency value is normalized by dividing it by the total number of terms in the document. An inverse document frequency (idf) value is determined for each term based on the term frequency value. A subset of terms is filtered from the plurality of terms based on the determined idf values. The idf values for the remaining terms are transformed into feature weights. Terms from a set of labeled data are re-weighted based on the feature weights determined from the set of unlabeled data. The text classifier is then generated using the re-weighted labeled data.
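The steps in the abstract can be illustrated with a minimal Python sketch. The abstract does not give formulas, so everything concrete here is an assumption made for illustration: idf is taken as the textbook log(N / df), the filter drops terms whose idf falls below a hypothetical `min_idf` threshold, the transformation into feature weights is a division by the largest remaining idf, and labeled terms absent from the weight table are discarded. All function names are hypothetical, and the final step of training a classifier is left out.

```python
import math
from collections import Counter


def normalized_tf(tokens):
    """Term frequency divided by the total number of terms in the document."""
    total = len(tokens)
    if total == 0:
        return {}
    return {term: count / total for term, count in Counter(tokens).items()}


def idf_values(unlabeled_docs):
    """Assumed formula: idf = log(N / df), where a term counts toward df
    for every unlabeled document in which its term frequency is > 0."""
    n_docs = len(unlabeled_docs)
    df = Counter()
    for tokens in unlabeled_docs:
        df.update(t for t, tf in normalized_tf(tokens).items() if tf > 0)
    return {term: math.log(n_docs / count) for term, count in df.items()}


def feature_weights(idf, min_idf=0.1):
    """Assumed filter and transform: drop terms with idf below min_idf,
    then scale the remaining idf values by the largest one."""
    kept = {t: v for t, v in idf.items() if v >= min_idf}
    if not kept:
        return {}
    top = max(kept.values())
    return {t: v / top for t, v in kept.items()}


def reweight(tokens, weights):
    """Scale a labeled document's normalized tf by the weights learned
    from unlabeled data; terms without a weight are dropped (assumption)."""
    return {
        t: tf * weights[t]
        for t, tf in normalized_tf(tokens).items()
        if t in weights
    }


# Toy data, invented purely for illustration.
unlabeled = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "cat", "ran"],
    ["the", "bird", "sang"],
]
weights = feature_weights(idf_values(unlabeled))
labeled_features = reweight(["the", "cat", "sat", "sat"], weights)
```

In this toy run, "the" appears in every unlabeled document, so its idf is zero and it is filtered out before the labeled document is re-weighted. The resulting `labeled_features` dictionaries would then be passed to whatever classifier trainer is in use.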