Journal: International Journal of Enterprise Information Systems

Chinese Text Categorization via Bottom-Up Weighted Word Clustering



Abstract

Most research on text categorization focuses on the bag-of-words representation. Some studies have proposed other representations for classification, such as term phrases, Latent Semantic Indexing, and term clustering. Term clustering is effective for classification and has been shown to be a good way to reduce the dimensionality of term vectors. The authors use hierarchical term clustering to aggregate similar terms. To improve performance, they present a modified indexing scheme based on the terms in each cluster. Their test collection was extracted from Chinese NETNEWS, and they used a centroid-based classifier for the categorization task. The results show that term clustering not only reduces dimensionality but also outperforms bag of words. Term clustering can therefore be applied to text classification with any large corpus, with the aim of saving time and improving efficiency and effectiveness. Beyond performance, the clusters can be regarded as a conceptual knowledge base that preserves related real-world terms.
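
The pipeline described in the abstract (bottom-up term clustering, indexing by cluster, centroid-based classification) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify their term weighting, similarity measure, or modified indexing scheme, so the sketch assumes raw counts, cosine similarity between term occurrence profiles, and a merge threshold of 0.9. All function names and the toy data are hypothetical, and the input is assumed to be already segmented into words.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster_terms(docs, threshold=0.9):
    """Bottom-up clustering: start with one cluster per term and repeatedly
    merge the most similar pair until no pair reaches `threshold`."""
    # Assumption: a term's profile is its occurrence count per document.
    profiles = defaultdict(Counter)
    for doc_id, tokens in enumerate(docs):
        for tok in tokens:
            profiles[tok][doc_id] += 1
    clusters = [([term], Counter(vec)) for term, vec in sorted(profiles.items())]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(clusters[i][1], clusters[j][1])
                if sim >= threshold and sim > best:
                    best, pair = sim, (i, j)
        if pair is None:
            break
        i, j = pair
        # A merged cluster's profile is the sum of its members' profiles.
        merged = (clusters[i][0] + clusters[j][0], clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in pair] + [merged]
    return {term: cid for cid, (terms, _) in enumerate(clusters) for term in terms}


def index_document(tokens, term_to_cluster):
    """Represent a document by cluster counts; unseen terms are ignored."""
    return Counter(term_to_cluster[t] for t in tokens if t in term_to_cluster)


def train_centroids(vectors, labels):
    """Class centroid = sum of length-normalised document vectors.
    (Sum and mean give the same cosine ranking.)"""
    centroids = defaultdict(Counter)
    for vec, label in zip(vectors, labels):
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        for k, w in vec.items():
            centroids[label][k] += w / norm
    return centroids


def classify(vec, centroids):
    """Assign the label whose centroid is most similar to the document."""
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy, pre-segmented training data (hypothetical).
train_docs = [
    ["stock", "shares", "market"],
    ["stock", "shares", "bank"],
    ["match", "goal", "team"],
    ["match", "goal", "coach"],
]
train_labels = ["finance", "finance", "sports", "sports"]

term_to_cluster = cluster_terms(train_docs)
centroids = train_centroids(
    [index_document(d, term_to_cluster) for d in train_docs], train_labels
)
print(len(term_to_cluster), "terms ->", len(set(term_to_cluster.values())), "clusters")
print(classify(index_document(["shares", "market"], term_to_cluster), centroids))
```

In this toy run, terms that always co-occur ("stock" and "shares", "match" and "goal") collapse into single dimensions, which is the dimensionality reduction the abstract refers to. The pairwise search is quadratic per merge and is only suitable for illustration; a real corpus would need a more efficient clustering routine.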
