Chinese journal: 情报学报

Sentence-Based Text Representation and Chinese Text Classification

Abstract

Text mining is a key technology in information resources management. The vector space model is a mature text representation model in text mining. It usually takes words or phrases as feature items, but these items carry little semantic information. To support content-based text mining, this paper raises the segmentation granularity from words and phrases to sentences: a text is represented as a bag of sentences, and text similarity is defined in terms of sentence similarity. To validate this representation, a Chinese text classifier was built with the KNN algorithm. In the experiments, the bag-of-sentences KNN classifier achieved an average precision of 92.12% and a recall of 92.01%.
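The bag-of-sentences idea can be illustrated with a minimal Python sketch. The abstract does not specify how sentence similarity is computed or how sentence scores are combined into a text score, so everything below is an assumption made for illustration: sentence similarity is taken to be the Jaccard overlap of character bigrams, text similarity is a symmetric average of best-match sentence similarities, and the function names (`split_sentences`, `sentence_similarity`, `text_similarity`, `knn_classify`) are hypothetical. It is not the paper's method.

```python
import re
from collections import Counter


def split_sentences(text):
    """Split a text into sentences on common sentence-ending punctuation."""
    parts = re.split(r"[。！？!?；;\n]+", text)
    return [p.strip() for p in parts if p.strip()]


def char_bigrams(sentence):
    """Character bigrams, a crude stand-in for Chinese word segmentation."""
    s = re.sub(r"\s+", "", sentence)
    if len(s) < 2:
        return {s} if s else set()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def sentence_similarity(a, b):
    """Assumed measure: Jaccard overlap of character bigrams."""
    ga, gb = char_bigrams(a), char_bigrams(b)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def text_similarity(doc_a, doc_b):
    """Assumed aggregation: symmetric mean of best-match sentence scores.

    doc_a and doc_b are bags (lists) of sentences.
    """
    if not doc_a or not doc_b:
        return 0.0

    def directed(src, dst):
        best = [max(sentence_similarity(s, t) for t in dst) for s in src]
        return sum(best) / len(src)

    return 0.5 * (directed(doc_a, doc_b) + directed(doc_b, doc_a))


def knn_classify(text, training, k=3):
    """Label a text by majority vote among its k most similar training texts.

    training is a list of (text, label) pairs.
    """
    query = split_sentences(text)
    scored = [
        (text_similarity(query, split_sentences(doc)), label)
        for doc, label in training
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]
```

Calling `knn_classify(text, training, k=3)` with a list of labelled texts returns the majority label among the three nearest neighbours. In practice the bigram overlap would likely be replaced by a sentence similarity built on proper word segmentation or semantic resources, which is where the sentence-level representation is expected to gain over word-level features.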
