Associative Web Page Classification Based on Composite Weights of Feature Words (基于特征词复合权重的关联网页分类)

兰均; 施化吉; 李星毅; 徐敏

Abstract

Applying association-rule-based classification to Web documents has two shortcomings. First, the method treats a Web document as plain text and ignores the HTML tag information of the page. Second, the items of the association rules are just the feature words in the page: either the weights of the words are not considered at all, or the weight is quantified by word frequency alone, which ignores the importance of where a word appears in the document. To address both problems, the paper proposes an associative Web page classification method based on composite weights of feature words. The method uses the positional features conveyed by HTML tags to compute the composite weight of each feature word, and then mines classification rules on the basis of these weights to classify Web pages. Experimental results show that the approach performs better than traditional associative classification methods.
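The weighting step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formula or tag weights, so everything specific here is an assumption made for illustration: the `TAG_WEIGHTS` table, the rule "take the heaviest enclosing tag", the `\w+` tokenizer, and the names `CompositeWeightParser` and `composite_weights` are all hypothetical. The sketch only shows the general idea that a word occurring inside a prominent tag contributes more than a plain occurrence.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical tag weights (an assumption; the paper's actual values are not
# given in the abstract). Tags not listed here get DEFAULT_WEIGHT.
TAG_WEIGHTS = {"title": 5.0, "h1": 3.0, "h2": 2.0, "b": 1.5, "strong": 1.5}
DEFAULT_WEIGHT = 1.0


class CompositeWeightParser(HTMLParser):
    """Accumulates a position-aware weight for every word in a page."""

    def __init__(self):
        super().__init__()
        self.open_tags = []                  # stack of currently open tags
        self.weights = defaultdict(float)    # word -> accumulated weight

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag; this also discards
        # unclosed tags such as <br> that were pushed in between.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        # Assumed rule: an occurrence is worth the heaviest enclosing tag.
        weight = max(
            (TAG_WEIGHTS.get(t, DEFAULT_WEIGHT) for t in self.open_tags),
            default=DEFAULT_WEIGHT,
        )
        # Simple tokenizer for the sketch; real Chinese text would need a
        # word segmenter instead.
        for word in re.findall(r"\w+", data.lower()):
            self.weights[word] += weight


def composite_weights(html):
    """Return {word: weight}, combining frequency and tag position."""
    parser = CompositeWeightParser()
    parser.feed(html)
    parser.close()
    return dict(parser.weights)


if __name__ == "__main__":
    page = (
        "<html><head><title>web mining</title></head>"
        "<body><h1>web</h1><p>mining rules <b>web</b></p></body></html>"
    )
    for word, weight in sorted(composite_weights(page).items()):
        print(word, weight)
```

Because each occurrence adds its tag weight, the result combines word frequency with position, which is the combination the abstract argues for. Under this scheme, the weighted words would then serve as the items from which classification rules are mined; that mining step is not sketched here.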

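Bibliographic record

Source
《计算机科学》 (Computer Science) | 2011, No. 3 | pp. 187-190 | 4 pages
Authors
兰均; 施化吉; 李星毅; 徐敏
Affiliations
School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013
School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013
School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013
School of Computer Science and Technology, Nantong University, Nantong 226019
Original format
PDF
Language
Chinese
Chinese Library Classification
Information processing
Keywords
Web page classification; association rules; positional features; composite weight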

