In web page classification, the Absolute Weighting Method is commonly used to weight the main HTML structure features. Its drawback is that the weighting coefficient is a fixed value, which affects long and short texts differently: the influence of structure features on a local text weakens as the length of that text increases. To solve this problem, we propose an improved weighting method, the Relative Weighting Method. In experiments on hierarchical web page classification, we compare the classification performance of the two methods on single labels and on combinations of several labels. The results show that the Relative Weighting Method effectively improves classification accuracy and outperforms the Absolute Weighting Method.
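The abstract does not give formulas for either method, so the following Python sketch is only one possible reading of the contrast it describes, not the authors' implementation. It assumes that absolute weighting adds a fixed bonus to the count of each term found in a structural element, and that relative weighting scales that bonus with the length of the local text. The function names, the `coefficient` and `ratio` parameters, and the toy token lists are all hypothetical.

```python
from collections import Counter


def weight_terms(tokens, structural_terms, bonus):
    """Term counts, with `bonus` added to each term found in a structural element."""
    weights = {term: float(count) for term, count in Counter(tokens).items()}
    for term in set(structural_terms):
        weights[term] = weights.get(term, 0.0) + bonus
    return weights


def absolute_weights(tokens, structural_terms, coefficient=2.0):
    # Assumed reading of absolute weighting: the bonus is a fixed value.
    return weight_terms(tokens, structural_terms, coefficient)


def relative_weights(tokens, structural_terms, ratio=0.2):
    # Assumed reading of relative weighting: the bonus grows with text length.
    return weight_terms(tokens, structural_terms, ratio * len(tokens))


def boost_share(tokens, weights):
    """Fraction of the total weight that comes from the structural bonus."""
    total = sum(weights.values())
    return (total - len(tokens)) / total


# Toy local texts: one structural term, padded with filler tokens.
short_text = ["python"] + ["filler"] * 9
long_text = ["python"] + ["filler"] * 999
structural = ["python"]

for name, text in (("short", short_text), ("long", long_text)):
    abs_share = boost_share(text, absolute_weights(text, structural))
    rel_share = boost_share(text, relative_weights(text, structural))
    print(f"{name}: absolute={abs_share:.4f} relative={rel_share:.4f}")
```

Under these assumptions, the share of weight contributed by the structural bonus shrinks as the text grows with the fixed coefficient, while it stays constant when the bonus is proportional to text length.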