Drawing on image processing techniques and the particular characteristics of web pages, this paper proposes a new web page segmentation algorithm, the Iterated Dividing and Shrinking Algorithm. Image dividing conditions are introduced and the concept of a dividing zone is defined. On this basis, the web page is first rendered as an image; the image is then repeatedly shrunk and split until it has been divided into visually consistent subimages. Experiments show that the algorithm is well suited to web page segmentation and performs well in terms of extensibility and performance.
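
The shrink-and-split loop can be illustrated with a minimal sketch. The abstract does not state the actual dividing conditions or how a dividing zone is defined, so everything below is an assumption made for illustration: the page image is taken to be already binarized (0 for background, 1 for content), a dividing zone is taken to be a run of at least `min_gap` fully blank rows or columns, and the names `shrink`, `find_gap`, and `segment` are hypothetical, not the authors' own.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]          # assumed binary image: 0 = background, 1 = content
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def shrink(img: Image, box: Box) -> Optional[Box]:
    """Trim blank border rows and columns; return None if the box has no content."""
    top, left, bottom, right = box
    rows = [r for r in range(top, bottom) if any(img[r][left:right])]
    if not rows:
        return None
    top, bottom = rows[0], rows[-1] + 1
    cols = [c for c in range(left, right)
            if any(img[r][c] for r in range(top, bottom))]
    return (top, cols[0], bottom, cols[-1] + 1)


def find_gap(blank: List[bool], min_gap: int) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the widest blank run of length >= min_gap, if any."""
    best, start = None, None
    for i, is_blank in enumerate(blank + [False]):  # sentinel closes a trailing run
        if is_blank and start is None:
            start = i
        elif not is_blank and start is not None:
            if i - start >= min_gap and (best is None or i - start > best[1] - best[0]):
                best = (start, i)
            start = None
    return best


def segment(img: Image, box: Optional[Box] = None, min_gap: int = 2) -> List[Box]:
    """Recursively shrink a region, then split it at an assumed dividing zone."""
    if box is None:
        box = (0, 0, len(img), len(img[0]) if img else 0)
    box = shrink(img, box)
    if box is None:
        return []
    top, left, bottom, right = box

    # Assumed dividing condition 1: a wide enough band of blank rows.
    blank_rows = [not any(img[r][left:right]) for r in range(top, bottom)]
    gap = find_gap(blank_rows, min_gap)
    if gap:
        upper = (top, left, top + gap[0], right)
        lower = (top + gap[1], left, bottom, right)
        return segment(img, upper, min_gap) + segment(img, lower, min_gap)

    # Assumed dividing condition 2: a wide enough band of blank columns.
    blank_cols = [not any(img[r][c] for r in range(top, bottom))
                  for c in range(left, right)]
    gap = find_gap(blank_cols, min_gap)
    if gap:
        west = (top, left, bottom, left + gap[0])
        east = (top, left + gap[1], bottom, right)
        return segment(img, west, min_gap) + segment(img, east, min_gap)

    return [box]  # no dividing zone left: treat the region as one block
```

Calling `segment(img)` on a binarized page image returns a list of bounding boxes, one per block. Because each region is shrunk before it is tested, any dividing zone found lies strictly inside the region, so both halves contain content and the recursion terminates. A real implementation would need a rendering step and dividing conditions based on visual cues richer than blank space.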