In this paper, we improve on trawling and point out some communities that trawling misses. We use a DBG (Dense Bipartite Graph) instead of a CBG (Complete Bipartite Graph) to identify the structure of a potential community. Based on the DBG, we propose a new method that uses edge removal to extract cores from a web graph. Moreover, we improve the crawler so that it saves only pages that are potential fans of a core, which saves a large amount of disk storage space. To evaluate whether a set of cores belongs to a community, term frequency statistics are used. The experimental dataset was crawled under the domain ".en". The results show that our algorithm works properly and that our method finds some new cores.
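The abstract does not spell out the edge-removal procedure, so the following is only an illustrative sketch of one common way to prune a fan-to-center link graph by degree thresholds, not the authors' algorithm. It assumes that a dense bipartite structure is one where every fan links to at least `min_out` centers and every center is linked from at least `min_in` fans; the function name `prune_dense_bipartite`, both thresholds, and the sample edges are hypothetical.

```python
from collections import defaultdict


def prune_dense_bipartite(edges, min_out, min_in):
    """Illustrative degree-based edge removal (an assumption, not the paper's method).

    edges: iterable of (fan, center) link pairs.
    Repeatedly removes every edge whose fan has fewer than min_out out-links
    or whose center has fewer than min_in in-links, until nothing changes.
    Returns the set of surviving edges.
    """
    edges = set(edges)
    while True:
        out_deg = defaultdict(int)  # links leaving each fan
        in_deg = defaultdict(int)   # links arriving at each center
        for fan, center in edges:
            out_deg[fan] += 1
            in_deg[center] += 1
        kept = {
            (fan, center)
            for fan, center in edges
            if out_deg[fan] >= min_out and in_deg[center] >= min_in
        }
        if kept == edges:  # fixed point reached: every remaining edge qualifies
            return kept
        edges = kept


# Hypothetical toy graph: fans a, b, c all link to centers x and y;
# fan d and center z are only weakly connected.
sample_edges = [
    ("a", "x"), ("a", "y"), ("a", "z"),
    ("b", "x"), ("b", "y"),
    ("c", "x"), ("c", "y"),
    ("d", "x"),
]
core_edges = prune_dense_bipartite(sample_edges, min_out=2, min_in=2)
fans = {fan for fan, _ in core_edges}
centers = {center for _, center in core_edges}
```

On this toy input the weakly connected fan `d` and center `z` are dropped, and the surviving fans and centers form a candidate core. Unlike a complete bipartite graph, the surviving structure does not require every fan to link to every center, which is the kind of relaxation the DBG is meant to allow.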