The INEX 2007 evaluation was based on the Wikipedia collection in XML format. In this paper we describe some modifications to the GPX search engine and the approach taken in the Ad-hoc and the Link-the-Wiki tracks. The GPX retrieval strategy is based on the construction of a collection sub-tree, consisting of all nodes that contain one or more of the search terms. Nodes containing search terms are assigned a score using the GPX ranking scheme, which incorporates an extended TF-IDF variant. In earlier versions of GPX, scores were recursively propagated from text-containing nodes, through ancestors, all the way to the document root of the XML tree. In this paper we describe a simplification whereby the score of each node is computed directly, doing away with the score propagation mechanism. Preliminary results indicate improved performance. The GPX search engine was used in the Link-the-Wiki track to identify prospective incoming links to new Wikipedia pages. We also describe a simple and efficient approach to the identification of prospective outgoing links in new Wikipedia pages. We present preliminary evaluation results.
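The idea of scoring each node directly, rather than propagating scores up from text-containing nodes, can be illustrated with a minimal sketch. It is not the GPX ranking scheme: a plain TF-IDF sum stands in for the extended variant, and the names `tokenize` and `score_nodes`, the in-memory `docs` mapping, and the positional path labels are all assumptions made for illustration.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score_nodes(docs, query):
    """Score every element whose subtree contains at least one query term.

    docs  : dict mapping a document name to an XML string (hypothetical input).
    query : free-text query string.
    Returns a list of (score, doc_name, path) sorted by descending score.
    The score is a plain TF-IDF sum, used here as a stand-in only.
    """
    terms = set(tokenize(query))
    roots = {name: ET.fromstring(xml) for name, xml in docs.items()}

    # Document frequency of each query term across the collection.
    df = Counter()
    for root in roots.values():
        df.update(set(tokenize(" ".join(root.itertext()))) & terms)
    idf = {t: math.log(1 + len(roots) / df[t]) for t in terms if df[t]}

    results = []

    def visit(node, name, path):
        # The score is computed directly from the text in this node's
        # subtree; nothing is passed up from the children.
        tokens = tokenize(" ".join(node.itertext()))
        tf = Counter(t for t in tokens if t in idf)
        if not tf:
            # No query term here, so no descendant has one either:
            # the node lies outside the collection sub-tree.
            return
        results.append((sum(n * idf[t] for t, n in tf.items()), name, path))
        for i, child in enumerate(node, start=1):
            # [i] is the child's position among all siblings (a label,
            # not a strict XPath step).
            visit(child, name, f"{path}/{child.tag}[{i}]")

    for name, root in roots.items():
        visit(root, name, "/" + root.tag)

    results.sort(key=lambda r: -r[0])
    return results


docs = {
    "a": "<article><title>XML retrieval</title>"
         "<sec><p>XML search engines</p><p>cooking</p></sec></article>",
    "b": "<article><p>gardening</p></article>",
}
results = score_nodes(docs, "xml retrieval")
```

In this sketch, elements with no query term in their subtree never enter the result list, which mirrors the collection sub-tree described above. A raw sum like this one favours larger elements, so it should not be read as an indication of how GPX ranks nodes of different sizes.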