Three algorithms for processing joins on attributes of a textual type are presented and analyzed in this paper. Because such joins often involve very large document collections, efficient algorithms for processing them are essential. The three algorithms differ according to whether the documents themselves or the inverted files on the documents are used to process the join. Our analysis and simulation results indicate that the relative performance of these algorithms depends on the input document collections, the system characteristics, and the input query. For each algorithm, we identify the type of input document collection on which it is likely to perform well.