pTree technology represent and process data differently from the ubiquitous horizontal data technologies. In pTree technology, the data is structured column-wise and the columns are processed horizontally (typically across a few to a few hundred columns), while in horizontal technologies, data is structured row-wise and those rows are processed vertically (often down millions, even billions of rows). pTree technology is a vertical data technology. P-trees are lossless, compressed and data-mining ready data structures [9][10]. pTrees are lossless because the vertical bit-wise partitioning that is used in the pTree technology guarantees that all information is retained completely. There is no loss of information in converting horizontal data to this vertical format. pTrees are compressed because in this technology, segments of bit sequences which are either purely 1-bits or purely 0-bits, are represented by a single bit. This compression saves a considerable amount of space, but more importantly facilitates faster processing. pTrees are data-mining ready because the fast, horizontal data mining processes involved can be done without the need to decompress the structures first. pTree vertical data structures have been exploited in various domains and data mining algorithms, ranging from classification {1,2,3], clustering [4,7], association rule mining [9], as well as other data mining algorithms. PTree technology is patented in the U.S. Speed improvements are very important in data mining because many quite accurate algorithms require an unacceptable amount of processing time to complete, even with today's powerful computing systems and efficient software platforms. In this paper, we evaluate and compare the speed of various clustering data mining algorithms when using pTree technology.
展开▼