The Varro toolkit is a system for identifying and counting a major class of regularity in treebanks and annotated natural language data in the form of treestructures:frequently recurring unordered subtrees. This software has been designed for use in linguistics to be maximally applicable to actually existing treebanks and other stores of tree-structurable natural language data. It minimizes memory use so that moderately large treebanks are tractable on commonly available computer hardware. This article introduces condensed canonically ordered trees as a data structure for efficiently discovering frequently recurring unordered subtrees.
展开▼