We consider a class extraction problem over semistructured data. A class C is extracted by grouping objects having similar (not necessarily identical) sets of properties into C, where the set of properties of C is the union of those of the objects in C. Let C be an extracted class and o be an object in C. If C has property P but o has no property P value, then P is null within o. An extracted class c is called typical if the number of nulls in C is small against the number of object in C and the number of properties of C. We present the following results. First, we prove that the problem of deciding if a typical class can be extracted from given semistructured data is NP-complete. Second, we present an approximation algorithm for extracting typical classes from given semistructured data. Finally, we briefly discuss a sufficient condition for the approximation algorithm to run efficiently.
展开▼