Many applications, such as knowledge base completion and automated diagnosis of patients, have access only to positive examples: they lack the negative examples that standard relational learning techniques require, and they suffer under the closed-world assumption. The corresponding propositional problem is known as Positive and Unlabeled (PU) learning. In that field, it is known that using the label frequency (the fraction of true positive examples that are labeled) makes learning easier. This notion has not yet been explored in the relational domain. The goal of this work is twofold: (1) to explore whether using the label frequency is also useful when working with relational data, and (2) to propose a method for estimating the label frequency from relational positive and unlabeled data. Our experiments confirm the usefulness of both knowing the label frequency and our estimate of it.