
Learning to Change Taxonomies



Abstract

Taxonomies are valuable tools for structuring and representing our knowledge about the world. They are widely used in many domains where information about species, products, customers, publications, etc. needs to be organized. In the absence of standards, many taxonomies of the same entities can coexist. A problem arises when data categorized in one taxonomy needs to be used by a procedure (a methodology or an algorithm) that relies on a different taxonomy. Usually, this problem is solved with a labor-intensive manual approach. This paper describes a machine learning approach that aids domain experts in changing taxonomies. It learns relationships between two taxonomies and maps the data from one taxonomy into the other. The proposed approach uses decision trees and bootstrapping to learn mappings of instances from the source taxonomy to the target taxonomy. A C4.5 decision tree classifier is trained on a small manually labeled training set and applied to a randomly selected sample of the unlabeled data. The classification results are analyzed, the misclassified items are corrected, and all items in the sample are added to the training set. This procedure is iterated until no unlabeled data remains or an acceptable error rate is reached. In the latter case, the last classifier is used to label all the remaining data. We test our approach on a database of products obtained from a grocery store chain and find that it performs well, reaching 92.6% accuracy while requiring the human expert to explicitly label only 18% of the entire data.
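The iterative labeling procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. To keep it dependency-free, a one-level decision stump that splits on the source category stands in for the C4.5 classifier. The names `train_stump`, `bootstrap_mapping`, `oracle`, `batch_size`, and `max_error` are hypothetical, and counting only the corrected items as expert effort is an assumption.

```python
import random
from collections import Counter, defaultdict


def train_stump(labeled):
    """Stand-in for C4.5 (assumption): a one-level tree on the source category.

    `labeled` is a list of (source_category, target_category) pairs.
    Unseen source categories fall back to the most common target category.
    """
    by_source = defaultdict(Counter)
    overall = Counter()
    for source_cat, target_cat in labeled:
        by_source[source_cat][target_cat] += 1
        overall[target_cat] += 1
    default = overall.most_common(1)[0][0]
    table = {s: c.most_common(1)[0][0] for s, c in by_source.items()}
    return lambda source_cat: table.get(source_cat, default)


def bootstrap_mapping(seed, unlabeled, oracle, batch_size=20,
                      max_error=0.1, rng=None):
    """Grow the training set batch by batch until the error rate is acceptable.

    `oracle(item)` plays the role of the domain expert and returns the
    correct target category. Returns (labeled, auto_labeled, corrections).
    """
    rng = rng or random.Random(0)
    labeled = list(seed)
    pool = list(unlabeled)
    corrections = 0
    classify = train_stump(labeled)
    while pool:
        classify = train_stump(labeled)        # retrain on everything so far
        rng.shuffle(pool)                      # random sample of unlabeled data
        batch, pool = pool[:batch_size], pool[batch_size:]
        errors = 0
        for item in batch:
            truth = oracle(item)               # expert reviews the prediction
            if classify(item) != truth:
                errors += 1                    # expert corrects this item
            labeled.append((item, truth))      # all reviewed items are kept
        corrections += errors
        if errors / len(batch) <= max_error:
            break                              # acceptable error rate reached
    # The last classifier labels whatever is left without expert review.
    auto_labeled = [(item, classify(item)) for item in pool]
    return labeled, auto_labeled, corrections
```

In this sketch the classifier that labels the remaining data is the one that just passed the error check, which is one reading of "the last classifier". Retraining once more on the final batch before labeling would be an equally plausible variant.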
