Conference: International Symposium on Intelligent Data Analysis

Extracting Predictive Models from Marked-Up Free-Text Documents at the Royal Botanic Gardens, Kew, London


Abstract

In this paper we explore the combination of text mining, unsupervised learning and supervised learning to extract predictive models from a corpus of digitised historical floras. These documents deal with the nomenclature, geographical distribution, ecology and comparative morphology of the species of a region. We exploit the fact that portions of text in the floras are marked up as different types of trait and habitat, and we infer models from these texts that predict habitat types from the traits of plant species. We also integrate plant taxonomy data to assist in validating our models. We show that by clustering the text describing habitat in the different floras we can identify a number of important and distinct habitats that are associated with particular families of species, together with statistical significance scores for those associations. We also show that, by using these discovered habitat types as labels for supervised learning, we can predict them from a subset of traits identified using wrapper feature selection.
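
The abstract names wrapper feature selection but does not say which classifier or search strategy was used. As a rough illustration only, the sketch below assumes a greedy forward search scored by leave-one-out accuracy of a 1-nearest-neighbour classifier. The trait names, habitat labels and data table are invented for the example and are not taken from the Kew floras.

```python
from typing import List, Sequence, Tuple

# Hypothetical binary trait columns and habitat labels (illustration only).
TRAITS = ["succulent_leaves", "floating_leaves", "noise_trait"]
X = [
    [1, 0, 0], [1, 0, 1],   # arid
    [0, 1, 0], [0, 1, 1],   # aquatic
    [0, 0, 0], [0, 0, 1],   # forest
]
y = ["arid", "arid", "aquatic", "aquatic", "forest", "forest"]


def loo_accuracy(X: List[Sequence[float]], y: List[str],
                 features: List[int]) -> float:
    """Leave-one-out accuracy of a 1-NN classifier that only sees
    the trait columns listed in `features`."""
    if not features:
        return 0.0
    correct = 0
    for i, xi in enumerate(X):
        best_dist, best_label = float("inf"), None
        for j, xj in enumerate(X):
            if i == j:
                continue  # hold out the row being classified
            dist = sum((xi[f] - xj[f]) ** 2 for f in features)
            if dist < best_dist:
                best_dist, best_label = dist, y[j]
        correct += best_label == y[i]
    return correct / len(X)


def forward_select(X: List[Sequence[float]],
                   y: List[str]) -> Tuple[List[int], float]:
    """Greedy forward wrapper selection: repeatedly add the trait that
    most improves the wrapped classifier's score, and stop as soon as
    no remaining trait improves it."""
    n_features = len(X[0])
    selected: List[int] = []
    best = 0.0
    while len(selected) < n_features:
        candidates = [
            (loo_accuracy(X, y, selected + [f]), f)
            for f in range(n_features) if f not in selected
        ]
        # Ties are broken in favour of the lower column index.
        score, feature = max(candidates, key=lambda t: (t[0], -t[1]))
        if score <= best:
            break
        selected.append(feature)
        best = score
    return selected, best


selected, best = forward_select(X, y)
print([TRAITS[i] for i in selected], best)
```

On this toy table the search keeps the two informative traits and drops the noise column. The defining property of a wrapper method is that each candidate subset is scored by the same learner that will be used for prediction, rather than by a classifier-independent filter statistic.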
