Conference: International Symposium on Intelligent Data Analysis

Extracting Predictive Models from Marked-Up Free-Text Documents at the Royal Botanic Gardens, Kew, London

Abstract

In this paper we explore combining text mining with unsupervised and supervised learning to extract predictive models from a corpus of digitised historical floras. These documents describe the nomenclature, geographical distribution, ecology and comparative morphology of the species of a region. We exploit the fact that portions of text in the floras are marked up as different types of trait and habitat, and from these texts we infer models that predict habitat types from the traits of plant species. We also integrate plant taxonomy data to help validate our models. We show that clustering the text that describes habitats across different floras identifies a number of important, distinct habitats, each associated with particular plant families and accompanied by statistical significance scores. We also show that, using these discovered habitat types as labels for supervised learning, we can predict them from a subset of traits identified by wrapper feature selection.
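The abstract does not name the clustering algorithm or the significance test used. As an illustration only, the sketch below assumes habitat descriptions are turned into unit-length term-frequency vectors, grouped with spherical k-means (cosine similarity, farthest-point initialisation), and that the association between a cluster and a plant family is scored with a one-sided hypergeometric test. The function names, the stop-word list, and the six example habitat strings and family labels are hypothetical.

```python
# Sketch under assumptions: term-frequency vectors + spherical k-means for habitat
# discovery, and a hypergeometric test for cluster/family association.
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real corpus would need a fuller, domain-aware one.
STOPWORDS = {"a", "and", "in", "near", "of", "on", "the"}


def normalise(vec):
    """Scale a term -> weight mapping to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def vectorise(text):
    """Unit-length term-frequency vector for one habitat description."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    return normalise(Counter(tokens))


def cosine(u, v):
    # Both vectors are unit length, so the dot product is the cosine similarity.
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def spherical_kmeans(vectors, k, max_iter=100):
    """Cluster unit vectors by cosine similarity; returns one label per vector."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Farthest-point init: the vector least similar to every existing centroid.
        idx = min(range(len(vectors)),
                  key=lambda i: max(cosine(vectors[i], c) for c in centroids))
        centroids.append(vectors[idx])
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                total = Counter()
                for v in members:
                    total.update(v)
                centroids[j] = normalise(total)
    return labels


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K family members, n draws)."""
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, min(K, n) + 1))
    return hits / math.comb(N, n)


def family_enrichment(labels, families):
    """One-sided p-value for each observed (cluster, family) pair."""
    N = len(labels)
    cluster_size = Counter(labels)
    family_size = Counter(families)
    observed = Counter(zip(labels, families))
    return {(c, f): hypergeom_sf(obs, N, family_size[f], cluster_size[c])
            for (c, f), obs in observed.items()}


# Hypothetical marked-up habitat texts and the family of each species.
habitats = [
    "sandy coastal dunes near the sea",
    "coastal sand dunes and beaches",
    "sea shore sandy beaches",
    "montane forest on wet slopes",
    "wet montane forest understorey",
    "forest slopes in montane regions",
]
families = ["Poaceae", "Poaceae", "Poaceae", "Orchidaceae", "Orchidaceae", "Rubiaceae"]

labels = spherical_kmeans([vectorise(h) for h in habitats], k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
scores = family_enrichment(labels, families)
print(scores[(0, "Poaceae")])  # 0.05
```

A smaller p-value means the family is over-represented in that habitat cluster more than chance would suggest. On a real corpus, TF-IDF weighting and a multiple-testing correction would be natural refinements.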
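Wrapper feature selection scores each candidate trait subset by the performance of the classifier itself, not by a separate statistic. The abstract does not name the classifier or the search strategy, so the sketch below assumes binary trait vectors, a 1-nearest-neighbour classifier with Hamming distance, leave-one-out accuracy as the score, and greedy forward selection. The trait names, the trait matrix and the habitat labels are hypothetical.

```python
# Sketch under assumptions: greedy forward wrapper selection around a
# 1-nearest-neighbour classifier, scored by leave-one-out accuracy.

# Hypothetical binary trait matrix: one row per species, one column per trait.
TRAIT_NAMES = ["succulent_leaves", "epiphytic", "hairy_stems", "white_flowers"]
TRAITS = [
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]
# Habitat-type labels, e.g. cluster ids produced by the unsupervised step.
HABITAT_LABELS = [0, 0, 0, 1, 1, 1]


def hamming(a, b, features):
    """Number of selected traits on which two species differ."""
    return sum(a[f] != b[f] for f in features)


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of 1-NN restricted to the given trait indices."""
    correct = 0
    for i in range(len(X)):
        others = [j for j in range(len(X)) if j != i]
        # min() keeps the first of several equally near neighbours.
        nearest = min(others, key=lambda j: hamming(X[i], X[j], features))
        correct += y[nearest] == y[i]
    return correct / len(X)


def forward_selection(X, y):
    """Add the trait that most improves LOO accuracy; stop when none improves it."""
    selected, best = [], 0.0
    while True:
        candidates = [f for f in range(len(X[0])) if f not in selected]
        if not candidates:
            break
        # Highest score wins; ties go to the lower trait index.
        score, chosen = max(
            ((loo_accuracy(X, y, selected + [c]), c) for c in candidates),
            key=lambda pair: (pair[0], -pair[1]),
        )
        if score <= best:
            break
        selected.append(chosen)
        best = score
    return selected, best


selected, accuracy = forward_selection(TRAITS, HABITAT_LABELS)
print([TRAIT_NAMES[f] for f in selected], accuracy)  # ['succulent_leaves'] 1.0
```

Because the search calls the classifier for every candidate subset, its cost grows with the number of traits times the cost of one cross-validation run. Any other classifier could be substituted for `loo_accuracy` without changing the search loop.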
