Automatic extraction and structuration of soil-environment relationship information from soil survey reports

Abstract

In addition to soil samples, conventional soil maps, and experienced soil surveyors, text about soils (e.g., soil survey reports) is an important potential data source for extracting soil-environment relationships. Because the words describing soil-environment relationships are often mixed with unrelated words, the first step is to extract the relevant words and organize them in a structured way. This paper applies natural language processing (NLP) techniques to automatically extract and structure soil-environment relationship information from soil survey reports. The method consists of two steps: (1) construction of a knowledge frame and (2) information extraction using either a rule-based method or a statistics-based method, depending on the type of information. The rule-based method was used for information written in a uniform style; these variables include slope, elevation, accumulated temperature, annual mean temperature, annual precipitation, and frost-free period. The statistics-based method was adopted for information written in diverse styles; these variables include landform and parent material. The soil species described in China's soil survey reports were selected as the experimental dataset. Precision (P), recall (R), and F1-measure (F1) were used to evaluate the performance of the method. For the rule-based method, P was 1 (100%), R was above 92%, and F1 was above 96% for all the variables involved. For the method based on conditional random fields (CRFs), the P, R, and F1 values were 84.15, 83.13, and 83.64%, respectively, for parent material and 88.33, 76.81, and 82.17%, respectively, for landform. To explore the impact of text type on the performance of the CRFs-based method, CRFs models were trained and validated separately on the descriptive texts of soil types and on those of typical profiles.
For parent material, the maximum F1 value was 90.7% for the descriptive texts of soil types but only 75% for those of soil profiles. For landform, the maximum F1 value for the descriptive texts of soil types (85.33%) was similar to that for the descriptive texts of soil profiles (85.71%). These results suggest that NLP techniques are effective for extracting and structuring soil-environment relationship information from text data sources.
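The abstract says that uniformly written variables were extracted with rules and stored in a knowledge frame, but it gives neither the frame's slots nor the rules themselves. A minimal sketch, assuming a dataclass frame with one slot per variable and regular-expression rules keyed on Chinese trigger words (坡度 slope, 海拔 elevation, ≥10℃积温 accumulated temperature, 年均温 annual mean temperature, 年降水量 annual precipitation, 无霜期 frost-free period), could look like this. The class name `SoilEnvFrame`, the rule patterns, and the sample sentence are all hypothetical, not taken from the paper.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

Range = Tuple[float, float]  # (low, high); a single value v is stored as (v, v)


@dataclass
class SoilEnvFrame:
    """Hypothetical knowledge frame: one slot per soil-environment variable."""
    soil_species: str
    slope_deg: Optional[Range] = None
    elevation_m: Optional[Range] = None
    accumulated_temp_c: Optional[Range] = None
    annual_mean_temp_c: Optional[Range] = None
    annual_precip_mm: Optional[Range] = None
    frost_free_days: Optional[Range] = None
    landform: Optional[str] = None          # left for the statistical (CRF) step
    parent_material: Optional[str] = None   # left for the statistical (CRF) step


NUM = r"(\d+(?:\.\d+)?)"
# A number, optionally followed by a range separator (~, ～, -, －, 至) and a second number
RANGE = NUM + r"(?:\s*[~～\-－至]\s*" + NUM + r")?"

# Hypothetical rules for uniformly written text: trigger word + value(s) + unit
RULES = {
    "slope_deg": r"坡度\s*" + RANGE + r"\s*°",
    "elevation_m": r"海拔\s*" + RANGE + r"\s*(?:米|m)",
    "accumulated_temp_c": r"≥\s*10\s*℃\s*积温\s*" + RANGE + r"\s*℃",
    "annual_mean_temp_c": r"年(?:平)?均(?:气)?温\s*" + RANGE + r"\s*℃",
    "annual_precip_mm": r"年(?:平均)?降水量?\s*" + RANGE + r"\s*(?:毫米|mm)",
    "frost_free_days": r"无霜期\s*" + RANGE + r"\s*(?:天|d)",
}


def extract_rule_based(soil_species: str, text: str) -> SoilEnvFrame:
    """Fill the numeric slots of a frame; slots whose rule does not match stay None."""
    frame = SoilEnvFrame(soil_species=soil_species)
    for slot, pattern in RULES.items():
        match = re.search(pattern, text)
        if match:
            low = float(match.group(1))
            high = float(match.group(2)) if match.group(2) else low
            setattr(frame, slot, (low, high))
    return frame


# Hypothetical report sentence (not from the paper's dataset)
SAMPLE = "该土种分布于低山丘陵，海拔500~800米，坡度5~15°，年均温15.5℃，≥10℃积温4800℃，年降水量1200毫米，无霜期250天。"

if __name__ == "__main__":
    print(extract_rule_based("example soil species", SAMPLE))
```

Patterns this strict fire only when the expected trigger word and unit are both present, which favors precision over recall. That trade-off is in line with the reported P of 1 and R above 92%, although the abstract does not say which cases the rules missed.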
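For landform and parent material the abstract reports a CRFs model but not its features or tag set. One common setup, assumed here rather than taken from the paper, treats each Chinese character as a token, labels it with BIO tags (the tag names `LF` for landform and `PM` for parent material are hypothetical), and describes it with a small window of neighboring characters. The sketch covers the code around the model: building per-character feature dicts and turning predicted labels back into text spans that could fill the knowledge frame's landform and parent-material slots. Training is not shown; the list-of-dicts sequences returned by `sent_features` are the input format accepted by `CRF.fit(X, y)` in the third-party sklearn-crfsuite package.

```python
from typing import Dict, List, Optional, Tuple

Features = Dict[str, object]


def char_features(sent: str, i: int) -> Features:
    """Features of the i-th character plus a +/-1 character window (hypothetical template)."""
    ch = sent[i]
    feats: Features = {"bias": 1.0, "char": ch, "is_digit": ch.isdigit()}
    if i > 0:
        feats["prev_char"] = sent[i - 1]
        feats["prev_bigram"] = sent[i - 1:i + 1]
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(sent) - 1:
        feats["next_char"] = sent[i + 1]
        feats["next_bigram"] = sent[i:i + 2]
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sent_features(sent: str) -> List[Features]:
    """One feature dict per character: one input sequence for a linear-chain CRF."""
    return [char_features(sent, i) for i in range(len(sent))]


def bio_to_spans(sent: str, labels: List[str]) -> List[Tuple[str, str]]:
    """Turn BIO labels (e.g. B-PM, I-PM, O) into (entity_type, text) spans."""
    spans: List[Tuple[str, str]] = []
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, lab in enumerate(labels + ["O"]):  # trailing "O" flushes the last span
        if lab.startswith("I-") and lab[2:] == etype:
            continue  # I- tag of the current type: the open span keeps growing
        if etype is not None:
            spans.append((etype, sent[start:i]))
        if lab == "O":
            start, etype = None, None
        else:  # B-X, or a stray I-X, opens a new span
            start, etype = i, lab[2:]
    return spans


# Hypothetical sentence: "distributed in low mountains and hills; parent material is granite weathering products"
SAMPLE_SENT = "分布于低山丘陵，母质为花岗岩风化物"
SAMPLE_LABELS = ["O"] * 3 + ["B-LF"] + ["I-LF"] * 3 + ["O"] * 4 + ["B-PM"] + ["I-PM"] * 5

if __name__ == "__main__":
    print(bio_to_spans(SAMPLE_SENT, SAMPLE_LABELS))
```

Character-level tokens avoid depending on a word segmenter, which is one reason they are a frequent choice for Chinese sequence labeling; word-level features from a segmenter could be added to the same dicts if available.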
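Both methods are evaluated with precision, recall, and F1, but the abstract does not spell out the formulas or the matching criterion. The standard definitions are P = TP/(TP + FP), R = TP/(TP + FN), and F1 = 2PR/(P + R). The sketch implements them over sets of extracted items, assuming exact matching; the function names and item format are illustrative. As a check, it recomputes the CRFs F1 values from the P and R values reported above.

```python
from typing import Hashable, Iterable, Tuple


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall; 0 when both are 0."""
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0


def precision_recall_f1(gold: Iterable[Hashable],
                        predicted: Iterable[Hashable]) -> Tuple[float, float, float]:
    """Exact-match P, R, F1 over sets of extracted items.

    An item could be, e.g., (soil_species, variable, value); a prediction counts
    as correct only if it equals a gold item exactly (an assumed criterion).
    """
    gold_set, pred_set = set(gold), set(predicted)
    tp = len(gold_set & pred_set)                  # correctly extracted items
    p = tp / len(pred_set) if pred_set else 0.0   # share of extractions that are correct
    r = tp / len(gold_set) if gold_set else 0.0   # share of gold items recovered
    return p, r, f1_score(p, r)


if __name__ == "__main__":
    # Re-derive the CRFs F1 values reported in the abstract from its P and R values
    print(round(100 * f1_score(0.8415, 0.8313), 2))  # parent material → 83.64
    print(round(100 * f1_score(0.8833, 0.7681), 2))  # landform → 82.17
```

For the rule-based method, P = 1 reduces the formula to F1 = 2R/(1 + R), so F1 > 96% holds exactly when R > 0.96/1.04 ≈ 92.3%; the reported F1 above 96% therefore implies that every R was in fact above about 92.3%.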

Bibliographic record

  • Source
    《农业科学学报(英文版)》 (Journal of Integrative Agriculture) | 2019, No. 2 | pp. 328-339 | 12 pages
  • Author affiliations

    Key Laboratory of Virtual Geographic Environment, Nanjing Normal University, Nanjing 210023, P.R.China;

    State Key Laboratory Cultivation Base of Geographical Environment Evolution (Jiangsu Province), Nanjing 210023, P.R.China;

    Jiangsu Center for Collaborative Innovation in Geographic Information Resource Development and Application, Nanjing 210023, P.R.China;

    State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, P.R.China;

    Department of Geography, University of Wisconsin-Madison, Madison, WI 53706, USA;

  • Indexed in: Chinese Science Citation Database (CSCD)
  • Original format: PDF
  • Language of full text: English

  • Date added to database: 2022-08-19 04:26:00