Journal: Information

Ontological Semantic Annotation of an English Corpus Through Condition Random Fields


Abstract

One way to improve machine understanding of text is to attach semantic information to lexical items as metadata tags, a process known as semantic annotation. Several kinds of semantic information can be attached to words, among them the nature of the concept a word denotes, expressed by associating the word with a category of an ontology. Ontologies from many domains can be applied to the annotation task; this research focuses on top-level ontologies because of their generalizing character. Because manual annotation is arduous and demands time and specialized personnel, much work has gone into performing semantic annotation automatically. Machine learning techniques are the most effective approaches to the annotation process. The success of training a supervised learning algorithm also depends heavily on a corpus that is large enough to capture the linguistic variation of natural language. This article therefore presents an automatic approach to enriching documents from an American English corpus, using a CRF (conditional random field) model to annotate them semantically with classes from the top level of the Schema.org ontology. The research uses two approaches to the model and obtains promising results for the development of semantic annotation based on top-level ontologies. Although this is a new line of research, using top-level ontologies for the automatic semantic enrichment of texts can contribute significantly to improving the interpretation of text by machines.
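To make the idea of CRF-based annotation concrete, the following is a minimal sketch of how a linear-chain CRF assigns one ontology class to each token: per-token feature weights and label-to-label transition weights are summed, and Viterbi decoding picks the highest-scoring label sequence. It is not the authors' model. The feature set, the hand-set weights, and the `O` label for unannotated tokens are assumptions made for illustration. Only `Person`, `Place`, and `Organization` are taken from the Schema.org top level, and a real system would learn the weights from an annotated corpus.

```python
from typing import Dict, List, Tuple

# A subset of Schema.org top-level types. "O" (no annotation) is an
# assumption of this sketch, not part of Schema.org.
LABELS: List[str] = ["O", "Person", "Place", "Organization"]


def token_features(tokens: List[str], i: int) -> List[str]:
    """Hypothetical feature set: bias, capitalization, previous word."""
    prev = tokens[i - 1].lower() if i > 0 else "<BOS>"
    return ["bias", f"is_title={tokens[i].istitle()}", f"prev={prev}"]


# Hand-set illustrative weights. A trained CRF would estimate these.
EMISSION: Dict[Tuple[str, str], float] = {
    ("bias", "O"): 0.5,
    ("is_title=False", "O"): 2.0,
    ("is_title=True", "Person"): 1.0,
    ("is_title=True", "Place"): 1.0,
    ("is_title=True", "Organization"): 1.0,
    ("prev=<BOS>", "Person"): 1.0,
    ("prev=at", "Organization"): 2.0,
    ("prev=in", "Place"): 2.0,
}
TRANSITION: Dict[Tuple[str, str], float] = {
    ("O", "O"): 0.2,
    ("Person", "Person"): 0.5,
    ("Place", "Place"): 0.5,
    ("Organization", "Organization"): 0.5,
}


def emission_score(tokens: List[str], i: int, label: str) -> float:
    """Sum the weights of all features that fire for (token i, label)."""
    return sum(EMISSION.get((f, label), 0.0) for f in token_features(tokens, i))


def viterbi(tokens: List[str]) -> List[str]:
    """Return the highest-scoring label sequence for the tokens."""
    if not tokens:
        return []
    scores = [{y: emission_score(tokens, 0, y) for y in LABELS}]
    back: List[Dict[str, str]] = []
    for i in range(1, len(tokens)):
        row: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for y in LABELS:
            # Best previous label for reaching label y at position i.
            best_prev = max(
                LABELS, key=lambda p: scores[-1][p] + TRANSITION.get((p, y), 0.0)
            )
            row[y] = (
                scores[-1][best_prev]
                + TRANSITION.get((best_prev, y), 0.0)
                + emission_score(tokens, i, y)
            )
            ptr[y] = best_prev
        scores.append(row)
        back.append(ptr)
    # Follow the back-pointers from the best final label.
    label = max(LABELS, key=lambda y: scores[-1][y])
    path = [label]
    for ptr in reversed(back):
        label = ptr[label]
        path.append(label)
    return path[::-1]


if __name__ == "__main__":
    sentence = "Alice works at Google in Paris".split()
    for token, tag in zip(sentence, viterbi(sentence)):
        print(f"{token}\t{tag}")
```

With these toy weights, the capitalized tokens are tagged from their context ("at" favors `Organization`, "in" favors `Place`), which is the kind of contextual evidence a trained CRF combines when annotating text with ontology classes.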
