Conference: 2017 20th International Conference of Computer and Information Technology

Ontological knowledge extraction from natural language text

Abstract

Ontology (onto = being, -logy = study or knowledge; hence, the knowledge of being) has a significant impact on the study of natural language processing. By providing a formal representation of knowledge, it ensures a proper understanding of a particular domain. A comprehensive description for building a vocabulary of a given domain is not feasible without ontologies, in other words, conceptualizations. A number of recent works have sought to capture ontological knowledge in order to build a strong vocabulary. Most existing ontology construction tools support the construction of ontological relations (e.g., taxonomy and equivalence). The main problem, however, is that they do not support the construction of domain relations. These are non-taxonomic conceptual relationships (e.g., causes, caused-by, treat, treated-by, has-member, contain, material-of, operated-by, controls) that are commonly found in text sources. The first notable work in this field is a Named Entity Recognition (NER) system developed by Stanford University. Stanford NER (also known as CRFClassifier) is a Java implementation of a named entity recognizer that can recognize at most seven classes. In this work, we use this NER and propose an algorithm that adds POS tagging, lemmatization, parsing, and pronoun and co-reference resolution. With it, we derive 22 classes from these seven primary classes. We compare our work with an existing system, TextOntoEx, and use performance metrics to analyze both systems on the same natural language text. The results show that our proposed system finds more classes than TextOntoEx.
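The abstract names the ingredients of the pipeline: POS tagging, lemmatization, and pronoun resolution feeding the extraction of non-taxonomic relations such as causes/caused-by and treat/treated-by. The sketch below is a toy illustration of that kind of rule-based relation extraction. It is not the authors' algorithm, and it omits NER and full parsing.

Several parts are hypothetical and chosen only for illustration:

* **Input.** Sentences arrive already tagged with Penn Treebank POS tags, hard-coded here instead of coming from Stanford tools.
* **`LEMMAS`.** A tiny lookup table that stands in for a real lemmatizer.
* **`RELATIONS`.** A lexicon mapping verb lemmas to active and passive relation names.
* **Pronoun resolution.** A naive heuristic that replaces a pronoun with the subject of the previous sentence.

```python
from typing import List, Optional, Tuple

Tagged = List[Tuple[str, str]]  # (word, Penn Treebank POS tag)
Triple = Tuple[str, str, str]   # (subject, relation, object)

# Hypothetical lemma table standing in for a real lemmatizer;
# words not listed lemmatize to their lowercase form.
LEMMAS = {
    "causes": "cause", "caused": "cause",
    "treats": "treat", "treated": "treat",
    "controls": "control", "controlled": "control",
}

# Hypothetical lexicon: verb lemma -> (active relation, passive relation).
RELATIONS = {
    "cause": ("causes", "caused-by"),
    "treat": ("treats", "treated-by"),
    "control": ("controls", "controlled-by"),
}


def lemma(word: str) -> str:
    """Look up a word's lemma in the toy table."""
    return LEMMAS.get(word.lower(), word.lower())


def resolve_pronouns(sent: Tagged, antecedent: Optional[str]) -> Tagged:
    """Naive heuristic: replace personal pronouns (PRP) with the
    subject of the previous sentence, if there is one."""
    if antecedent is None:
        return sent
    return [(antecedent, "NNP") if pos == "PRP" else (w, pos) for w, pos in sent]


def extract_relations(sentences: List[Tagged]) -> List[Triple]:
    """Emit at most one (subject, relation, object) triple per sentence."""
    triples: List[Triple] = []
    antecedent: Optional[str] = None  # first noun of the previous sentence
    for raw in sentences:
        sent = resolve_pronouns(raw, antecedent)
        nouns = [i for i, (_, pos) in enumerate(sent) if pos.startswith("NN")]
        for i, (word, pos) in enumerate(sent):
            # Only verbs whose lemma is in the relation lexicon trigger a triple.
            if not pos.startswith("VB") or lemma(word) not in RELATIONS:
                continue
            active, passive = RELATIONS[lemma(word)]
            # Past participle followed by "by" is treated as passive voice.
            is_passive = pos == "VBN" and i + 1 < len(sent) and sent[i + 1][0] == "by"
            before = [j for j in nouns if j < i]
            after = [j for j in nouns if j > i]
            if before and after:
                relation = passive if is_passive else active
                # Nearest noun before the verb is the subject, nearest after is the object.
                triples.append((sent[before[-1]][0], relation, sent[after[0]][0]))
            break  # toy version: stop at the first lexicon verb
        if nouns:
            antecedent = sent[nouns[0]][0]
    return triples


if __name__ == "__main__":
    sample = [
        [("Aspirin", "NNP"), ("treats", "VBZ"), ("headache", "NN"), (".", ".")],
        [("It", "PRP"), ("controls", "VBZ"), ("fever", "NN"), (".", ".")],
        [("Fever", "NN"), ("is", "VBZ"), ("caused", "VBN"), ("by", "IN"),
         ("infection", "NN"), (".", ".")],
    ]
    for triple in extract_relations(sample):
        print(triple)
```

For the three sample sentences, this yields the following triples:

* `("Aspirin", "treats", "headache")`
* `("Aspirin", "controls", "fever")`, where "It" is resolved to the previous subject.
* `("Fever", "caused-by", "infection")`, from the passive construction.

A real system would replace the lookup tables with a trained tagger, lemmatizer, parser, and coreference resolver.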