Federated Conference on Computer Science and Information Systems

Concepts extraction from unstructured Polish texts: A rule based approach

Abstract

We present a recently developed solution for extracting concepts from unstructured Polish texts, with special focus on the correct morphological forms of the obtained concept names. As Polish is a highly inflected language, detected names need to be transformed according to Polish grammar rules. We propose a user-friendly method for specifying transformation patterns, based on a simple annotation language. Annotations prepared by a user are compiled into transformation rules. During concept extraction, the input document is split into sentences and the rules are applied to the sequences of words in each sentence. Recognized strings that form concept names are aggregated at various levels and assigned scores. We also report the results of initial experiments performed on a medical text.
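The pipeline outlined in the abstract (sentence splitting, rule matching over word sequences, morphological normalization, scoring) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the toy `LEXICON`, the `RULES` format, the tag names `N` and `N_GEN`, and the frequency-based score are all hypothetical, since the abstract does not describe the annotation language or the scoring scheme.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: surface form -> (nominative base form, tag).
# A real system would use a Polish morphological analyser instead.
LEXICON = {
    "zapalenie": ("zapalenie", "N"),
    "zapalenia": ("zapalenie", "N"),
    "zapaleniem": ("zapalenie", "N"),
    "płuc": ("płuca", "N_GEN"),
}

# Hypothetical compiled rule: a tag pattern plus one action per token.
# "base" emits the nominative base form, "keep" emits the word as found.
RULES = [(("N", "N_GEN"), ("base", "keep"))]


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Lower-case the sentence and return its word tokens."""
    return re.findall(r"\w+", sentence.lower())


def extract_concepts(text):
    """Return a Counter mapping normalized concept names to match counts."""
    scores = Counter()
    for sentence in split_sentences(text):
        words = tokenize(sentence)
        for pattern, actions in RULES:
            n = len(pattern)
            # Slide a window of the rule's length over the sentence.
            for i in range(len(words) - n + 1):
                window = words[i:i + n]
                entries = [LEXICON.get(w) for w in window]
                if any(e is None for e in entries):
                    continue  # unknown word, rule cannot apply
                if tuple(e[1] for e in entries) != pattern:
                    continue  # tags do not match the rule
                name = " ".join(
                    e[0] if action == "base" else w
                    for w, e, action in zip(window, entries, actions)
                )
                scores[name] += 1  # assumed score: raw frequency
    return scores


if __name__ == "__main__":
    sample = "Pacjent z zapaleniem płuc. Leczenie zapalenia płuc trwa długo."
    print(extract_concepts(sample))
```

In this sketch the head noun is reduced to its nominative form while the genitive modifier is kept as found, so both "zapaleniem płuc" and "zapalenia płuc" aggregate under the single concept name "zapalenie płuc".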
