International Conference on Computational Linguistics

Efficient Integrated Tagging of Word Constructs

Abstract

We describe a robust text-handling component that can process free text in a variety of formats and can identify a wide range of phenomena, including chemical formulae, dates, numbers and proper nouns. As an example, we give the set of regular expressions used to capture German numbers written out as words (e.g. "sechsundzwanzig", 'twenty-six'). Proper-noun "candidates" are identified by means of regular expressions and are then accepted or rejected on the basis of run-time interaction with the user. This tagging component is integrated into a large-scale grammar development environment and provides direct input to the system's grammatical analysis component by means of "lift" rules, which convert tagged text into partial linguistic structures.
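The abstract does not reproduce the German number expressions themselves. The Python sketch below shows one way such a rule set could be built, assuming (hypothetically) that only cardinals from 1 to 999 written as a single word need to be covered. The names `NUMBER_WORD`, `to_int` and `tag_numbers` are illustrative only and are not taken from the paper.

```python
import re

# Hypothetical rule set for German cardinals 1-999 written as one word.
# Illustrative sketch only, NOT the regular expressions used in the paper.

# Unit forms used inside compounds ("ein" as in "einundzwanzig", "einhundert").
COMBINING = {"ein": 1, "zwei": 2, "drei": 3, "vier": 4, "fünf": 5,
             "sechs": 6, "sieben": 7, "acht": 8, "neun": 9}
# Unit forms that may end a number word ("eins", never the bare article "ein").
FINAL = {**{w: v for w, v in COMBINING.items() if w != "ein"}, "eins": 1}
TEENS = {"zehn": 10, "elf": 11, "zwölf": 12, "dreizehn": 13, "vierzehn": 14,
         "fünfzehn": 15, "sechzehn": 16, "siebzehn": 17, "achtzehn": 18,
         "neunzehn": 19}
TENS = {"zwanzig": 20, "dreißig": 30, "vierzig": 40, "fünfzig": 50,
        "sechzig": 60, "siebzig": 70, "achtzig": 80, "neunzig": 90}
SIMPLE = {**FINAL, **TEENS, **TENS}


def alt(words) -> str:
    """Build a non-capturing alternation, longest words first."""
    ordered = sorted(words, key=len, reverse=True)
    return "(?:" + "|".join(re.escape(w) for w in ordered) + ")"


# 1-99: unit + "und" + tens ("sechsundzwanzig"), or a single tens/teen/unit word.
BELOW_100 = rf"(?:{alt(COMBINING)}und{alt(TENS)}|{alt(TENS)}|{alt(TEENS)}|{alt(FINAL)})"
# 100-999: optional multiplier, "hundert", optional 1-99 remainder.
HUNDREDS = rf"{alt(COMBINING)}?hundert{BELOW_100}?"
# \b on both sides stops the pattern from matching inside longer words.
NUMBER_WORD = re.compile(rf"\b(?:{HUNDREDS}|{BELOW_100})\b", re.IGNORECASE)


def to_int(word: str) -> int:
    """Convert a word already matched by NUMBER_WORD into its integer value."""
    w = word.lower()
    total = 0
    if "hundert" in w:
        head, _, w = w.partition("hundert")
        total += 100 * (COMBINING[head] if head else 1)
    if not w:
        return total
    if "und" in w:  # e.g. "sechsundzwanzig" -> 6 + 20
        unit, _, tens = w.partition("und")
        return total + COMBINING[unit] + TENS[tens]
    return total + SIMPLE[w]


def tag_numbers(text: str) -> list[tuple[str, int]]:
    """Return (surface form, value) for every written-out number in text."""
    return [(m.group(0), to_int(m.group(0))) for m in NUMBER_WORD.finditer(text)]
```

For example, `tag_numbers("Er ist sechsundzwanzig, sie ist Zwölf und ein Kind.")` returns `[('sechsundzwanzig', 26), ('Zwölf', 12)]`. Leaving out the bare form "ein" keeps the indefinite article from being tagged, and the word boundaries mean ordinals such as "sechsundzwanzigste" are not matched at all. The integer value is only a stand-in for the step the abstract attributes to "lift" rules, which turn tagged text into partial linguistic structures.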
