
Text Analysis Meets Computational Lexicography


Abstract

More and more text corpora are available electronically. They contain information about the linguistic and lexicographic properties of words and word combinations. The amount of data is too large to extract this information manually, so we need means for (semi-)automatic processing, i.e., we need to analyse the text in order to extract the relevant information. The question is what the requirements for a text analysis tool are, and whether existing systems meet the needs of lexicographic acquisition. The hypothesis is that the better and more detailed the off-line annotation, the better and faster the on-line extraction. However, the more detailed the off-line annotation, the more complex the grammar, the more time-consuming and difficult the grammar development, and the slower the parsing process. For application as an analysis tool in computational lexicography, a symbolic chunker with a hand-written grammar seems to be a good choice. The available chunkers for German, however, do not provide all of the additional information needed for this task, such as head lemmas, morpho-syntactic information, and lexical or semantic properties, which are useful if not necessary for extraction processes. We therefore decided to build a recursive chunker for unrestricted German text within the framework of the IMS Corpus Workbench (CWB).
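To make the kind of annotation the abstract calls for more concrete, here is a minimal, purely illustrative sketch of symbolic NP chunking over POS-tagged, lemmatised input that also records each chunk's head lemma. All names and the tag inventory are invented for this example; it is not the authors' CWB chunker, only a toy showing why head-lemma annotation is useful for lexicographic extraction.

```python
# Toy symbolic chunker: groups determiner/adjective/noun sequences into NP
# chunks and records each chunk's head lemma, mirroring the extra annotation
# (head lemma, morpho-syntactic tags) the abstract says a lexicographic
# chunker should supply. Hypothetical sketch, not the IMS CWB implementation.

def chunk_nps(tagged):
    """tagged: list of (token, pos, lemma) triples.
    Returns a list of NP chunks, each with its surface tokens and head lemma."""
    chunks, current = [], []
    for token, pos, lemma in tagged:
        if pos in ("DET", "ADJ", "NOUN"):
            current.append(token)
            if pos == "NOUN":  # the noun closes the chunk and acts as its head
                chunks.append({"tokens": current, "head_lemma": lemma})
                current = []
        else:
            current = []       # non-NP material breaks any open chunk
    return chunks

sentence = [("Die", "DET", "die"), ("neuen", "ADJ", "neu"),
            ("Korpora", "NOUN", "Korpus"), ("sind", "VERB", "sein"),
            ("elektronisch", "ADJ", "elektronisch")]
for np in chunk_nps(sentence):
    print(" ".join(np["tokens"]), "-> head:", np["head_lemma"])
# -> Die neuen Korpora -> head: Korpus
```

Indexing chunks by head lemma (here `Korpus` rather than the inflected `Korpora`) is what lets an extraction process collect all attested modifiers or collocates of a lexeme regardless of its surface form.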
