
Text Analysis Meets Computational Lexicography


Abstract

More and more text corpora are available electronically. They contain information about the linguistic and lexicographic properties of words and word combinations. The amount of data is too large to extract this information manually, so we need means for (semi-)automatic processing, i.e., we need to analyse the text in order to extract the relevant information. The question is what the requirements for a text analysis tool are, and whether existing systems meet the needs of lexicographic acquisition. The hypothesis is that the better and more detailed the off-line annotation, the better and faster the on-line extraction. However, the more detailed the off-line annotation, the more complex the grammar, the more time-consuming and difficult the grammar development, and the slower the parsing process. For application as an analysis tool in computational lexicography, a symbolic chunker with a hand-written grammar seems to be a good choice. The available chunkers for German, however, do not provide all of the additional information needed for this task, such as head lemmas, morpho-syntactic information, and lexical or semantic properties, which are useful if not necessary for extraction processes. We therefore decided to build a recursive chunker for unrestricted German text within the framework of the IMS Corpus Workbench (CWB).
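To make the kind of annotation the abstract calls for more concrete, here is a minimal, purely illustrative sketch of symbolic NP chunking over POS-tagged, lemmatised input that also records each chunk's head lemma. All names and the tag inventory are invented for this example; it is not the authors' CWB chunker, only a toy showing why head-lemma annotation is useful for lexicographic extraction.

```python
# Toy symbolic chunker: groups determiner/adjective/noun sequences into NP
# chunks and records each chunk's head lemma, mirroring the extra annotation
# (head lemma, morpho-syntactic tags) the abstract says a lexicographic
# chunker should supply. Hypothetical sketch, not the IMS CWB implementation.

def chunk_nps(tagged):
    """tagged: list of (token, pos, lemma) triples.
    Returns a list of NP chunks, each with its surface tokens and head lemma."""
    chunks, current = [], []
    for token, pos, lemma in tagged:
        if pos in ("DET", "ADJ", "NOUN"):
            current.append(token)
            if pos == "NOUN":  # the noun closes the chunk and acts as its head
                chunks.append({"tokens": current, "head_lemma": lemma})
                current = []
        else:
            current = []       # non-NP material breaks any open chunk
    return chunks

sentence = [("Die", "DET", "die"), ("neuen", "ADJ", "neu"),
            ("Korpora", "NOUN", "Korpus"), ("sind", "VERB", "sein"),
            ("elektronisch", "ADJ", "elektronisch")]
for np in chunk_nps(sentence):
    print(" ".join(np["tokens"]), "-> head:", np["head_lemma"])
# -> Die neuen Korpora -> head: Korpus
```

Indexing chunks by head lemma (here `Korpus` rather than the inflected `Korpora`) is what lets an extraction process collect all attested modifiers or collocates of a lexeme regardless of its surface form.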
