Genetic and Evolutionary Computation Conference, Pt. 2, Jul 12-16, 2003, Chicago, IL, USA

Studying the Advantages of a Messy Evolutionary Algorithm for Natural Language Tagging

Abstract

The process of labeling each word in a sentence with one of its lexical categories (noun, verb, etc.) is called tagging, and it is a key step in parsing and in many other language processing and generation applications. Automatic lexical taggers are usually based on statistical methods, such as Hidden Markov Models, which work with information extracted from large available tagged corpora. This information consists of the frequencies of the contexts of the words, that is, of the sequences of their neighbouring tags. These methods therefore rely on the assumption that the tag of a word depends only on its surrounding tags. This work proposes the use of a Messy Evolutionary Algorithm to investigate the validity of this assumption. The algorithm is an extension of the fast messy genetic algorithms, a variety of Genetic Algorithms that improve the survival of high-quality partial solutions, or building blocks. Messy GAs do not require all genes to be present in a chromosome, and a gene may also appear more than once. This allows us to study the kinds of building blocks that arise, and thus to obtain information about possible relationships between the tag of a word and other tags at any position in the sentence. The paper describes the design of a messy evolutionary algorithm for the tagging problem and a number of experiments on the performance of the system and the parameters of the algorithm.
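
To make the messy representation concrete, the following is a minimal Python sketch of how a chromosome with missing and repeated genes could be expressed as a full tag sequence. It is an illustration under stated assumptions, not the encoding used in the paper: the `(position, tag)` gene layout, the `decode` function, the fallback `template`, and the tag names are all hypothetical. The first-occurrence-wins rule for repeated genes and the use of a template for missing ones follow the convention commonly described for messy GAs in general.

```python
from typing import List, Tuple

# Hypothetical gene layout: (word position in the sentence, proposed tag).
Gene = Tuple[int, str]


def decode(chromosome: List[Gene], template: List[str]) -> List[str]:
    """Express a messy chromosome as a complete tag sequence.

    Assumptions (not taken from the paper):
    - a position with no gene (underspecification) keeps the template tag;
    - a position with several genes (overspecification) takes the first
      one encountered when reading the chromosome left to right.
    """
    tags = list(template)
    seen = set()
    for position, tag in chromosome:
        if position in seen:
            continue  # a later duplicate gene is ignored
        if 0 <= position < len(tags):
            tags[position] = tag
            seen.add(position)
    return tags


sentence = ["the", "can", "rusts"]
# Hypothetical template, e.g. the most frequent tag of each word in a corpus.
template = ["DET", "VERB", "NOUN"]
# Position 0 is missing; position 1 appears twice.
chromosome = [(1, "NOUN"), (2, "VERB"), (1, "VERB")]

decoded = decode(chromosome, template)
print(list(zip(sentence, decoded)))
# prints [('the', 'DET'), ('can', 'NOUN'), ('rusts', 'VERB')]
```

Because genes carry their own position, a short chromosome such as `[(1, "NOUN"), (2, "VERB")]` can be read as a candidate building block linking the tags of two specific positions, which is the kind of relationship the abstract proposes to study.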