Conference: Knowledge Discovery and Data Mining, 2010. WKDD '10

Discover Linguistic Patterns in Parsed Corpus with Frequent Subtree Mining

Abstract

Recognizing special linguistic patterns in a language is helpful for many NLP applications, such as information extraction, machine translation, and parsing. State-of-the-art syntactic parsers are based on a given grammar. That grammar is context-free and cannot capture complex patterns that contain multiple linguistic units. We propose an unsupervised method that automatically discovers complex linguistic patterns from a conventionally parsed corpus. A specialized, efficient algorithm mines the frequent subtrees in the forest of parse trees, and the subtrees found are formalized as linguistic patterns. The approach is validated on the Penn Chinese Treebank, using the linguistic patterns it finds.
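
To make the idea of mining frequent subtrees from a forest concrete, the following is a minimal sketch, not the specialized algorithm the abstract refers to (which is not described here). It assumes parse trees are nested tuples of the form (label, child, ...), enumerates depth-bounded tree fragments by brute force, and keeps those that occur in at least `min_support` trees. The function names, the `max_depth` bound, and the toy trees are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product

# Assumed representation: a tree is a nested tuple (label, child, child, ...);
# a node without children is (label,).


def fragments(node, depth):
    """Return the fragments rooted at `node`, expanding at most `depth` levels.

    A fragment either stops at the node (label only) or keeps the node's
    complete list of children, each child again stopped or expanded.
    """
    label, children = node[0], node[1:]
    result = [(label,)]
    if depth == 0 or not children:
        return result
    child_options = [fragments(child, depth - 1) for child in children]
    for combo in product(*child_options):
        result.append((label,) + combo)
    return result


def all_nodes(tree):
    """Yield every node of the tree in pre-order."""
    yield tree
    for child in tree[1:]:
        yield from all_nodes(child)


def frequent_fragments(forest, min_support, max_depth=2):
    """Map each fragment to the number of trees containing it (its support)."""
    support = Counter()
    for tree in forest:
        seen = set()  # count each fragment at most once per tree
        for node in all_nodes(tree):
            for frag in fragments(node, max_depth):
                if len(frag) > 1:  # skip bare labels, which carry no structure
                    seen.add(frag)
        support.update(seen)
    return {frag: n for frag, n in support.items() if n >= min_support}


# Toy forest with treebank-style labels (illustrative, not taken from the paper).
forest = [
    ("IP", ("NP", ("NN",)), ("VP", ("VV",), ("NP", ("NN",)))),
    ("IP", ("NP", ("PN",)), ("VP", ("VV",), ("NP", ("NN",)))),
]
patterns = frequent_fragments(forest, min_support=2)
```

With this toy input, a fragment such as ("VP", ("VV",), ("NP", ("NN",))) spans two grammar rules at once, which is the kind of multi-unit pattern a single context-free rule cannot express. Brute-force enumeration grows exponentially with `max_depth`, so a practical miner would need a more efficient candidate-generation strategy.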
