
Dropped personal pronoun recovery in Chinese SMS

Abstract

In written Chinese, personal pronouns are commonly dropped when they can be inferred from context. This practice is particularly common in informal genres such as Short Message Service (SMS) messages sent via cell phones. Restoring dropped personal pronouns can be a useful preprocessing step for information extraction. Dropped personal pronoun recovery can be divided into two subtasks: (1) detecting dropped personal pronoun slots and (2) determining the identity of the pronoun for each slot. We address a simpler version of restoring dropped personal pronouns wherein only the person numbers are identified. After applying a word segmenter, we used a linear-chain conditional random field to predict which words were at the start of an independent clause. Then, using the independent clause start information, as well as lexical and syntactic information, we applied a conditional random field or a maximum-entropy classifier to predict whether a dropped personal pronoun immediately preceded each word and, if so, the person number of the dropped pronoun. We conducted a series of experiments using a manually annotated corpus of Chinese SMS messages. Our approaches substantially outperformed a rule-based approach based partially on rules developed by Chung and Gildea (2010, Effects of Empty Categories on Machine Translation. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics. pp. 636-45). Our approaches also outperformed (though by a considerably smaller margin) a machine-learning approach based closely on work by Yang, Liu, and Xue (2015, Recovering Dropped Pronouns from Chinese Text Messages. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL). Association for Computational Linguistics).
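
The per-word prediction described in the abstract can be pictured as a sequence-labeling problem. The sketch below is only an illustration of that framing, not the authors' implementation: the label names (`NONE`, `1`, `2`, `3`), the feature names, and the helper functions `to_labels` and `token_features` are all hypothetical, and the clause-start flags are supplied by hand rather than predicted by a first-stage model. It shows how a segmented message might be turned into one label and one feature dictionary per word, which is the kind of input a conditional random field or maximum-entropy classifier typically consumes.

```python
from typing import Dict, List, Union

# Hypothetical label set (an assumption, not taken from the paper):
# "NONE" = no dropped pronoun immediately precedes the word;
# "1", "2", "3" = person of a pronoun dropped immediately before the word.
LABELS = ("NONE", "1", "2", "3")


def to_labels(tokens: List[str], dropped: Dict[int, str]) -> List[str]:
    """Turn sparse annotations into one label per word.

    `dropped` maps a word index to the person label of the pronoun
    that was dropped immediately before that word.
    """
    return [dropped.get(i, "NONE") for i in range(len(tokens))]


def token_features(
    tokens: List[str], clause_starts: List[bool], i: int
) -> Dict[str, Union[str, bool]]:
    """Build an illustrative feature dictionary for the word at position i.

    `clause_starts[i]` stands in for the output of the first stage
    (is this word at the start of an independent clause?).
    """
    return {
        "word": tokens[i],
        "prev_word": tokens[i - 1] if i > 0 else "<BOS>",
        "next_word": tokens[i + 1] if i + 1 < len(tokens) else "<EOS>",
        "clause_start": clause_starts[i],
    }


# Toy segmented message, roughly "(you) eaten yet?", with a dropped
# second-person pronoun before the first word. Annotations are made up.
tokens = ["吃", "饭", "了", "吗"]
clause_starts = [True, False, False, False]
labels = to_labels(tokens, {0: "2"})
features = [token_features(tokens, clause_starts, i) for i in range(len(tokens))]

for word, label, feats in zip(tokens, labels, features):
    print(word, label, feats)
```

Folding detection and identification into a single label per word mirrors the abstract's description of predicting "whether a dropped personal pronoun immediately preceded each word and, if so, the person number"; a real system would add the syntactic features the abstract mentions, which are omitted here.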

Bibliographic details

  • Source
    Natural Language Engineering, 2017, issue 6, pp. 905-927 (23 pages)
  • Author affiliations

    Department of Human Language Technology, The MITRE Corporation, 7515 Colshire Drive, McLean, VA, 22102, USA;

    Department of Human Language Technology, The MITRE Corporation, 7515 Colshire Drive, McLean, VA, 22102, USA;

    Department of Human Language Technology, The MITRE Corporation, 7515 Colshire Drive, McLean, VA, 22102, USA;

  • Original format: PDF
  • Language: English
