International Conference of the German Society for Computational Linguistics and Language Technology

Word and Sentence Segmentation in German: Overcoming Idiosyncrasies in the Use of Punctuation in Private Communication

Abstract

In this paper, we present a segmentation system for German texts. We apply conditional random fields (CRF), a statistical sequence model, to a type of text used in private communication. We show that segmenting individual punctuation marks, taking freestanding lines into account, and using unsupervised word representations (i.e., Brown clustering, Word2Vec, and fastText) yields a label accuracy of 96% on a corpus of postcards used in private communication.
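
The abstract does not spell out the feature set, so the following is only a minimal sketch of how the three ingredients it names could be turned into per-token features for a CRF toolkit. Everything here is an assumption made for illustration: the names `split_tokens` and `line_features`, the rule that a line counts as freestanding when it is bounded by blank lines or by the edges of the text, and the hypothetical `clusters` lookup standing in for Brown clusters or other unsupervised word representations.

```python
import re

# Assumption: a token is either a run of word characters or exactly one
# non-space symbol, so "!!!" becomes three separate tokens.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def split_tokens(line):
    """Split a line so that every punctuation mark is its own token."""
    return TOKEN_RE.findall(line)


def line_features(lines, i, clusters=None):
    """Return one feature dict per token of lines[i].

    `clusters` is a hypothetical mapping from a lowercased word to a
    cluster id (for example a Brown cluster bit string).
    """
    clusters = clusters or {}
    tokens = split_tokens(lines[i])

    # Assumed definition of a freestanding line: blank line (or text edge)
    # both before and after it.
    prev_blank = i == 0 or not lines[i - 1].strip()
    next_blank = i == len(lines) - 1 or not lines[i + 1].strip()
    freestanding = prev_blank and next_blank

    features = []
    for j, tok in enumerate(tokens):
        features.append({
            "lower": tok.lower(),
            "is_punct": re.fullmatch(r"[^\w\s]", tok) is not None,
            "is_title": tok.istitle(),
            "line_start": j == 0,
            "line_end": j == len(tokens) - 1,
            "freestanding_line": freestanding,
            "cluster": clusters.get(tok.lower(), "UNK"),
        })
    return features


if __name__ == "__main__":
    card = ["Liebe Anna,", "", "das Wetter ist toll", "und wir baden viel!!", "", "Eva"]
    for feats in line_features(card, 3, clusters={"wetter": "0110"}):
        print(feats)
```

A list of such dicts per line is the input shape that common CRF toolkits accept; the label scheme (for example token-initial versus sentence-initial tags) is left open here because the abstract does not describe it.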
