Conference: Language technology resources and tools for digital humanities

Original-Transcribed Text Alignment for Man'yosyu Written by Old Japanese Language

Abstract

We are constructing an annotated diachronic corpus of the Japanese language. As part of this work, we are building a corpus of the Man'yosyu, an anthology of Old Japanese poetry. In this paper, we describe how to align the transcribed text with its original text semiautomatically so that the two can be cross-referenced in our Man'yosyu corpus. Although the original characters are aligned to the transcribed words manually, we first align the transcribed and original characters with an unsupervised automatic alignment technique from statistical machine translation to reduce the manual work. Automatic alignment achieves an F1-measure of 0.83, which leaves 1-2 alignment errors per poem. Finding and correcting these errors is nonetheless less labor-intensive and more efficient than fully manual annotation, and the alignment probabilities can be used to guide the correction. Moreover, we found that the alignment probabilities let us locate uncertain transcriptions in our corpus and compare them with other transcriptions.
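
The abstract does not name the specific alignment model, so the following is only a minimal sketch under the assumption that something like IBM Model 1, a standard unsupervised aligner from statistical machine translation, is applied at the character level. The placeholder symbols in `pairs`, the function names, and the 0.5 review threshold are all hypothetical and are not taken from the paper or its corpus. The sketch shows three things the abstract mentions: learning alignment probabilities without supervision, scoring links against a manual alignment with the F1-measure, and using low probabilities to flag links for manual review.

```python
def train_ibm1(pairs, iterations=10):
    """EM estimate of t[(tgt, src)] = P(tgt | src); IBM Model 1 without a NULL token."""
    tgt_vocab = {w for _, tgt in pairs for w in tgt}
    # Uniform start over every (transcribed, original) pair that co-occurs in a poem.
    t = {(w, s): 1.0 / len(tgt_vocab) for src, tgt in pairs for w in tgt for s in src}
    for _ in range(iterations):
        count = dict.fromkeys(t, 0.0)
        total = {}
        # E-step: distribute each transcribed character over the original characters.
        for src, tgt in pairs:
            for w in tgt:
                z = sum(t[(w, s)] for s in src)
                for s in src:
                    c = t[(w, s)] / z
                    count[(w, s)] += c
                    total[s] = total.get(s, 0.0) + c
        # M-step: renormalise the expected counts per original character.
        for (w, s), c in count.items():
            t[(w, s)] = c / total[s]
    return t


def align(src, tgt, t):
    """Link each transcribed position j to its most probable original position i."""
    links = []
    for j, w in enumerate(tgt):
        i = max(range(len(src)), key=lambda k: t[(w, src[k])])
        links.append((i, j, t[(w, src[i])]))
    return links


def f1(predicted, gold):
    """F1-measure between two collections of (i, j) alignment links."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: uppercase = original characters, lowercase = transcription.
pairs = [(["A", "B"], ["a", "b"]),
         (["A", "C"], ["a", "c"]),
         (["B", "C"], ["b", "c"])]
t = train_ibm1(pairs)
links = align(["A", "B"], ["a", "b"], t)
predicted = [(i, j) for i, j, _ in links]
score = f1(predicted, [(0, 0), (1, 1)])    # compare with a manual (gold) alignment
uncertain = [l for l in links if l[2] < 0.5]  # candidates for manual review
```

In a workflow like the one described, an annotator would review only the links collected in `uncertain` rather than aligning every character by hand; the threshold would need to be tuned on real data.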