NAFOSTED Conference on Information and Computer Science

Text-Type Variation in Vietnamese: Corpus Mining for Linguistic Features in Narrative and Non-Narrative Genres


Abstract

In this study, we exploit two Vietnamese corpora: a narrative corpus and a non-narrative corpus. Each corpus contains 24 million words collected from documents of its genre published between 2000 and 2020. All words are annotated with word boundaries and parts of speech. To examine how linguistic features are used across genres, we perform statistical analyses of word frequency, parts of speech, linguistic features, and the correlations among these features. The results show that the pronoun “I” and exclamation words are significantly more frequent in narrative texts than in non-narrative texts. Moreover, while adjectives are not correlated with any other feature in the narrative genre, they are most likely to co-occur with third-person pronouns in the non-narrative genre.
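The kind of analysis the abstract describes (per-document feature frequencies, then correlations among features) might be sketched as follows. This is only an illustration under stated assumptions, not the authors' method: the tag names (`ADJ`, `INTJ`), the small pronoun sets, the feature names, and the helpers `feature_rates` and `pearson` are all hypothetical. The abstract does not specify the tagset, the feature inventory, or the correlation measure; Pearson's r is used here purely as an example.

```python
from collections import Counter
from math import sqrt

# Illustrative assumptions only: the paper's tagset and word lists are not given.
FIRST_PERSON = {"tôi"}            # Vietnamese first-person pronoun "I"
THIRD_PERSON = {"nó", "họ"}       # a small sample of third-person pronouns
TAG_FEATURES = {"ADJ": "adjective", "INTJ": "exclamation"}
FEATURES = ("first_person", "third_person", "adjective", "exclamation")

def feature_rates(doc, per=1000):
    """Return each feature's frequency per `per` tokens.

    `doc` is a list of (word, pos_tag) pairs, i.e. text that is already
    word-segmented and POS-tagged.
    """
    counts = Counter()
    for word, tag in doc:
        lowered = word.lower()
        if lowered in FIRST_PERSON:
            counts["first_person"] += 1
        if lowered in THIRD_PERSON:
            counts["third_person"] += 1
        if tag in TAG_FEATURES:
            counts[TAG_FEATURES[tag]] += 1
    n = len(doc)
    if n == 0:
        return {name: 0.0 for name in FEATURES}
    return {name: per * counts[name] / n for name in FEATURES}

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Returns None when either sequence has zero variance (r is undefined).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)

def feature_correlation(docs, feature_a, feature_b):
    """Correlate two features' per-document rates across one genre's documents."""
    rates = [feature_rates(doc) for doc in docs]
    return pearson([r[feature_a] for r in rates], [r[feature_b] for r in rates])
```

Calling `feature_correlation(docs, "adjective", "third_person")` separately on the narrative and the non-narrative documents would give one coefficient per genre, which could then be compared across genres.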
