
What Is the Basic Semantic Unit of Chinese Language? A Computational Approach Based on Topic Models


Abstract

Chinese has generally been regarded as a Subject-Verb-Object (SVO) language whose basic semantic unit is the word, usually composed of two or more Chinese characters. However, the word-centered structure of Chinese has been controversial in linguistics. Some recent research in Chinese computational linguistics suggests that character-based models perform better than word-based models in some applications, such as word segmentation. In this paper, word-based and character-based topic models are each tested for modeling Chinese. Through empirical studies, we demonstrate the effectiveness of using Chinese characters as the basic semantic units. The two models perform similarly in text classification, while the character-based model gives better language-modeling quality with a much smaller vocabulary. Using a bilingual corpus, we train three independent topic models based on Chinese words, Chinese characters, and English words and compare them with each other, verifying the capability of topic models to model semantics through experiments across Chinese and English. Classification accuracy can also be boosted by aggregating the classification results from the three independent topic models.
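
Two ideas in the abstract can be illustrated with a minimal sketch: the smaller vocabulary of a character-based representation, and the aggregation of results from three independent classifiers. Everything here is an assumption for illustration. The toy corpus is hand-segmented and not from the paper, the helper names are hypothetical, and simple majority voting stands in for the aggregation step, which the abstract does not specify.

```python
from collections import Counter

# Toy corpus, already segmented into words by hand (illustrative only,
# not data from the paper).
segmented_docs = [
    ["大学", "学生", "大人"],
    ["小学", "小学生", "人生"],
    ["大学生", "大小", "小人"],
]


def word_vocab(docs):
    """Set of distinct words across all documents."""
    return {word for doc in docs for word in doc}


def char_vocab(docs):
    """Set of distinct characters across all documents."""
    return {ch for doc in docs for word in doc for ch in word}


def majority_vote(predictions):
    """Return the most frequent label; ties go to the earliest prediction.

    Majority voting is an assumed aggregation rule, not necessarily
    the one used by the authors.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


if __name__ == "__main__":
    print("word vocabulary size:", len(word_vocab(segmented_docs)))
    print("character vocabulary size:", len(char_vocab(segmented_docs)))
    # Hypothetical labels from the Chinese-word, Chinese-character,
    # and English-word models for one document.
    print("aggregated label:", majority_vote(["sports", "finance", "sports"]))
```

In this toy corpus many words reuse a handful of characters, so the character vocabulary comes out smaller than the word vocabulary. That mirrors the direction of the effect reported in the abstract without reproducing any of its results.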

Bibliographic details

  • Source
    The Mathematics of Language | 2011 | pp. 143-157 | 15 pages
  • Conference location: Nara (JP)
  • Authors

    Qi Zhao; Zengchang Qin; Tao Wan

  • Author affiliations

    Intelligent Computing and Machine Learning Lab, School of Automation Science and Electrical Engineering, Beihang University, Beijing, China;

    Intelligent Computing and Machine Learning Lab, School of Automation Science and Electrical Engineering, Beihang University, Beijing, China; Robotics Institute, Carnegie Mellon University, USA;

    Robotics Institute, Carnegie Mellon University, USA

  • Original format: PDF
  • Text language: eng
  • Chinese Library Classification: Programming; software engineering