Conference: 22nd International Conference on Computational Linguistics

Modeling Chinese Documents with Topical Word-Character Models

Abstract

As Chinese text is written without word boundaries, effectively recognizing Chinese words is like recognizing collocations in English, substituting characters for words and words for collocations. However, existing topical models that involve collocations share a common limitation: instead of directly assigning a topic to a collocation, they take the topic of a word within the collocation as the topic of the whole collocation. This is unsatisfactory for topical modeling of Chinese documents. Thus, we propose a topical word-character model (TWC), which allows two distinct types of topics: word topics and character topics. We evaluated TWC both qualitatively and quantitatively to show that it is a powerful and promising topic model.
