Conference: IEEE International Conference on Data Science and Advanced Analytics

Classification Between Machine Translated Text and Original Text By Part Of Speech Tagging Representation

Abstract

Classifiers that distinguish machine-translated text from original text are often built on tokens drawn from the vocabulary of the corpora. With N-grams larger than unigrams, one can create a model that estimates a decision boundary from the word-frequency probability distribution; however, this approach is exponentially expensive because of high dimensionality and sparsity. Instead, we represent samples of the corpora by their part-of-speech tags, which form a significantly smaller vocabulary. With fewer possible trigram permutations, we can build a model on the trigram frequency probability distribution of the tags. In this paper, we explore less conventional techniques for handling documents, dictionaries, and the like.
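
The part-of-speech trigram representation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the hand-tagged sample, and the tagset and vocabulary sizes are all assumptions chosen for illustration, and a real pipeline would obtain the tags from a part-of-speech tagger.

```python
from collections import Counter
from itertools import islice


def trigrams(tags):
    """Return consecutive tag triples from a list of POS tags."""
    return zip(tags, islice(tags, 1, None), islice(tags, 2, None))


def trigram_distribution(tags):
    """Map each POS trigram to its relative frequency in the sample."""
    counts = Counter(trigrams(tags))
    total = sum(counts.values())
    if total == 0:
        return {}  # fewer than three tags: no trigrams to count
    return {tri: n / total for tri, n in counts.items()}


# Hypothetical hand-tagged sample using Penn Treebank-style tag names.
sample_tags = ["DT", "NN", "VBZ", "DT", "NN", "VBZ", "JJ"]
dist = trigram_distribution(sample_tags)

# Upper bound on distinct trigrams for a given symbol inventory.
# Both sizes below are assumed, illustrative figures.
TAGSET_SIZE = 45         # assumed: a tagset of a few dozen tags
VOCABULARY_SIZE = 50_000  # assumed: a modest word vocabulary
max_tag_trigrams = TAGSET_SIZE ** 3
max_word_trigrams = VOCABULARY_SIZE ** 3
```

Under these assumed sizes, the tag-based feature space is bounded by 45^3 = 91,125 trigrams, while the word-based one is bounded by 50,000^3, which is the dimensionality gap the abstract refers to. The resulting `dist` dictionary could then serve as a feature vector for any standard classifier.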