Research on Automatic Detection of Multilingual Text Based on Latent Dirichlet Allocation

Zhang Wei; Li Wen; Chen Dan; Li Zengjie

Abstract

The paper proposes an unsupervised multilingual identification method based on latent Dirichlet allocation (LDA) for the automatic detection of languages in mixed multilingual text. Drawing on methods from speech recognition, it adapts LDA to language identification, using n-grams as features. Instead of the usual practice of choosing the number of topics by perplexity, the paper introduces a new selection method based on minimum description length (MDL), and trains the LDA model with collapsed Gibbs sampling (CGS), yielding an unsupervised LDA-based language identifier. N-gram count files are generated with the mitlm toolkit and used to build character-level language models for multilingual identification. The LDA model is then compared with three other language identification systems in experiments on nine European languages from the ECI/MCI benchmark corpus. A detailed analysis of the results shows that the method achieves good accuracy and recall without any annotated data.
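To make the core idea concrete, here is a minimal Python sketch of LDA trained by collapsed Gibbs sampling over character n-grams, with each document's dominant topic read as its language cluster. It illustrates the general technique under stated assumptions and is not the authors' implementation: it does not use mitlm, it fixes the number of topics K by hand instead of selecting it with the paper's MDL criterion, and the function names, the trigram size n = 3, the hyperparameters α = 0.1 and β = 0.01, and the iteration count are all assumed values. Each sampling step draws a token's topic from the standard collapsed conditional p(z_i = k | z_-i, w) ∝ (n_dk + α)(n_kw + β) / (n_k + Vβ), where V is the number of distinct n-grams.

```python
import random

# Hypothetical helper names; K is fixed by hand here, not chosen by MDL as in the paper.


def char_ngrams(text, n=3):
    """Overlapping character n-grams of `text`, padded with one space at each end."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def collapsed_gibbs_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; `docs` is a list of token lists.

    Returns n_dk, where n_dk[d][k] counts tokens of document d assigned to topic k.
    """
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for doc in docs for w in doc}))}
    V, K = len(vocab), num_topics
    n_dk = [[0] * K for _ in docs]      # document-topic counts
    n_kw = [[0] * V for _ in range(K)]  # topic-word (n-gram) counts
    n_k = [0] * K                       # total tokens per topic
    z = []                              # current topic of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            n_dk[d][k] += 1
            n_kw[k][vocab[w]] += 1
            n_k[k] += 1
            z_d.append(k)
        z.append(z_d)

    topics = range(K)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = vocab[w], z[d][i]
                # Remove the token's current assignment from all counts.
                n_dk[d][k] -= 1
                n_kw[k][wi] -= 1
                n_k[k] -= 1
                # Full conditional with theta and phi integrated out.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][wi] + beta) / (n_k[t] + V * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                # Add the token back under its newly sampled topic.
                n_dk[d][k] += 1
                n_kw[k][wi] += 1
                n_k[k] += 1
                z[d][i] = k
    return n_dk


def dominant_topics(n_dk):
    """Label each document with its most frequent topic (read here as a language cluster)."""
    return [max(range(len(row)), key=row.__getitem__) for row in n_dk]


if __name__ == "__main__":
    sentences = [
        "the cat sat on the mat",
        "the dog ate the bread",
        "der hund frisst das brot",
        "die katze sitzt auf der matte",
    ]
    docs = [char_ngrams(s) for s in sentences]
    labels = dominant_topics(collapsed_gibbs_lda(docs, num_topics=2))
    for s, k in zip(sentences, labels):
        print(k, s)
```

Cluster labels are arbitrary integers, and with such a tiny toy corpus the split between the English and German sentences is not guaranteed; the sketch only shows the mechanics of the sampler.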

Bibliographic details

Source
Periodical of Ocean University of China (Natural Science Edition) (《中国海洋大学学报（自然科学版）》), 2017, No. 12, pp. 130-136 (7 pages)
Affiliation
College of Information Science and Engineering, Ocean University of China, Qingdao, Shandong 266100 (all four authors)

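Original format: PDF
Full-text language: Chinese
CLC classification: Information processing (information handling)
Keywords
multilingual identification; unsupervised; latent Dirichlet allocation; minimum description length; collapsed Gibbs sampling
Added to database: 2023-07-25 17:13:40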

Similar literature

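1. An improved text representation method for text classification based on the global vectors for word representation (GloVe) model and latent Dirichlet allocation [J]. Chen Kejia, Liu Hui. Science Technology and Engineering, 2021, No. 29
2. Research on latent Dirichlet allocation based on a topic model of product reviews [J]. Zhou Liang, Fang Xinglong. Journal of Anhui Polytechnic University, 2019, No. 1
3. Construction and application of a complex network of consumer online reviews based on the latent Dirichlet allocation model [J]. Liu Xiaojun, Narisa, Cui Xuelian. Journal of Systems Engineering, 2017, No. 3
4. Prediction of literature topic evolution based on the latent Dirichlet model [J]. Mao Lifeng, Zhang Wei. Computer Technology and Development, 2016, No. 9
5. A short-text classification method based on latent Dirichlet allocation [C]. Zhang Zhifei, Miao Duoqian, Gao Can. 6th National Youth Conference on Computational Linguistics, 2012
6. Multi-label latent Dirichlet allocation and its parallelized application [A]. Zhu Yun. 2012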