Vietnamese Word Segmentation with CRFs and SVMs: An Investigation

Abstract

Word segmentation for Vietnamese, as for most Asian languages, is an important task with a significant impact on higher levels of language processing. However, it has received little attention from the community due to the lack of a common annotated corpus for evaluation and comparison. Also, most previous studies focused on unsupervised statistical approaches or combined too many techniques; consequently, their accuracies are not as high as expected. This paper reports a careful investigation of using conditional random fields (CRFs) and support vector machines (SVMs), two of the most successful statistical learning methods in NLP and pattern recognition, for solving the task. We first build a moderate-sized annotated corpus from different sources of material. For a careful evaluation, different CRF and SVM models using different feature settings were trained, and their results are compared and contrasted with each other. In addition, we discuss several important points about accuracy, computational cost, corpus size, and other aspects that might influence the overall quality of Vietnamese word segmentation.
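The abstract does not spell out how segmentation is cast as a learning problem for CRFs and SVMs. A common framing is to tag each syllable as beginning (B) or continuing (I) a word and to describe each syllable with features drawn from a small window around it, which is one kind of "feature setting." The sketch below illustrates that framing only under stated assumptions: the B/I tag set, the space-separated syllable input, the window size, and the helper names `words_to_labels`, `labels_to_words`, and `syllable_features` are all hypothetical and are not the authors' exact configuration.

```python
from typing import Dict, List, Tuple

# Hypothetical tag set (an assumption, not taken from the paper):
# "B" marks the first syllable of a word, "I" marks every later syllable.


def words_to_labels(words: List[str]) -> Tuple[List[str], List[str]]:
    """Turn gold words (syllables separated by spaces) into syllables + B/I tags."""
    syllables: List[str] = []
    labels: List[str] = []
    for word in words:
        for position, syllable in enumerate(word.split()):
            syllables.append(syllable)
            labels.append("B" if position == 0 else "I")
    return syllables, labels


def labels_to_words(syllables: List[str], labels: List[str]) -> List[str]:
    """Invert the encoding: a 'B' tag starts a new word, an 'I' tag extends it."""
    words: List[str] = []
    for syllable, label in zip(syllables, labels):
        if label == "B" or not words:
            words.append(syllable)
        else:
            words[-1] += " " + syllable
    return words


def syllable_features(syllables: List[str], i: int, window: int = 2) -> Dict[str, str]:
    """Illustrative window features for syllable i (feature names are hypothetical)."""

    def at(j: int) -> str:
        # Positions outside the sentence are padded.
        return syllables[j].lower() if 0 <= j < len(syllables) else "<pad>"

    feats = {f"s[{k}]": at(i + k) for k in range(-window, window + 1)}
    # Conjunction of the current and next syllable: a syllable-bigram cue.
    feats["s[0]|s[1]"] = f"{at(i)}|{at(i + 1)}"
    # Orthographic cue: capitalised syllables often belong to proper names.
    feats["is_title"] = str(syllables[i].istitle())
    return feats


if __name__ == "__main__":
    # "Sinh viên học tiếng Việt" ("students study Vietnamese"), segmented into 3 words.
    syls, tags = words_to_labels(["Sinh viên", "học", "tiếng Việt"])
    print(tags)  # ['B', 'I', 'B', 'B', 'I']
    print(labels_to_words(syls, tags))  # ['Sinh viên', 'học', 'tiếng Việt']
```

In this framing, a CRF would predict the whole tag sequence jointly from such features. A per-syllable classifier such as an SVM would label each syllable from its feature vector. Either way, the predicted tags are decoded back into words as in `labels_to_words`.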

Bibliographic details

Source
Pacific Asia Conference on Language, Information and Computation, 2006, 8 pages
Conference location
Wuhan, China
Authors
Cam-Tu Nguyen; Trung-Kien Nguyen; Xuan-Hieu Phan; Le-Minh Nguyen; Quang-Thuy Ha
Conference organizers
National Natural Science Foundation of China; Ministry of Education of China; Chinese Information Processing Society of China; Huazhong Normal University
Original format
PDF
Chinese Library Classification
Computer networks
Keywords
word segmentation; segmenting and labeling sequence data; conditional random fields; support vector machines; maximum matching;

Date added to database: 2022-08-21 10:15:01

Similar documents

1. Word Segmentation for Burmese Based on Dual-Layer CRFs [J]. Zhang Shaoning, Mao Cunli, Yu Zhengtao. ACM Transactions on Asian Language Information Processing, 2019, no. 1.

2. Segmentation-free word spotting with exemplar SVMs [J]. Jon Almazán, Albert Gordo, Alicia Fornés. Pattern Recognition: The Journal of the Pattern Recognition Society, 2014, no. 12.

3. Chinese Word Segmentation via BiLSTM+Semi-CRF with Relay Node [J]. Nuo Qun, Hang Yan, Xi-Peng Qiu. Journal of Computer Science and Technology, 2020, no. 5.

4. Vietnamese Word Segmentation with CRFs and SVMs: An Investigation [C]. Cam-Tu Nguyen, Trung-Kien Nguyen, Xuan-Hieu Phan. Pacific Asia Conference on Language, Information and Computation, Wuhan, China, November 1-3, 2006.

5. Learning a two-stage SVM/CRF sequence classifier [D]. Hoefel, Guilherme. 2008.

6. Combined SVM-CRFs for Biological Named Entity Recognition with Maximal Bidirectional Squeezing [O]. Fei Zhu, Bairong Shen. 2009.

7. Vietnamese Word Segmentation with CRFs and SVMs: An Investigation [O]. Nguyen Cam-Tu, Nguyen Trung-Kien, Phan Xuan-Hieu. 2006.
