An improved embedding matching model for Chinese word segmentation

Abstract

To date, various deep neural network based models have been applied extensively to the Chinese word segmentation (CWS) task. However, some of these models either take a long time to train or perform poorly without an additional dictionary or training corpus. In this paper, we propose an improved embedding matching model for CWS that improves the F1-measure by 0.4% and 0.5% on the PKU and MSR datasets, respectively, compared with the original model. The proposed model also improves the F1-measure by 0.77% and the weighted F1-measure by 2.59% on the NLPCC dataset. The improved model outperforms most previous models in segmentation F1-measure and takes less time to train and test. Its very fast convergence and strong performance make the improved model a better choice for practical use.
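
The abstract reports its gains as F1-measure but does not spell out the scoring procedure. As a minimal sketch, assuming the common word-level convention for CWS in which a predicted word counts as correct only when both of its boundaries match the gold segmentation, the metric can be computed as below. The function names `to_spans` and `segmentation_f1` are hypothetical and chosen for illustration; this is not the authors' evaluation script, and it does not cover the weighted F1-measure, which the abstract does not define.

```python
def to_spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def segmentation_f1(gold_words, pred_words):
    """Return (precision, recall, f1) for one segmented sentence.

    Assumption: a predicted word is correct only if its exact
    character span also appears in the gold segmentation.
    """
    gold_spans = to_spans(gold_words)
    pred_spans = to_spans(pred_words)
    correct = len(gold_spans & pred_spans)

    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative data (not from the paper): the prediction wrongly splits "分词".
gold = ["中文", "分词", "模型"]
pred = ["中文", "分", "词", "模型"]
precision, recall, f1 = segmentation_f1(gold, pred)
```

In this example two of the four predicted words match gold spans, so precision is 2/4 and recall is 2/3. Corpus-level scores are normally obtained by summing the correct, predicted, and gold counts over all sentences before dividing, rather than averaging per-sentence values.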

Bibliographic details

Source
2018 International Conference on Artificial Intelligence and Big Data | 2018 | pp. 195-200 | 6 pages
Conference location: Chengdu (CN)
Authors
Xiaolong Deng; Yingfei Sun
Author affiliation
Sensor Network and Application Research Center, School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing, China (both authors)
Original format: PDF
Language of text: English
Keywords
Hidden Markov models; Feature extraction; Viterbi algorithm; Task analysis; Decoding; Training; Neural networks
