A Unicode based Adaptive Segmentor

Abstract

This paper presents a Unicode based Chinese word segmentor. It can handle Chinese text in Simplified, Traditional, or mixed mode. The system uses the strategy of divide-and-conquer to handle the recognition of personal names, numbers, time and numerical values, etc. in the preprocessing stage. The segmentor further uses tagging information to work on disambiguation. Adopting a modular design approach, different functional parts are separately implemented using different modules and each module tackles one problem at a time providing more flexibility and extensibility. Results show that with added pre-processing modules and accessorial modules, the accuracy of the segmentor is increased and the system is easily adaptive to different applications.
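The abstract does not specify the algorithms used, so the following is only a minimal sketch of the modular idea it describes: a preprocessing module claims number and time expressions first, and a later module segments whatever text is still unclaimed. Every name here (`tag_numbers`, `make_dictionary_segmenter`, `run_pipeline`, the `NUM` and `WORD` tags, the regular expression, and the toy lexicon) is a hypothetical illustration, and forward maximum matching is a stand-in technique, not the method of the paper.

```python
import re
from typing import Callable, List, Optional, Set, Tuple

# A token is (text, tag). The tag stays None until some module claims the span.
Token = Tuple[str, Optional[str]]
Module = Callable[[List[Token]], List[Token]]

# Assumed pattern: ASCII, fullwidth, or Chinese numerals, optionally followed
# by a date/time unit. A real system would need a far richer grammar.
NUMBER_RE = re.compile(r"[0-9０-９〇零一二三四五六七八九十百千万萬亿億]+[年月日时時分秒]?")


def tag_numbers(tokens: List[Token]) -> List[Token]:
    """Preprocessing module: split number/time expressions out of untagged spans."""
    out: List[Token] = []
    for text, tag in tokens:
        if tag is not None:          # already claimed by an earlier module
            out.append((text, tag))
            continue
        pos = 0
        for m in NUMBER_RE.finditer(text):
            if m.start() > pos:
                out.append((text[pos:m.start()], None))
            out.append((m.group(), "NUM"))
            pos = m.end()
        if pos < len(text):
            out.append((text[pos:], None))
    return out


def make_dictionary_segmenter(lexicon: Set[str], max_len: int = 4) -> Module:
    """Build a module that applies forward maximum matching to untagged spans."""
    def segment(tokens: List[Token]) -> List[Token]:
        out: List[Token] = []
        for text, tag in tokens:
            if tag is not None:
                out.append((text, tag))
                continue
            i = 0
            while i < len(text):
                # Try the longest candidate first; fall back to one character.
                for n in range(min(max_len, len(text) - i), 0, -1):
                    piece = text[i:i + n]
                    if n == 1 or piece in lexicon:
                        out.append((piece, "WORD"))
                        i += n
                        break
        return out
    return segment


def run_pipeline(text: str, modules: List[Module]) -> List[Token]:
    """Run each module in order; each one handles a single problem."""
    tokens: List[Token] = [(text, None)]
    for module in modules:
        tokens = module(tokens)
    return tokens


if __name__ == "__main__":
    # The toy lexicon holds both Simplified (大学) and Traditional (大學) forms.
    lexicon = {"香港", "理工", "大学", "大學", "成立"}
    modules = [tag_numbers, make_dictionary_segmenter(lexicon)]
    for token in run_pipeline("香港理工大學1994年成立", modules):
        print(token)
```

Because Python strings are sequences of Unicode code points, Simplified and Traditional characters pass through the same code path in this sketch, and mixed-mode text needs no separate encoding handling. Adding a further recognizer, for personal names for example, would mean inserting one more function into the `modules` list without touching the others.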

Bibliographic details

Source
41st annual meeting of the Association for Computational Linguistics: Proceedings of the conference | 2003 | pp. 1-4 | 4 pages
Conference location: Sapporo (JP)
Authors
Q. Lu; S. T. Chan; R. F. Xu; T. S. Chiu; B. L. Li; S. W. Yu
Author affiliations

Dept. of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong; csluqin@comp.polyu.edu.hk

Dept. of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong; @comp.polyu.edu.hk

Dept. of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong; csrfxu@comp.polyu.edu.hk

Dept. of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong; csluqin,csrfxu@comp.polyu.edu.hk

The Institute of Computational Linguistics, Peking University, Beijing, China; yusw@pku.edu.cn

The Institute of Computational Linguistics, Peking University, Beijing, China; libi@pku.edu.cn

Original format: PDF
Language: English
Chinese Library Classification: Programming languages, algorithmic languages
