Conference: World Multi-Conference on Systemics, Cybernetics and Informatics

Utilizing Genetic Algorithm and Online Resources for Vietnamese Text Categorization


Abstract

A fundamental difference between English and Vietnamese is that a Vietnamese sentence may not have a unique segmentation with a unique meaning, which has created great challenges for research in Vietnamese text categorization. Because of this difference, most related work on English and other Western languages is not applicable to text categorization in Vietnamese. The lack of training data also constrains the range of methods that can be exploited. There have been similar research efforts for Chinese and Japanese, but most require large sets of training data, which are particularly hard to find or compose for Vietnamese. This paper presents IGATEC (Internet and Genetics Algorithm-based Text Categorization), a novel text categorization approach that requires no training data or dictionary. The possibility of different text segmentations with different meanings is also taken into account. The approach combines a genetic algorithm (GA) with statistical data extracted from the Internet. These data can be extracted easily with any available Internet search engine and are stored for off-line use in both IGATEC's GA sentence segmentation engine and its text classifier, to speed up processing. We achieved a notable accuracy of 97.1% in classifying news articles collected from Vietnamese online news sites, with a short running time.
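The abstract does not specify IGATEC's chromosome encoding, fitness function, or GA operators, so the following is only a minimal sketch of the general idea of GA-based segmentation driven by cached document counts. The table `DOC_COUNTS`, its invented numbers, the syllable-weighted log-count score, and the names `decode`, `fitness`, and `ga_segment` are all assumptions made for illustration, not the authors' method.

```python
import math
import random

# Hypothetical cached document counts, standing in for search-engine hit
# counts stored for off-line use. The numbers are invented for illustration.
DOC_COUNTS = {
    "học": 900,
    "sinh": 400,
    "học sinh": 800,
    "sinh học": 600,
}


def decode(syllables, mask):
    """Turn a boundary mask into words; mask[i] == 1 cuts after syllables[i]."""
    words, current = [], [syllables[0]]
    for syllable, cut in zip(syllables[1:], mask):
        if cut:
            words.append(" ".join(current))
            current = [syllable]
        else:
            current.append(syllable)
    words.append(" ".join(current))
    return words


def fitness(syllables, mask):
    """Assumed score: syllable-weighted log count of each candidate word."""
    total = 0.0
    for word in decode(syllables, mask):
        total += len(word.split()) * math.log1p(DOC_COUNTS.get(word, 0))
    return total


def ga_segment(syllables, pop_size=20, generations=40, mutation_rate=0.1, seed=0):
    """Search boundary masks with elitism, one-point crossover, and bit flips."""
    rng = random.Random(seed)  # local generator keeps runs repeatable
    n = len(syllables) - 1     # one possible boundary between adjacent syllables
    # Include the all-cut mask so the result never scores below the
    # syllable-by-syllable baseline (elitism preserves the best mask).
    population = [[1] * n] + [
        [rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=lambda m: fitness(syllables, m), reverse=True)
        parents = population[: pop_size // 2]
        children = [population[0][:]]  # elitism: carry the best mask over unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            point = rng.randint(0, n)  # one-point crossover position
            child = a[:point] + b[point:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = children
    best = max(population, key=lambda m: fitness(syllables, m))
    return decode(syllables, best)


if __name__ == "__main__":
    print(ga_segment("học sinh học sinh học".split()))
```

Encoding a segmentation as one bit per syllable gap keeps every chromosome valid, so crossover and mutation never need a repair step. A real system would replace `DOC_COUNTS` with counts gathered from a search engine and would likely use a more careful score than the one assumed here.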
