Conference on Applied Mathematics

IMPROVEMENT OF CHARACTER SET DETECTOR CHARDET



Abstract

There are many encoding schemes that represent characters in text files. If a program displaying such text cannot determine the correct encoding, the text may become unreadable. Thanks to the widely used universal charset detector originally developed by Netscape, text can be displayed correctly in almost any software on any device. However, language models for automatic character set detection have been created for only a small group of languages. Our aim was to create language models for more countries, thereby increasing the probability that the encoding is determined correctly. The most difficult part was improving the detection accuracy for languages that use the ISO-8859-1 encoding. The original algorithm was not sufficiently precise, so we designed a different procedure.
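The abstract does not spell out the detection procedure itself. As a rough illustration of the general idea of scoring candidate encodings with a per-language model, the Python sketch below decodes the input under each candidate encoding, rules out encodings in which the bytes are invalid, and ranks the rest by the share of character bigrams a toy model has already seen. The function names `train_model`, `score`, and `detect`, the bigram-coverage score, and the short German training sample are all hypothetical; this is neither chardet's actual API nor the authors' procedure.

```python
# Toy sketch of language-model-based charset detection.
# Hypothetical: NOT the chardet/Mozilla algorithm and NOT the paper's procedure.
from __future__ import annotations

from collections import Counter


def bigrams(text: str) -> list[str]:
    """Return the overlapping character bigrams of the lowercased text."""
    text = text.lower()
    return [text[i:i + 2] for i in range(len(text) - 1)]


def train_model(sample: str) -> Counter:
    """Count character bigrams in a (toy) training sample for one language."""
    return Counter(bigrams(sample))


def score(text: str, model: Counter) -> float:
    """Fraction of the text's bigrams that the language model has seen."""
    grams = bigrams(text)
    if not grams:
        return 0.0
    return sum(1 for g in grams if g in model) / len(grams)


def detect(data: bytes, candidates: list[str], model: Counter) -> tuple[str | None, float]:
    """Decode under each candidate encoding and keep the best-scoring one."""
    best, best_score = None, -1.0
    for enc in candidates:
        try:
            text = data.decode(enc)  # strict decoding: invalid byte sequences raise
        except UnicodeDecodeError:
            continue  # bytes impossible in this encoding: rule it out
        s = score(text, model)
        if s > best_score:  # ties keep the earlier candidate
            best, best_score = enc, s
    return best, best_score


# Hypothetical toy training sample standing in for a real German language model.
GERMAN_SAMPLE = "die straße ist schön und grün, wir grüßen die größte stadt"
model = train_model(GERMAN_SAMPLE)

data = "Schöne Grüße".encode("iso-8859-1")
# utf-8 rejects these bytes; cp1251 decodes them as Cyrillic letters the model
# has never seen, so iso-8859-1 scores highest.
print(detect(data, ["utf-8", "iso-8859-1", "cp1251"], model))
```

Encodings that map the same bytes to the same characters (for example ISO-8859-1 and ISO-8859-15 for ö, ü and ß) would tie under a score like this, which illustrates one reason single-byte Latin encodings are hard to tell apart by text statistics alone.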
