
Word segmentation in Chinese text


Abstract

The present invention provides a facility for selecting from a sequence of natural language characters combinations of characters that may be words. The facility uses indications, for each of a plurality of characters, of (a) the characters that occur in the second position of words that begin with the character and (b) the positions in which the character occurs in words. For each of a plurality of contiguous combinations of characters occurring in the sequence, the facility determines whether the character occurring in the second position of the combination is indicated to occur in words that begin with the character occurring in the first position of the combination. If so, the facility determines whether every character of the combination is indicated to occur in words in a position in which it occurs in the combination. If so, the facility determines that the combination of characters may be a word. In some embodiments, the facility proceeds to compare the combination of characters to a list of valid words to determine whether the combination of characters is a word.
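The two-stage check described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`build_indications`, `candidate_words`, `confirm_words`), the use of 0-based positions, the `max_len` limit, and the toy lexicon are all assumptions made for illustration, and the per-character indications are simply derived from a small word list.

```python
from collections import defaultdict


def build_indications(lexicon):
    """Derive the two per-character indications from a word list (hypothetical helper)."""
    next_chars = defaultdict(set)  # first character -> characters seen in second position
    positions = defaultdict(set)   # character -> 0-based positions it occupies in words
    for word in lexicon:
        if len(word) >= 2:
            next_chars[word[0]].add(word[1])
        for i, ch in enumerate(word):
            positions[ch].add(i)
    return next_chars, positions


def candidate_words(text, next_chars, positions, max_len=4):
    """Return (start, combination) pairs that pass both checks.

    max_len is an assumed upper bound on word length, not taken from the source.
    """
    found = []
    for start in range(len(text) - 1):
        first, second = text[start], text[start + 1]
        # Check (a): may `second` follow `first` at the start of a word?
        if second not in next_chars.get(first, ()):
            continue
        for end in range(start + 2, min(len(text), start + max_len) + 1):
            combo = text[start:end]
            # Check (b): does every character occur in words at this position?
            if all(i in positions.get(ch, ()) for i, ch in enumerate(combo)):
                found.append((start, combo))
    return found


def confirm_words(candidates, lexicon):
    """Optional final step: keep only candidates found in the list of valid words."""
    return [(start, combo) for start, combo in candidates if combo in lexicon]


# Toy data, chosen only to show a candidate that passes both checks
# but is rejected by the lexicon lookup.
lexicon = {"中文", "文本", "笔记本"}
next_chars, positions = build_indications(lexicon)
candidates = candidate_words("中文本", next_chars, positions)
words = confirm_words(candidates, lexicon)
```

With this toy lexicon, `candidates` contains "中文", "中文本", and "文本", while `words` keeps only "中文" and "文本". The combination "中文本" passes both character-level checks (本 occurs in third position in 笔记本) yet is not a valid word, which is why the abstract describes the checks as establishing only that a combination may be a word, with the lexicon comparison as a further step.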

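Bibliographic details

  • Publication number: US6640006B2

    Patent type:

  • Publication date: 2003-10-28

    Original format: PDF

  • Applicant/Assignee: MICROSOFT CORPORATION

    Application number: US19980087468

  • Inventors: ANDI WU; STEPHEN D. RICHARDSON; ZIXIN JIANG

    Filing date: 1998-05-29

  • Classification: G06K93/40

  • Country: US

  • Database entry time: 2022-08-22 00:05:19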
