LEMMATIZING, STEMMING, AND QUERY EXPANSION METHOD AND SYSTEM

Abstract

A method of stemming text and a system therefor are described. The method comprises: removing stop words from a document based on at least one stop word entry in an array of stop words; flagging as nouns words determined to be attached to definite articles and preceded by a noun array entry in an array of stop words preceding at least one noun; adding flagged nouns to a noun dictionary; flagging as verbs words determined to be preceded by a verb array entry in an array of stop words preceding at least one verb; adding flagged verbs to a verb dictionary; searching the document for nouns and verbs based on the flagged nouns and the flagged verbs; removing remaining stop words subsequent to searching the document; applying light stemming to the flagged nouns; applying root-based stemming to the flagged verbs; and storing the stemmed document.
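The steps listed in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the word lists, the attached article prefix `al`, the affix tables, and the helpers `light_stem` and `root_stem` are all invented placeholders. The sketch also treats either cue (an attached article, or a preceding noun-cue stop word) as sufficient to flag a noun, which is only one possible reading of the abstract.

```python
# Hypothetical toy resources; none of these lists come from the patent.
STOP_WORDS = {"wa", "aw"}        # plain stop words, removed in the first pass
NOUN_CUES = {"fi", "ala"}        # stop words assumed to precede a noun
VERB_CUES = {"lam", "sawfa"}     # stop words assumed to precede a verb
ARTICLE = "al"                   # definite article written attached to the word
NOUN_SUFFIXES = ("at", "in")     # toy suffixes stripped by light stemming
VERB_PREFIXES = ("ya", "ta")     # toy prefixes stripped before root extraction


def strip_article(word):
    """Remove the attached article if enough of the word remains."""
    if word.startswith(ARTICLE) and len(word) >= len(ARTICLE) + 3:
        return word[len(ARTICLE):]
    return word


def light_stem(word):
    """Toy light stemmer: strip at most one known suffix."""
    for suffix in NOUN_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def root_stem(word):
    """Toy stand-in for root-based stemming: strip a prefix, drop vowels."""
    for prefix in VERB_PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 3:
            word = word[len(prefix):]
            break
    return "".join(ch for ch in word if ch not in "aeiou")


def stem_document(tokens):
    noun_dict, verb_dict = set(), set()
    cues = NOUN_CUES | VERB_CUES

    # Remove plain stop words; cue words stay for now because they carry hints.
    kept = [t for t in tokens if t not in STOP_WORDS]

    # Flag nouns and verbs from local cues and record them in the dictionaries.
    for prev, word in zip([None] + kept, kept):
        if word in cues:
            continue
        if strip_article(word) != word or prev in NOUN_CUES:
            noun_dict.add(strip_article(word))
        elif prev in VERB_CUES:
            verb_dict.add(word)

    # Search the whole document using the dictionaries, drop the remaining
    # stop words (the cues), and stem each word according to its flag.
    stemmed = []
    for word in kept:
        if word in cues:
            continue
        bare = strip_article(word)
        if bare in noun_dict:
            stemmed.append(light_stem(bare))
        elif word in verb_dict:
            stemmed.append(root_stem(word))
        else:
            stemmed.append(word)  # unflagged words are left unchanged
    return stemmed, noun_dict, verb_dict


text = "sawfa yaktub alwalad wa kitabat fi madrasat lam yaktub walad"
stemmed, nouns, verbs = stem_document(text.split())
print(stemmed)  # -> ['ktb', 'walad', 'kitabat', 'madras', 'ktb', 'walad']
```

In this toy run the final `walad` carries no cue of its own, yet it is still treated as a noun because the dictionary lookup finds the entry added when `alwalad` was flagged. Storing the stemmed document is left to the caller here.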

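Bibliographic data

  • Publication number: US2010082333A1
  • Publication date: 2010-04-01
  • Original format: PDF
  • Applicant/Assignee: EIMAN TAMAH AL-SHAMMARI
  • Application number: US20090476238
  • Inventor: EIMAN TAMAH AL-SHAMMARI
  • Filing date: 2009-06-01
  • Classification: G06F17/21
  • Country: US
  • Database entry time: 2022-08-21 18:52:56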
