CHARACTER-BASED ATTRIBUTE VALUE EXTRACTION SYSTEM

Abstract

A system is provided that extracts attribute values. The system receives data including unstructured text from a data store. The system tokenizes the unstructured text into tokens, where each token is a character of the unstructured text. The system annotates the tokens with attribute labels, where the attribute label for a token is determined, at least in part, based on the word within the unstructured text from which the token originates. The system groups the tokens into text segments based on the attribute labels, where a set of tokens annotated with an identical attribute label is grouped into a text segment, and where the text segments define attribute values. The system stores the attribute labels and the attribute values in the data store.
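
The pipeline described in the abstract (character tokens, per-token labels derived from the source word, grouping of identically labeled tokens into segments) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the `LEXICON` lookup stands in for whatever labeling model the patented system actually uses, the `OUTSIDE` label and the function names are hypothetical, and the data store is omitted.

```python
import re
from itertools import groupby

# Hypothetical word-to-label lexicon (an assumption); a real system would
# likely use a trained sequence labeler instead of a lookup table.
LEXICON = {"red": "COLOR", "cotton": "MATERIAL", "shirt": "PRODUCT"}
OUTSIDE = "O"  # assumed label for characters that belong to no attribute


def tokenize(text):
    """Split text into character tokens, pairing each with its source word."""
    tokens = []
    for match in re.finditer(r"\S+|\s+", text):
        chunk = match.group()
        # Whitespace characters have no source word.
        word = "" if chunk.isspace() else chunk
        tokens.extend((ch, word) for ch in chunk)
    return tokens


def annotate(tokens):
    """Label each character based on the word it originates from."""
    return [(ch, LEXICON.get(word.lower(), OUTSIDE)) for ch, word in tokens]


def group(labeled):
    """Merge runs of consecutive, identically labeled characters into segments."""
    segments = []
    for label, run in groupby(labeled, key=lambda pair: pair[1]):
        if label != OUTSIDE:
            segments.append((label, "".join(ch for ch, _ in run)))
    return segments


def extract(text):
    """Return (attribute label, attribute value) pairs for the given text."""
    return group(annotate(tokenize(text)))
```

Calling `extract("Red cotton shirt")` under these assumptions yields one segment per labeled word. Because the whitespace between words carries the `OUTSIDE` label in this sketch, two adjacent words with the same label would be returned as separate segments; the abstract does not say how the actual system handles that case.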