
Character-based attribute value extraction system


Abstract

A system for extracting attribute values is provided. The system receives data, including unstructured text, from a data store. It tokenizes the unstructured text into tokens, where each token is a single character of the text. It annotates the tokens with attribute labels, where the attribute label for a token is determined, at least in part, by the word within the unstructured text from which the token originates. It then groups the tokens into text segments based on the attribute labels: a set of tokens annotated with an identical attribute label forms one text segment, and the text segments define the attribute values. Finally, the system stores the attribute labels and the attribute values in the data store.
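The tokenize, annotate, and group steps described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The `WORD_LABELS` lookup table, the `"O"` (outside) label, and all function names are hypothetical; a real system would presumably use a trained labeling model rather than a dictionary. The sketch also assumes that only consecutive characters sharing a label form a segment, and it omits the data store.

```python
import re
from itertools import groupby

# Hypothetical word-to-label lookup; a stand-in for a trained labeler.
WORD_LABELS = {"red": "color", "cotton": "material", "shirt": "product_type"}
OUTSIDE = "O"  # assumed label for characters that belong to no attribute


def tokenize(text):
    """Split text into character tokens, each paired with its source word."""
    tokens = []
    # Alternate runs of word characters and non-word characters so that
    # every character of the input is covered exactly once.
    for match in re.finditer(r"\w+|\W+", text):
        chunk = match.group()
        tokens.extend((ch, chunk) for ch in chunk)
    return tokens


def annotate(tokens):
    """Label each character based on the word it originates from."""
    return [(ch, WORD_LABELS.get(word.lower(), OUTSIDE)) for ch, word in tokens]


def group_segments(labeled):
    """Merge consecutive characters with the same label into text segments."""
    segments = []
    for label, run in groupby(labeled, key=lambda pair: pair[1]):
        value = "".join(ch for ch, _ in run)
        if label != OUTSIDE:
            segments.append((label, value))
    return segments


def extract_attribute_values(text):
    """Run the three steps and return (attribute label, attribute value) pairs."""
    return group_segments(annotate(tokenize(text)))
```

Calling `extract_attribute_values("Red cotton shirt, size M")` with this lookup table yields the pairs `("color", "Red")`, `("material", "cotton")`, and `("product_type", "shirt")`. Because whitespace is labeled as outside, a multi-word value would be split into separate segments here; handling that case is left out of the sketch.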
