
TRANSDUCTIVE APPROACH TO CATEGORY-SPECIFIC RECORD ATTRIBUTE EXTRACTION


Abstract

Disclosed are methods and apparatus for segmenting and labeling a collection of token sequences. A plurality of segments of one or more tokens in a token sequence collection are partially labeled with labels from a set of target labels using high precision domain-specific labelers so as to generate a partially labeled sequence collection having a plurality of labeled segments and a plurality of unlabeled segments. Any label conflicts in the partially labeled sequence collection are resolved. One or more of the labeled segments of the partially labeled sequence collection are expanded so as to cover one or more additional tokens of the partially labeled sequence collection. A statistical model, for labeling segments using local token and segment features of the sequence collection, is trained based on the partially labeled sequence collection. This trained model is then used to label the unlabeled segments and the labeled segments of the sequence collection so as to generate a labeled sequence collection. The labeled sequence collection is then stored as structured output records in a database.
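The stages named in the abstract (partial labeling, conflict resolution, segment expansion, model training, and relabeling of the same collection) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the regex labelers and their priorities, the priority-based conflict rule, the `EXPAND_RIGHT` expansion rule, the `OTHER` fallback label, and the feature-count voting scheme, which stands in for whatever statistical model the disclosure actually uses.

```python
import re
from collections import Counter, defaultdict

# Hypothetical high-precision labelers: (label, pattern, priority).
LABELERS = [
    ("PRICE", re.compile(r"^\$\d+(\.\d\d)?$"), 2),
    ("YEAR", re.compile(r"^(19|20)\d\d$"), 1),
    ("NUMBER", re.compile(r"^\d+$"), 0),
]
# Hypothetical expansion rule: tokens that may extend a labeled segment rightward.
EXPAND_RIGHT = {"PRICE": {"usd", "dollars"}}


def partial_label(tokens):
    """Collect every (priority, label) proposal per token; most tokens get none."""
    return [[(prio, lab) for lab, pat, prio in LABELERS if pat.match(tok)]
            for tok in tokens]


def resolve_conflicts(candidates):
    """Keep the highest-priority proposal; None marks an unlabeled token."""
    return [max(c)[1] if c else None for c in candidates]


def expand(tokens, labels):
    """Grow a labeled segment over an adjacent unlabeled token when a rule allows."""
    out = list(labels)
    for i in range(1, len(tokens)):
        prev = out[i - 1]
        if out[i] is None and prev and tokens[i].lower() in EXPAND_RIGHT.get(prev, ()):
            out[i] = prev
    return out


def features(tokens, i):
    """Local token features: character shape and the previous token."""
    shape = re.sub(r"\d", "9", re.sub(r"[A-Za-z]", "a", tokens[i]))
    prev = tokens[i - 1].lower() if i > 0 else "<s>"
    return [("shape", shape), ("prev", prev)]


def train(collection):
    """Count feature/label co-occurrences over the labeled tokens only."""
    counts = defaultdict(Counter)
    for tokens, labels in collection:
        for i, lab in enumerate(labels):
            if lab is not None:
                for f in features(tokens, i):
                    counts[f][lab] += 1
    return counts


def predict(counts, tokens):
    """Label every token by feature votes; fall back to OTHER with no evidence."""
    result = []
    for i in range(len(tokens)):
        votes = Counter()
        for f in features(tokens, i):
            votes.update(counts.get(f, {}))
        result.append(votes.most_common(1)[0][0] if votes else "OTHER")
    return result


def run_pipeline(sequences):
    """Transductive use: train on the partially labeled collection, then relabel it."""
    partial = [(t, expand(t, resolve_conflicts(partial_label(t)))) for t in sequences]
    model = train(partial)
    return [predict(model, t) for t in sequences]
```

Calling `run_pipeline` on a list of token lists returns one label per token for each sequence. Because the model is trained on and applied to the same collection, no separately annotated training set is needed, which is the sense of "transductive" in the title.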
