IEEE International Conference on Machine Learning and Applications

Putting Self-Supervised Token Embedding on the Tables

Abstract

Electronic messages are a preferred channel for distributing information for many businesses and individuals, and the information often takes the form of plain-text tables. As the number of such messages grows, extracting text and numbers by hand becomes impractical, and an algorithm is needed instead. Usual methods rely on regular expressions or on a strict structure in the data, but they work poorly when the messages show many variations, fuzzy structure, or implicit labels. In this paper we introduce SC2T, a fully self-supervised model that addresses these issues by constructing vector representations of tokens in semi-structured messages from both character-level and context-level information. These representations can then be used for unsupervised labeling of tokens, or serve as the basis for a semi-supervised information extraction system.
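The limitation of regular expressions mentioned in the abstract can be illustrated with a minimal sketch. This is not SC2T and not code from the paper: the pattern `ROW_PATTERN`, the helper `extract_rows`, and both sample messages are hypothetical, invented here only to show how a pattern written for one table layout can miss rows once the layout varies.

```python
import re
from typing import Dict, List

# Hypothetical pattern written for a single layout: "<name> <quantity> <price>",
# with columns separated by whitespace and a price with two decimals.
ROW_PATTERN = re.compile(
    r"^(?P<name>[A-Za-z]+)\s+(?P<qty>\d+)\s+(?P<price>\d+\.\d{2})$"
)


def extract_rows(message: str) -> List[Dict[str, str]]:
    """Return the rows of a plain-text table that match ROW_PATTERN."""
    rows = []
    for line in message.splitlines():
        match = ROW_PATTERN.match(line.strip())
        if match:
            rows.append(match.groupdict())
    return rows


# Invented sample messages, for illustration only.
regular = "Widget  10  4.50\nGadget   3  12.00"
varied = "Widget | 10 | 4.50\nGadget: 3 units @ 12,00"

print(extract_rows(regular))  # both rows match the expected layout
print(extract_rows(varied))   # no row matches once separators and formats change
```

Covering the second message would require adding new patterns for each variation, which is the maintenance burden that a learned token representation is meant to avoid.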
