IEEE International Conference on Machine Learning and Applications

Putting Self-Supervised Token Embedding on the Tables

Abstract

Electronic messages are a privileged means of distributing information for many businesses and individuals, often in the form of plain-text tables. As the volume of such messages grows, text and numbers must be extracted by an algorithm rather than by a human. Usual methods rely on regular expressions or on a strict structure in the data. They become inefficient when the data contains many variations, a fuzzy structure, or implicit labels. In this paper we introduce SC2T, a fully self-supervised model that addresses these issues. SC2T builds vector representations of tokens in semi-structured messages from both character-level and context-level information. These representations can then be used for unsupervised labeling of tokens, or serve as the basis for a semi-supervised information extraction system.
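
The Python sketch below illustrates what combining a character-level view and a context-level view of a token can look like. It uses hand-crafted features instead of a learned model and is not SC2T. The function names (char_shape, token_signatures, unsupervised_labels), the "d"/"a" shape encoding, and the whitespace tokenization are all assumptions made for this example. Each token is summarized by its own character shape plus the shapes of its left and right neighbours on the same line. Tokens with identical summaries receive the same label, a crude analogue of the unsupervised token labeling described above.

```python
# Hypothetical illustration only: a hand-crafted stand-in for combining
# character-level and context-level views of tokens in a plain-text table.
# SC2T itself learns vector representations; this does not reproduce it.
from itertools import groupby


def char_shape(token: str) -> str:
    """Character-level view: digits -> 'd', letters -> 'a', other characters
    kept as-is, with runs of the same symbol collapsed ('12.50' -> 'd.d')."""
    symbols = ("d" if c.isdigit() else "a" if c.isalpha() else c for c in token)
    return "".join(key for key, _ in groupby(symbols))


def token_signatures(message: str) -> list[tuple[str, tuple[str, str, str]]]:
    """Pair each whitespace-separated token with (own shape, left-neighbour
    shape, right-neighbour shape); line boundaries are marked '<s>' / '</s>'."""
    result = []
    for line in message.splitlines():
        tokens = line.split()
        shapes = [char_shape(t) for t in tokens]
        for i, token in enumerate(tokens):
            left = shapes[i - 1] if i > 0 else "<s>"
            right = shapes[i + 1] if i + 1 < len(tokens) else "</s>"
            result.append((token, (shapes[i], left, right)))
    return result


def unsupervised_labels(message: str) -> list[tuple[str, int]]:
    """Give tokens with identical signatures the same integer label,
    numbering signatures in order of first appearance."""
    label_of: dict[tuple[str, str, str], int] = {}
    return [(token, label_of.setdefault(sig, len(label_of)))
            for token, sig in token_signatures(message)]


if __name__ == "__main__":
    table = "CCY PRICE DATE\nEUR 12.50 2019-01-02\nUSD 13.75 2019-01-03"
    for token, label in unsupervised_labels(table):
        print(token, label)
```

On the three-line table in the usage example, the header tokens receive labels 0, 1 and 2. Each row's currency, price and date then share labels 3, 4 and 5 respectively, with no labels supplied by hand. A learned embedding such as the one the abstract describes would replace these brittle exact-match signatures with vectors that can tolerate variations in layout.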