Published in: Workshop on Vector Space Modeling for Natural Language Processing

Word Embeddings vs Word Types for Sequence Labeling: the Curious Case of CV Parsing

Abstract

We explore new methods of improving Curriculum Vitae (CV) parsing for German documents by applying recent research on word embeddings in Natural Language Processing (NLP). Our approach integrates the word embeddings as input features for a probabilistic sequence labeling model that relies on the Conditional Random Field (CRF) framework. The best-performing word embeddings are generated from a large sample of German CVs. The best results on the extraction task are obtained by the model that integrates the word embeddings together with a number of hand-crafted features. The improvements are consistent across the different sections of the target documents. The effect of the word embeddings is strongest on semi-structured, out-of-sample data.
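The idea of feeding word embeddings into a CRF alongside hand-crafted features can be illustrated with a minimal sketch. It is not the authors' implementation: the embedding table, its values, the feature names, and the helper functions `token_features` and `sentence_features` are all hypothetical. The sketch only shows how each embedding dimension can become one real-valued feature next to a few typical surface features.

```python
from typing import Dict, List

# Toy 3-dimensional embeddings. The values are invented for illustration;
# the abstract states that the real embeddings were trained on German CVs.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "berufserfahrung": [0.9, 0.1, 0.0],
    "softwareentwickler": [0.7, 0.3, 0.2],
    "münchen": [0.1, 0.8, 0.4],
}
EMBEDDING_DIM = 3


def token_features(tokens: List[str], i: int) -> Dict[str, float]:
    """Build one feature dict for the token at position i (hypothetical helper)."""
    token = tokens[i]
    # Hand-crafted surface features (assumed examples, not the paper's feature set).
    features: Dict[str, float] = {
        "bias": 1.0,
        "is_capitalized": float(token[:1].isupper()),
        "is_digit": float(token.isdigit()),
        "is_first": float(i == 0),
        "is_last": float(i == len(tokens) - 1),
    }
    # Embedding features: one real-valued feature per dimension.
    # Out-of-vocabulary tokens fall back to a zero vector.
    vector = TOY_EMBEDDINGS.get(token.lower(), [0.0] * EMBEDDING_DIM)
    for d, value in enumerate(vector):
        features[f"emb_{d}"] = value
    return features


def sentence_features(tokens: List[str]) -> List[Dict[str, float]]:
    """Feature dicts for a whole token sequence, one per token."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    for feats in sentence_features(["Softwareentwickler", "München", "2015"]):
        print(feats)
```

A sequence of feature dictionaries in this shape could then be passed to a CRF toolkit that accepts real-valued per-token features. Which toolkit and feature template the authors used is not stated in the abstract.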
