Conference: International Conference on Information Technology Interfaces

Domain dependence of statistical named entity recognition and classification in Croatian texts

Abstract

The influence of text domain selection on statistical named entity recognition and classification in Croatian texts is investigated. Two datasets of Croatian newspaper texts from different text domains were manually annotated for named entities and used to train and test the Stanford NER system, which performs named entity recognition by sequence labeling with CRF. State-of-the-art scores were observed in both domains. The experiment establishes a strong preference for systems trained on mixed text domains. The top-performing system achieved an overall F1-score of 0.876 on mixed-domain test sets, scoring 0.899 in one of the selected domains and 0.852 in the other. The single best domain F1-scores were 0.910 and 0.858.
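
To make the reported metric concrete, the following is a minimal sketch of how an entity-level F1-score can be computed from gold and predicted label sequences. It is an illustration only, not the evaluation procedure of the paper: the BIO tagging scheme, the exact-match criterion (same boundaries and same type), the function names, and the PER/LOC/ORG labels in the example are all assumptions, since the abstract does not specify them.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type); a hypothetical representation
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO-tagged sequence (assumed scheme)."""
    spans: Set[Span] = set()
    start, etype = None, None
    # The trailing "O" sentinel closes an entity that ends the sequence.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        # Open a new entity on B-, or on a stray I- with no open entity.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Exact-match entity-level F1: harmonic mean of precision and recall."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    precision = true_positives / len(pred_spans) if pred_spans else 0.0
    recall = true_positives / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy example: the prediction misses the final ORG entity.
gold_tags = ["B-PER", "I-PER", "O", "B-LOC", "O", "B-ORG"]
pred_tags = ["B-PER", "I-PER", "O", "B-LOC", "O", "O"]
score = entity_f1(gold_tags, pred_tags)
```

In this toy case both predicted entities are correct (precision 1.0) but only two of the three gold entities are found (recall 2/3), so the F1-score is 0.8. Because F1 is computed from pooled counts, an overall score on a mixed-domain test set is not in general the average of the per-domain scores.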