Journal: Computers and the Humanities

A web-based Bengali news corpus for named entity recognition


Abstract

The rapid development of language resources and tools using machine learning techniques for less computerized languages requires appropriately tagged corpora. A tagged Bengali news corpus has been developed from the web archive of a widely read Bengali newspaper. A web crawler retrieves the web pages in Hyper Text Markup Language (HTML) format from the news archive. At present, the corpus contains approximately 34 million wordforms. Named Entity Recognition (NER) systems based on pattern-based shallow parsing, with and without linguistic knowledge, have been developed using a part of this corpus. The NER system that uses linguistic knowledge performed better, yielding its highest F-Score values of 75.40%, 72.30%, 71.37%, and 70.13% for person, location, organization, and miscellaneous names, respectively.
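
The F-Score values quoted above combine precision and recall into a single figure. As a minimal sketch, assuming the balanced (F1) form and entity-level counts (the abstract does not say which F-measure variant or matching criterion was used), the computation might look like the following. The function name `f_score` and the counts in the usage line are hypothetical and are not taken from the paper.

```python
def f_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Balanced F-Score (F1), as a percentage, from entity-level counts.

    Assumption: F1 (equal weight on precision and recall); the abstract
    does not state which F-measure variant the authors report.
    """
    predicted = true_positives + false_positives   # entities the system proposed
    actual = true_positives + false_negatives      # entities in the gold annotation
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall, scaled to a percentage.
    return 100.0 * 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only.
print(f_score(true_positives=3, false_positives=1, false_negatives=1))
```

A score would be computed separately for each category (person, location, organization, miscellaneous), which is how the abstract reports its results.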
