How to make the most of NE dictionaries in statistical NER

Yutaka Sasaki; Yoshimasa Tsuruoka; John McNaught; Sophia Ananiadou

Abstract

Background: When term ambiguity and variability are very high, dictionary-based Named Entity Recognition (NER) is not an ideal solution, even when large-scale terminological resources are available. Many studies on statistical NER have tried to cope with these problems. However, it is not straightforward to exploit existing and additional Named Entity (NE) dictionaries in statistical NER. Presumably, adding NEs to an NE dictionary leads to better performance; in reality, however, the NER model must be retrained to achieve this. We chose protein name recognition as a case study because it suffers most from the problems of heavy term variation and ambiguity.

Methods: We have established a novel way to improve NER performance by adding NEs to an NE dictionary without retraining. In our approach, known NEs are first identified in parallel with Part-of-Speech (POS) tagging, based on a general word dictionary and an NE dictionary. Then, statistical NER is trained on the POS/PROTEIN tagger outputs with the correct NE labels attached.

Results: We evaluated the performance of our NER on the standard JNLPBA-2004 data set. The F-score on the test set improved from 73.14 to 73.78 after adding protein names appearing in the training data to the POS tagger dictionary, without any model retraining. The performance further increased to 78.72 after enriching the tagging dictionary with test set protein names.

Conclusion: Our approach has demonstrated high performance in protein name recognition, which indicates how to make the most of known NEs in statistical NER.
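The two-stage idea in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the POS/PROTEIN tagger resolves ambiguity, so the sketch assumes a greedy longest-match lookup, and the names `tag_tokens`, `word_pos`, `ne_dict`, `max_len`, and the sample sentence are all hypothetical. The point it shows is that the dictionary tag is only an input feature for a downstream statistical model, so a new name can be added to the dictionary without touching that model.

```python
from typing import Dict, List, Set, Tuple


def tag_tokens(
    tokens: List[str],
    word_pos: Dict[str, str],
    ne_dict: Set[Tuple[str, ...]],
    max_len: int = 5,
) -> List[str]:
    """Assign a POS or PROTEIN tag to every token.

    Assumption: greedy longest match against the NE dictionary; tokens
    not covered by an NE entry get their POS from the word dictionary,
    with "NN" as a fallback for unknown words.
    """
    tags: List[str] = []
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(t.lower() for t in tokens[i:i + n]) in ne_dict:
                match_len = n
                break
        if match_len:
            tags.extend(["PROTEIN"] * match_len)
            i += match_len
        else:
            tags.append(word_pos.get(tokens[i].lower(), "NN"))
            i += 1
    return tags


def features(tokens: List[str], tags: List[str]) -> List[Dict[str, str]]:
    """Per-token features that a statistical NER model would be trained on."""
    return [{"word": w.lower(), "tag": t} for w, t in zip(tokens, tags)]


# Hypothetical toy data, not taken from JNLPBA-2004.
tokens = ["IL-2", "gene", "expression", "requires", "NF-kappa", "B"]
word_pos = {"gene": "NN", "expression": "NN", "requires": "VBZ"}
ne_dict = {("il-2",)}

before = tag_tokens(tokens, word_pos, ne_dict)
# before == ["PROTEIN", "NN", "NN", "VBZ", "NN", "NN"]

# Enrich the dictionary; the downstream model itself is left unchanged.
ne_dict.add(("nf-kappa", "b"))
after = tag_tokens(tokens, word_pos, ne_dict)
# after == ["PROTEIN", "NN", "NN", "VBZ", "PROTEIN", "PROTEIN"]
```

In this sketch the downstream model still decides the final label for each token. A PROTEIN tag from the dictionary is evidence the model can learn to trust or to override in context, which is one way to read the abstract's claim that dictionary enrichment helps without retraining.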


Bibliographic record

Source: BMC Bioinformatics, 2008, No. 11
Authors: Yutaka Sasaki; Yoshimasa Tsuruoka; John McNaught; Sophia Ananiadou
Original format: PDF
Chinese Library Classification: Biological sciences

