On the Construction of Web NER Model Training Tool based on Distant Supervision

Chou Chien-Lung; Chang Chia-Hui; Lin Yuan-Hao; Chien Kuo-Chun

Abstract

Named entity recognition (NER) is an important task in natural language understanding, as it extracts the key entities (person, organization, location, date, number, etc.) and objects (product, song, movie, activity name, etc.) mentioned in texts. However, existing natural language processing (NLP) tools (such as Stanford NER) either recognize only general named entities or require annotated training examples and feature engineering for supervised model construction. Since not all languages or entity types have public NER support, a tool for NER model training is essential for information extraction in low-resource languages or for custom entity types. In this article, we study the problem of developing a tool that prepares a training corpus from the Web, given known seed entities, for custom NER model training via distant supervision. The major challenges of automatic labeling are the long labeling time caused by a large corpus and a large seed list, and the need to avoid false-positive and false-negative examples caused by short and long seeds. To address these challenges, we adopt locality-sensitive hashing (LSH) for seed entities of various lengths. We conduct experiments on five entity recognition tasks (Chinese person names, food names, locations, points of interest (POIs), and activity names) to demonstrate the improvements obtained with the proposed Web NER model construction tool. Because the training corpus is obtained by automatically labeling sentences related to the seed entities, one can use either the entire corpus or only the positive sentences for model training. Based on the experimental results, we found that this decision should depend on whether a traditional linear-chain conditional random field (CRF) or a deep neural network-based CRF is used for model training, as well as on the completeness of the provided seed list.
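
As a rough illustration of the automatic labeling step described in the abstract, the sketch below tags sentences at the character level with BIO labels by matching seed entities, then separates the "entire corpus" from the "positive-only" subset. It is not the authors' tool: it uses plain exact longest-match lookup rather than locality-sensitive hashing, and the function names `label_sentence` and `build_corpora`, the `entity_type` parameter, and the sample sentences are all assumptions made for the example.

```python
from typing import List, Tuple

Labeled = List[Tuple[str, str]]  # (character, BIO tag) pairs


def label_sentence(sentence: str, seeds: List[str], entity_type: str = "ENT") -> Labeled:
    """Assign character-level BIO tags by exact matching against seed entities.

    Hypothetical helper: longer seeds are tried first so that a short seed
    cannot split a longer one (one source of false labels with short seeds).
    """
    ordered = sorted(set(seeds), key=len, reverse=True)
    tags = ["O"] * len(sentence)
    for seed in ordered:
        if not seed:
            continue
        start = sentence.find(seed)
        while start != -1:
            end = start + len(seed)
            # Only label spans that no longer seed has already claimed.
            if all(tag == "O" for tag in tags[start:end]):
                tags[start] = f"B-{entity_type}"
                for i in range(start + 1, end):
                    tags[i] = f"I-{entity_type}"
            start = sentence.find(seed, start + 1)
    return list(zip(sentence, tags))


def build_corpora(
    sentences: List[str], seeds: List[str], entity_type: str = "ENT"
) -> Tuple[List[Labeled], List[Labeled]]:
    """Return (entire corpus, positive-only corpus).

    A sentence counts as positive if at least one character is tagged
    as part of an entity.
    """
    entire = [label_sentence(s, seeds, entity_type) for s in sentences]
    positive_only = [sent for sent in entire if any(tag != "O" for _, tag in sent)]
    return entire, positive_only


if __name__ == "__main__":
    # Illustrative data only; not taken from the paper's experiments.
    corpus = ["我在台北車站等你", "今天天氣很好"]
    seed_list = ["台北", "台北車站"]
    entire, positive = build_corpora(corpus, seed_list, "LOC")
    for sent in entire:
        print(" ".join(f"{ch}/{tag}" for ch, tag in sent))
    print(len(entire), len(positive))
```

Exact matching of every seed against every sentence scales poorly as the corpus and seed list grow, which is the cost the abstract says LSH is adopted to reduce. Choosing between the two returned corpora corresponds to the entire-corpus versus positive-only decision discussed at the end of the abstract.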

Bibliographic information

Source
ACM Transactions on Asian Language Information Processing, 2020, Issue 6, pp. 87.1-87.28 (28 pages)
Authors
Chou Chien-Lung; Chang Chia-Hui; Lin Yuan-Hao; Chien Kuo-Chun
Author affiliations
National Central University, 300 Zhongda Rd., Taoyuan, Taiwan (all four authors)
Original format: PDF
Language: English
Keywords
Information extraction; named entity recognition; distant supervision; locality-sensitive hashing (LSH); scalable automatic labeling
Date added to database: 2022-08-18 23:27:11
