Using term informativeness for named entity detection

Abstract

Informal communication (e-mail, bulletin boards) poses a difficult learning environment because traditional grammatical and lexical information is noisy. Other information is necessary for tasks such as named entity detection. One valuable signal is how topic-centric, or informative, a word is. It is well known that informative words are best modeled by "heavy-tailed" distributions, such as mixture models. However, existing informativeness scores do not take full advantage of this fact. We introduce a new informativeness score that directly uses mixture model likelihood to identify informative words. We evaluate its effectiveness on the task of extracting restaurant names from bulletin board posts. We find that our "mixture score" is weakly effective alone and highly effective when combined with Inverse Document Frequency (IDF). We compare against other informativeness criteria and find that only Residual IDF is competitive with our combined IDF/Mixture score.
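
For readers unfamiliar with the two baseline scores named above, the following is a minimal sketch of IDF and Residual IDF in Python. It assumes the standard definitions: IDF is the negative log of the fraction of documents containing a word, and Residual IDF is the gap between the observed IDF and the IDF predicted by a Poisson model with the same collection frequency. The function name `informativeness_scores` and the tokenized-document input format are illustrative assumptions, not taken from the paper. The abstract does not define the mixture score, so it is not sketched here.

```python
import math
from collections import Counter


def informativeness_scores(docs):
    """Compute IDF and Residual IDF for every word in a corpus.

    docs: list of documents, each a list of tokens (assumed input format).
    Returns two dicts mapping word -> score: (idf, residual_idf).
    """
    n_docs = len(docs)
    doc_freq = Counter()   # number of documents containing the word
    coll_freq = Counter()  # total occurrences of the word in the corpus
    for tokens in docs:
        coll_freq.update(tokens)
        doc_freq.update(set(tokens))

    idf, residual_idf = {}, {}
    for word, df in doc_freq.items():
        idf[word] = -math.log2(df / n_docs)
        # Under a Poisson model with rate cf / D, the probability that a
        # document contains the word at least once is 1 - exp(-cf / D).
        rate = coll_freq[word] / n_docs
        expected_idf = -math.log2(1.0 - math.exp(-rate))
        # Words that cluster in few documents have observed IDF well above
        # the Poisson prediction, giving a large positive residual.
        residual_idf[word] = idf[word] - expected_idf
    return idf, residual_idf
```

A word that is repeated many times within a few posts (as a restaurant name might be) receives a higher Residual IDF than a word with the same document frequency that appears only once per post.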
