Journal: Engineering Applications of Artificial Intelligence

SANE 2.0: System for fine grained named entity typing on textual data


Abstract

Assignment of fine-grained types to named entities is gaining popularity as one of the major Information Extraction tasks due to its applications in several areas of Natural Language Processing. Existing systems use huge knowledge bases to improve the accuracy of the fine-grained types. We designed and developed SANE 2.0, an extended version of our earlier work SANE (Lal et al., 2017). It uses Wikipedia categories to refine the types of the named entities recognized in textual data. Entities whose types cannot be found using Wikipedia categories are typed using an intelligent information extraction method that draws on the search results of the Yahoo search engine. SANE uses an efficient algorithm to assign fine-grained types to the entities extracted from the data. Wikipedia groups related topics under common headings. From these categories, we constructed a database that contains Wikipedia articles and their corresponding categories. SANE uses this database to predict the category types of named entities. We use Stanford NER to identify named entities with their coarse-grained types. For locations, we use Geonames data separately. We calculate the similarity between an entity and each of its categories using word2vec. Each entity is assigned to the category that has the highest similarity score with it. Finally, we map the category to the most appropriate WordNet (Miller et al., 1995) type. The main contribution of this work is a named entity typing system built without the use of knowledge bases. Through our experiments, 1) we establish the usefulness of Wikipedia categories for Named Entity Typing, 2) we present an intelligent method of using Yahoo search results for Named Entity Typing, and 3) we show that SANE's performance is on par with the state of the art.
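The category-selection step described in the abstract (score each candidate Wikipedia category against the entity, keep the highest-scoring one) can be illustrated with a minimal sketch. This is not the authors' implementation: the three-dimensional vectors stand in for real word2vec embeddings, the article-to-category table stands in for the Wikipedia-derived database, cosine similarity is assumed as the similarity measure, and the names `TOY_VECTORS`, `ARTICLE_CATEGORIES`, `cosine`, and `best_category` are hypothetical. The final mapping to a WordNet type is left out.

```python
import math

# Hypothetical toy embeddings; a real system would load trained word2vec vectors.
TOY_VECTORS = {
    "einstein": [0.9, 0.1, 0.0],
    "physicists": [0.8, 0.2, 0.1],
    "births": [0.1, 0.9, 0.2],
    "violinists": [0.3, 0.3, 0.8],
}

# Hypothetical stand-in for the database of articles and their categories.
ARTICLE_CATEGORIES = {
    "einstein": ["physicists", "births", "violinists"],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_category(entity, categories, vectors):
    """Return the category most similar to the entity, or None if nothing can be scored."""
    if entity not in vectors:
        return None
    # Score only the categories that have an embedding.
    scored = [(cosine(vectors[entity], vectors[c]), c) for c in categories if c in vectors]
    if not scored:
        return None
    # Keep the category with the highest similarity score.
    return max(scored, key=lambda pair: pair[0])[1]


if __name__ == "__main__":
    entity = "einstein"
    print(best_category(entity, ARTICLE_CATEGORIES[entity], TOY_VECTORS))
```

In this sketch an entity with no embedding or no scorable category yields `None`, which is roughly where a fallback such as the search-result method mentioned in the abstract would take over.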
