Conference: International Conference on Tools with Artificial Intelligence

Semi-automatic Dictionary Curation for Domain-specific Ontologies

Abstract

Within the broad area of information extraction, we study the problem of effective dictionary curation in an enterprise setting. Given an ontology that represents the domain of an enterprise, our approach populates the attributes of the ontology's leaf nodes with instances extracted from the enterprise corpus. For an attribute of interest, given a few seed examples or indicative features for the attribute, we first obtain a ranked list of 'list pages' that potentially contain additional dictionary terms. Our ranking model ranks pages from the enterprise corpus by their 'list' content, using several visual and lexical features. We gather users' judgements of the result pages, and the model continuously learns from this feedback. We compare different techniques of dictionary curation using rule-based extractors and visual features of pages. Based on a rule-writing exercise, we show the benefit of dictionaries for leaf-node attributes when writing rule-based extractors for higher-level nodes in an ontology. We have implemented a dictionary curation system based on these ideas. Experimental analysis using an academic-domain ontology and university corpora reveals, in the context of enterprise analytics, (i) the merit of dictionary support in rule-based information extraction and (ii) the viability and effectiveness of an interactive approach to dictionary creation.
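The abstract does not specify the ranking model or its features, so the following is only a minimal sketch of the general idea: score pages for 'list-likeness' from a few features, rank them, and update the scorer from user judgements. The online logistic-regression update, the class name `ListPageRanker`, and the feature names `list_item_ratio`, `seed_hits`, and `table_rows` are all assumptions made for illustration, not the authors' method.

```python
import math
from dataclasses import dataclass, field

# Hypothetical feature names; the abstract only mentions "visual and lexical features".
# Feature values are assumed to be scaled to roughly [0, 1].
FEATURES = ("list_item_ratio", "seed_hits", "table_rows")


@dataclass
class ListPageRanker:
    """Online logistic-regression scorer for 'list-likeness' (an assumed model)."""

    weights: dict = field(default_factory=lambda: {f: 0.0 for f in FEATURES})
    bias: float = 0.0
    lr: float = 0.5  # learning rate for each feedback update

    def score(self, feats):
        """Return a value in (0, 1); higher means more list-like."""
        z = self.bias + sum(self.weights[f] * feats.get(f, 0.0) for f in FEATURES)
        return 1.0 / (1.0 + math.exp(-z))

    def rank(self, pages):
        """Return page names sorted from most to least list-like."""
        return sorted(pages, key=lambda name: self.score(pages[name]), reverse=True)

    def feedback(self, feats, is_list_page):
        """Take one gradient step on the log-loss for a single user judgement."""
        err = (1.0 if is_list_page else 0.0) - self.score(feats)
        for f in FEATURES:
            self.weights[f] += self.lr * err * feats.get(f, 0.0)
        self.bias += self.lr * err


# Toy corpus with made-up feature values.
pages = {
    "news": {"list_item_ratio": 0.1, "seed_hits": 0.0, "table_rows": 0.0},
    "faculty": {"list_item_ratio": 0.9, "seed_hits": 0.8, "table_rows": 0.1},
}
ranker = ListPageRanker()
ranker.feedback(pages["faculty"], True)   # user marks the page as a list page
ranker.feedback(pages["news"], False)     # user rejects the page
print(ranker.rank(pages))
```

Before any feedback, every page scores 0.5 and the ranking is arbitrary; each judgement shifts the weights toward features seen on accepted pages and away from those on rejected ones.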