Journal: IEEE Transactions on Knowledge and Data Engineering

Facilitating Document Annotation Using Content and Querying Value

Abstract

Many organizations today generate and share textual descriptions of their products, services, and actions. Such collections of textual data contain a significant amount of structured information, which remains buried in the unstructured text. While information extraction algorithms facilitate the extraction of structured relations, they are often expensive and inaccurate, especially when operating on text that does not contain any instances of the targeted structured information. We present a novel alternative approach that facilitates the generation of structured metadata by identifying documents that are likely to contain information of interest, where that information will subsequently be useful for querying the database. Our approach relies on the idea that humans are more likely to add the necessary metadata at creation time if prompted by the interface, or that it is much easier for humans (and/or algorithms) to identify the metadata when such information actually exists in the document, instead of naively prompting users to fill in forms with information that is not available in the document. As a major contribution of this paper, we present algorithms that identify structured attributes that are likely to appear within the document by jointly utilizing the content of the text and the query workload. Our experimental evaluation shows that our approach generates superior results compared to approaches that rely only on the textual content or only on the query workload to identify attributes of interest.
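
The idea of jointly using text content and the query workload can be illustrated with a small sketch. It is not the paper's algorithm: the cue-word lists, the two scoring functions, and the choice to multiply the scores are all assumptions made here for illustration. Each candidate attribute gets a content score (how many of its hypothetical cue words occur in the document) and a querying score (how often the attribute appears in past queries), and attributes are ranked by the product.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def content_score(doc_tokens, cue_words):
    """Fraction of an attribute's cue words that occur in the document.

    A crude stand-in (assumption) for "the attribute is likely to appear".
    """
    if not cue_words:
        return 0.0
    return sum(1 for w in cue_words if w in doc_tokens) / len(cue_words)


def query_score(attribute, workload):
    """Fraction of workload queries that reference the attribute."""
    if not workload:
        return 0.0
    return sum(1 for q in workload if attribute in q) / len(workload)


def suggest_attributes(text, cues, workload, k=2):
    """Rank attributes by content score times querying score.

    The product is an illustrative combination rule, not the paper's.
    Attributes with a zero combined score are never suggested.
    """
    tokens = set(tokenize(text))
    scores = {
        attr: content_score(tokens, words) * query_score(attr, workload)
        for attr, words in cues.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(attr, score) for attr, score in ranked[:k] if score > 0]


# Hypothetical data: cue words per attribute and a small query workload,
# where each query is the set of attributes it filters on.
cues = {
    "price": {"price", "cost", "usd"},
    "location": {"city", "street", "located"},
    "warranty": {"warranty", "guarantee"},
}
workload = [{"price"}, {"price", "location"}, {"price"}, {"warranty"}]
text = "The unit is located in a city center and the price is 300 USD."

suggestions = suggest_attributes(text, cues, workload)
print(suggestions)
```

With this data, `warranty` is queried but absent from the text, so it is not suggested. `price` outranks `location` because both match the text equally well but `price` is queried more often.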
