A Hybrid Approach to Finding Relevant Social Media Content for Complex Domain Specific Information Needs

Abstract

While contemporary semantic search systems offer improvements over classical keyword-based search, they are not always adequate for complex, domain-specific information needs. Some complex search situations require knowledge of both ontological concepts and 'intelligible constructs' not typically modeled in ontologies. Intelligible constructs convey essential information that may be important to the holistic information needs of information seekers. Such constructs may include notions of intensity, frequency, interval, dosage, emotion, sentiment, equivalence, synonymy, negation, parts-of-speech, etc. However, few search systems use both structured background knowledge (ontologies) and these constructs for query interpretation in domain-specific searches. Instead, they rely heavily on ontological knowledge to address search tasks. Given that a vast amount of information is expressed in unstructured form and is therefore not suitable for formal representation in ontologies, there is a clear misalignment between the information needs of users and the knowledge models developed to meet those needs. To address this problem, we present a hybrid approach to domain-specific information retrieval for complex searches that goes beyond both ontology-driven query interpretation and the synonym-based query expansion used in Information Retrieval (IR). This hybrid approach is particularly effective in searches that involve social media (i.e., web forum posts), where ontology incompleteness may significantly limit effective query interpretation and information retrieval. Unlike state-of-the-art semantic search and hybrid search applications, we are able to interpret four distinct types of data elements when searching social media for domain-specific information.
These data elements are: 1) ontological concepts; 2) concepts in lexicons (such as emotions, sentiments, etc.); 3) concepts in lexicons with only partial ontology representation, called lexico-ontology concepts (such as side effects, routes and methods of administration, etc.); and 4) expressions derived solely through rules (such as date, time, interval, frequency, dosage, etc.). Specifically, our hybrid framework is based on a context-free grammar (CFG) that defines the query language of constructs interpretable by the search system. The grammar provides two levels of semantic interpretation: 1) a top-level CFG that facilitates retrieval of textual patterns that share membership across broad templates (i.e., isomorphic patterns), and 2) a low-level CFG that enables interpretation of the specific expressions that may constitute such textual patterns. Our approach is embodied in PREDOSE, a novel Semantic Web platform for prescription drug abuse epidemiology. When applied to a corpus of over 1 million web forum posts discussing prescription drug abuse, PREDOSE proved effective in retrieving relevant documents for complex information needs.
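The two-level interpretation described above can be illustrated with a minimal sketch. It assumes toy, hypothetical dictionaries for the first three data element types and a small low-level CFG for rule-derived expressions such as dosage and frequency. The low-level pass turns a post into typed elements, and a top-level CFG over element categories then checks whether the post instantiates a broad template. All names (ONTOLOGY, LEXICON, LEXICO_ONTOLOGY, LOW_LEVEL, TOP_LEVEL, tag, matches) and the example template are illustrative assumptions, not PREDOSE's actual grammar or implementation.

```python
import re

# Hypothetical toy resources for the four data element types; PREDOSE's actual
# ontology, lexicons, and grammar are not reproduced here.
ONTOLOGY = {"oxycodone": "DRUG", "buprenorphine": "DRUG"}        # 1) ontological concepts
LEXICON = {"euphoric": "EMOTION", "anxious": "EMOTION"}           # 2) lexicon concepts
LEXICO_ONTOLOGY = {"snorted": "ROUTE", "chewed": "ROUTE",         # 3) lexico-ontology concepts
                   "nausea": "SIDE_EFFECT"}

# 4) Hypothetical low-level CFG for rule-derived expressions. Keys are nonterminals;
# "<num>" matches any integer token; any other symbol must equal the token literally.
LOW_LEVEL = {
    "DOSAGE": [["<num>", "UNIT"]],
    "UNIT": [["mg"], ["mcg"]],
    "FREQUENCY": [["COUNT", "a", "PERIOD"], ["COUNT", "per", "PERIOD"]],
    "COUNT": [["once"], ["twice"], ["<num>", "times"]],
    "PERIOD": [["day"], ["week"]],
}

# Hypothetical top-level CFG over element categories. Category names that are
# not keys here (DRUG, DOSAGE, ROUTE, ...) act as terminals.
TOP_LEVEL = {
    "TEMPLATE": [["DRUG", "AMOUNT", "ROUTE"], ["DRUG", "ROUTE"]],
    "AMOUNT": [["DOSAGE"], ["DOSAGE", "FREQUENCY"]],
}


def derive(grammar, symbol, tokens, i):
    """Yield every j such that tokens[i:j] can be derived from symbol.

    Plain backtracking recursive descent; assumes the grammar has no left recursion.
    """
    if symbol in grammar:
        for production in grammar[symbol]:
            yield from derive_seq(grammar, production, tokens, i)
    elif i < len(tokens) and (tokens[i] == symbol
                              or (symbol == "<num>" and tokens[i].isdigit())):
        yield i + 1


def derive_seq(grammar, symbols, tokens, i):
    """Yield every j such that tokens[i:j] derives from the symbol sequence."""
    if not symbols:
        yield i
        return
    for j in derive(grammar, symbols[0], tokens, i):
        yield from derive_seq(grammar, symbols[1:], tokens, j)


def tag(post):
    """Low level: map a post to (category, text) elements, dropping untyped words."""
    words = re.findall(r"[a-z]+|\d+", post.lower())
    elements, i = [], 0
    while i < len(words):
        # Prefer the longest rule-derived expression (type 4) starting at i.
        end, category = i, None
        for nonterminal in ("DOSAGE", "FREQUENCY"):
            j = max(derive(LOW_LEVEL, nonterminal, words, i), default=i)
            if j > end:
                end, category = j, nonterminal
        if category:
            elements.append((category, " ".join(words[i:end])))
            i = end
            continue
        # Otherwise try single-word lookups in the ontology and lexicons (types 1-3).
        for source in (ONTOLOGY, LEXICON, LEXICO_ONTOLOGY):
            if words[i] in source:
                elements.append((source[words[i]], words[i]))
                break
        i += 1
    return elements


def matches(post, grammar=TOP_LEVEL, start="TEMPLATE"):
    """Top level: True if a contiguous run of typed elements derives from start."""
    categories = [category for category, _ in tag(post)]
    return any(next(derive(grammar, start, categories, i), None) is not None
               for i in range(len(categories)))


print(tag("Took 2 oxycodone 30mg and snorted them, felt euphoric"))
# → [('DRUG', 'oxycodone'), ('DOSAGE', '30 mg'), ('ROUTE', 'snorted'), ('EMOTION', 'euphoric')]
print(matches("Took 2 oxycodone 30mg and snorted them, felt euphoric"))  # → True
print(matches("Chewed buprenorphine twice a day"))  # → False (ROUTE precedes DRUG)
```

Because untyped words are dropped before the top-level match, filler such as "and" or "them" between elements does not block a match, while element order still matters. The backtracking parser is only adequate for toy grammars; a realistic grammar with ambiguity or left-recursive rules would likely call for a chart parser such as Earley's algorithm.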