PLoS Computational Biology

LAILAPS-QSM: A RESTful API and JAVA library for semantic query suggestions

Abstract

In order to access and filter the content of life-science databases, full text search is a widely applied query interface. But its high flexibility and intuitiveness are paid for with potentially imprecise and incomplete query results. To reduce this drawback, query assistance systems suggest those combinations of keywords with the highest potential to match most of the relevant data records. Widespread approaches are syntactic query corrections that avoid misspelling and support expansion of words by suffixes and prefixes. Synonym expansion approaches apply thesauri, ontologies, and query logs. All need laborious curation and maintenance. Furthermore, access to query logs is in general restricted. Approaches that infer related queries from a user's query profile, such as research field, geographic location, co-authorship, or affiliation, require user registration and public accessibility of that profile, which conflicts with privacy concerns. To overcome these drawbacks, we implemented LAILAPS-QSM, a machine learning approach that reconstructs possible linguistic contexts of a given keyword query. The context is inferred from the text records stored in the databases that are going to be queried or, for general-purpose query suggestion, extracted from PubMed abstracts and UniProt data. The supplied tool suite enables the pre-processing of these text records and the further computation of customized distributed word vectors. The latter are used to suggest alternative keyword queries. An evaluation of the query suggestion quality was done for plant science use cases. Locally present experts enabled a cost-efficient quality assessment in the categories trait, biological entity, taxonomy, affiliation, and metabolic function, which was performed using ontology term similarities. The mean information content similarity of LAILAPS-QSM for 15 representative queries is 0.70, whereas 34% have a score above 0.80. In comparison, the information content similarity for query suggestions made by human experts is 0.90. The software is available either as a tool set to build and train dedicated query suggestion services or as an already trained general-purpose RESTful web service. The service uses open interfaces to be seamlessly embeddable into database frontends. The JAVA implementation uses highly optimized data structures and streamlined code to provide fast and scalable responses for web service calls. The source code of LAILAPS-QSM is available under GNU General Public License version 2 in a Bitbucket GIT repository:
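
The abstract states that distributed word vectors are used to suggest alternative keyword queries, but it does not say how candidates are ranked. The following is a minimal sketch of one common way to do this, assuming that suggestions are the vocabulary terms whose vectors have the highest cosine similarity to the query term. The class name, method names, and the three-dimensional toy vectors are hypothetical and are not taken from the LAILAPS-QSM code base or its API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: all names and the toy vectors are assumptions,
// not part of the LAILAPS-QSM library.
class QuerySuggestionSketch {

    // Cosine similarity between two equal-length vectors; 0.0 if either is all zeros.
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns up to k vocabulary terms, ranked by descending cosine similarity
    // to the query term. An unknown query term yields an empty list.
    static List<String> suggest(Map<String, double[]> vectors, String query, int k) {
        double[] q = vectors.get(query);
        List<String> candidates = new ArrayList<>();
        if (q == null) {
            return candidates;
        }
        for (String term : vectors.keySet()) {
            if (!term.equals(query)) {
                candidates.add(term);
            }
        }
        candidates.sort(Comparator.comparingDouble((String t) -> -cosine(q, vectors.get(t))));
        return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
    }

    // Hand-made toy vectors standing in for vectors trained on a text corpus.
    static Map<String, double[]> toyVectors() {
        Map<String, double[]> v = new LinkedHashMap<>();
        v.put("drought", new double[] {0.9, 0.1, 0.0});
        v.put("dehydration", new double[] {0.8, 0.2, 0.1});
        v.put("salinity", new double[] {0.6, 0.5, 0.1});
        v.put("kinase", new double[] {0.0, 0.2, 0.9});
        return v;
    }

    public static void main(String[] args) {
        // Prints the two terms closest to "drought" in the toy vector space.
        System.out.println(suggest(toyVectors(), "drought", 2));
    }
}
```

With these toy vectors, the query term "drought" ranks "dehydration" ahead of "salinity", while "kinase" falls outside the top two. In a real deployment the vectors would come from training on the database's own text records, and an exhaustive scan like this one would likely be replaced by an indexed nearest-neighbour lookup.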