Optimizing Scoring Functions and Indexes for Proximity Search in Type-annotated Corpora

Abstract

We introduce a new, powerful class of text proximity queries: find an instance of a given "answer type" (person, place, distance) near "selector" tokens matching given literals or satisfying given ground predicates. An example query is type=distance NEAR Hamburg Munich. Nearness is defined as a flexible, trainable parameterized aggregation function of the selectors, their frequency in the corpus, and their distance from the candidate answer. Such queries provide a key data reduction step for information extraction, data integration, question answering, and other text-processing applications. We describe the architecture of a next-generation information retrieval engine for such applications, and investigate two key technical problems faced in building it. First, we propose a new algorithm that estimates a scoring function from past logs of queries and answer spans. Plugging the scoring function into the query processor gives high accuracy: typically, an answer is found at rank 2-4. Second, we exploit the skew in the distribution over types seen in query logs to optimize the space required by the new index structures required by our system. Extensive performance studies with a 10GB, 2-million document TREC corpus and several hundred TREC queries show both the accuracy and the efficiency of our system. From an initial 4.3GB index using 18,000 types from WordNet, we can discard 88% of the space, while inflating query times by a factor of only 1.9. Our final index overhead is only 20% of the total index space needed.
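To make the idea of a "nearness" score concrete, the following is a minimal sketch, not the authors' algorithm. It assumes an exponential decay with token distance and an IDF-style rarity weight. Neither choice is stated in the abstract, and all function and parameter names (`idf`, `score_candidate`, `rank_candidates`, `decay`) are hypothetical. Each candidate answer position is scored by summing, over the selectors, the best decayed contribution from any occurrence of that selector.

```python
import math
from typing import Dict, List, Sequence, Tuple


def idf(token: str, doc_freq: Dict[str, int], num_docs: int) -> float:
    """Smoothed inverse document frequency: rarer selectors weigh more (assumed form)."""
    return math.log((1 + num_docs) / (1 + doc_freq.get(token, 0)))


def score_candidate(cand_pos: int, tokens: Sequence[str], selectors: Sequence[str],
                    doc_freq: Dict[str, int], num_docs: int,
                    decay: float = 0.5) -> float:
    """Sum over selectors of the best IDF-weighted, distance-decayed match.

    `decay` stands in for the trainable parameters mentioned in the abstract;
    the exponential shape is an assumption made for illustration only.
    """
    total = 0.0
    for sel in selectors:
        best = 0.0
        for pos, tok in enumerate(tokens):
            if tok == sel and pos != cand_pos:
                gap = abs(pos - cand_pos)
                best = max(best, idf(sel, doc_freq, num_docs) * math.exp(-decay * gap))
        total += best
    return total


def rank_candidates(tokens: Sequence[str], candidate_positions: Sequence[int],
                    selectors: Sequence[str], doc_freq: Dict[str, int],
                    num_docs: int, decay: float = 0.5) -> List[Tuple[float, int]]:
    """Return (score, position) pairs, highest score first."""
    scored = [(score_candidate(p, tokens, selectors, doc_freq, num_docs, decay), p)
              for p in candidate_positions]
    return sorted(scored, reverse=True)
```

For a query such as type=distance NEAR Hamburg Munich, `candidate_positions` would hold the token offsets annotated with the answer type, and `selectors` would be the lowercased literals. In a real system the decay profile would be estimated from logged queries and answer spans rather than fixed by hand.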

Bibliographic details

Source: International World Wide Web Conference; Edinburgh (GB), pp. 789-798 (10 pages)
Year: 2006
Conference location: Edinburgh (GB)
Authors: Soumen Chakrabarti; Kriti Puniyani; Sujatha Das
Affiliation: IIT Bombay
Original format: PDF
Language: English
Chinese Library Classification: Computer networks
Keywords: indexing annotated text
