Mining for evidence in enterprise corpora.

Abstract

The primary research aim of this dissertation is to identify the strategies that best meet the information retrieval needs expressed in the "e-discovery" scenario. This task calls for a high-recall system that, in response to a request for all available documents relevant to a legal complaint, effectively prioritizes documents from an enterprise document collection in order of likelihood of relevance. High-recall information retrieval strategies, such as those employed for e-discovery and for patent or medical literature searches, are motivated by the high cost of missing relevant documents, but they also carry high document review costs.

Our approaches parallel the evaluation opportunities afforded by the TREC Legal Track. Within the ad hoc framework, we propose an approach that includes query field selection, techniques for mitigating OCR error, term weighting strategies, query language reduction, pseudo-relevance feedback using document metadata and terms extracted from documents, merging of result sets, and biasing results to favor documents responsive to lawyer-negotiated queries. We conduct several experiments to identify effective parameters for each of these strategies.

Within the relevance feedback framework, we use an active learning approach informed by signals from previously collected relevance judgments and ranking data. We train a classifier to prioritize the unjudged documents retrieved by different ad hoc information retrieval techniques applied to the same topic. We demonstrate significant improvements over heuristic rank aggregation strategies when choosing from a relatively small pool of documents. With a larger pool of documents, we validate the effectiveness of the merging strategy as a means to increase recall, but find that sparseness of judgment data prevents effective ranking by the classifier-based ranker.

We conclude our research by optimizing the classifier-based ranker and applying it to other high-recall datasets. Our concluding experiments consider the potential benefits of modifying the merged runs using methods derived from social choice models. We find that this technique, Local Kemenization, is hampered by the large number of documents and the small number of result sets contributing to the ranked list. This two-stage approach to high-recall information retrieval tasks continues to offer a rich set of questions for future research.
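The abstract names heuristic rank aggregation and Local Kemenization but gives no procedure for either. The following is a minimal sketch of how the two could fit together, not the dissertation's implementation: a Borda-style merge stands in for the unspecified heuristic aggregation, and the refinement step follows the usual description of Local Kemenization (insert each document in merged order and move it up while a strict majority of the runs that rank both documents prefer it to its predecessor). The function names `borda_merge` and `local_kemenize` and the document identifiers are hypothetical.

```python
from typing import Dict, List, Sequence


def borda_merge(runs: Sequence[Sequence[str]]) -> List[str]:
    """Merge ranked runs with Borda-style points (an assumed heuristic)."""
    scores: Dict[str, int] = {}
    for run in runs:
        n = len(run)
        for pos, doc in enumerate(run):
            # The top document of a run of length n earns n points, the last earns 1.
            scores[doc] = scores.get(doc, 0) + (n - pos)
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def local_kemenize(initial: Sequence[str],
                   runs: Sequence[Sequence[str]]) -> List[str]:
    """Refine an initial ranking so that no adjacent pair contradicts a
    strict majority of the runs that rank both documents."""
    positions = [{doc: i for i, doc in enumerate(run)} for run in runs]

    def majority_prefers(a: str, b: str) -> bool:
        wins = losses = 0
        for pos in positions:
            if a in pos and b in pos:  # runs missing either document abstain
                if pos[a] < pos[b]:
                    wins += 1
                else:
                    losses += 1
        return wins > losses

    result: List[str] = []
    for doc in initial:
        # Insert at the bottom, then bubble up while the majority agrees.
        result.append(doc)
        i = len(result) - 1
        while i > 0 and majority_prefers(result[i], result[i - 1]):
            result[i], result[i - 1] = result[i - 1], result[i]
            i -= 1
    return result
```

Calling `local_kemenize(borda_merge(runs), runs)` applies the refinement to the merged list. In this sketch, when only a few runs contribute, many pairwise votes are ties or are decided by a single run, so few swaps occur; this is in line with the abstract's remark that the technique is hampered by the small number of contributing result sets.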

Bibliographic record

Author: Almquist, Brian Alan
Author affiliation: The University of Iowa
Degree-granting institution: The University of Iowa
Subject: Information technology
Degree: Ph.D.
Year: 2011
Pages: 150 p.
Original format: PDF
Language of text: English

