Revisiting Document Length Hypotheses: A Comparative Study of Japanese Newspaper and Patent Retrieval

SUMIO FUJITA

Abstract

NTCIR-4 experiments of the CLIR J-J (Japanese monolingual newspaper retrieval) and patent tasks are described, focusing on comparative studies of two test collections and two retrieval approaches in view of document length hypotheses. TF*IDF outperformed the language modeling approach in the CLIR J-J task whereas the language modeling approach performed better in the patent task. Two different document length hypotheses behind two tasks/collections are assumed by analyzing document length distributions of relevant/retrieved documents in the NTCIR-3 and -4 collections. Given these hypotheses, TF*IDF is easily adapted to patent retrieval tasks. Document length prior probabilities are applied to the language modeling approach. For the patent task, task-specific techniques, such as IPC priors and different indexing strategies, are evaluated and reported. To facilitate retrieval from large patent collections, a simple distributed search strategy is applied and found to be efficient, despite a slight deterioration of effectiveness. We found that TF*IDF performed similarly to the language modeling runs against the patent collection by controlling the document length normalization, whereas the language modeling approach does not perform as well as TF*IDF, despite calibration against the CLIR J-J collection. The different characteristics of the document lengths of the two test collections are illustrated through comparative studies.
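
The two document length controls mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas used in the paper, so everything below is an assumption made for illustration: pivoted length normalization stands in for "controlling the document length normalization" in TF*IDF, Dirichlet-smoothed query likelihood stands in for the language modeling approach, and the prior is taken to be proportional to document length. The function names and the parameters `slope` and `mu` are hypothetical.

```python
import math
from collections import Counter


def tfidf_score(query, doc, df, n_docs, avg_len, slope=0.2):
    """TF*IDF with pivoted length normalization (assumed variant).

    slope=0 disables length normalization; larger values penalize
    documents longer than the collection average more strongly.
    """
    tf = Counter(doc)
    norm = (1.0 - slope) + slope * len(doc) / avg_len
    score = 0.0
    for term in query:
        if tf[term] == 0 or df.get(term, 0) == 0:
            continue  # term absent from the document or the collection
        idf = math.log((n_docs + 1) / df[term])
        score += (1.0 + math.log(tf[term])) / norm * idf
    return score


def lm_score(query, doc, coll_tf, coll_len, mu=1000.0, length_prior=False):
    """Query log-likelihood with Dirichlet smoothing (assumed variant).

    With length_prior=True, a log prior proportional to document
    length is added, which favors longer documents.
    """
    tf = Counter(doc)
    doc_len = len(doc)
    score = 0.0
    for term in query:
        p_coll = coll_tf.get(term, 0) / coll_len
        if p_coll == 0.0:
            continue  # skip terms unseen in the collection
        score += math.log((tf[term] + mu * p_coll) / (doc_len + mu))
    if length_prior:
        score += math.log(doc_len / coll_len)
    return score
```

In this sketch, `slope` is the knob that weakens or strengthens the length penalty in TF*IDF, while `length_prior` shifts the language model score toward longer documents; which direction suits a given collection depends on its document length hypothesis.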

Bibliographic details

Source: ACM Transactions on Asian Language Information Processing, 2005, No. 2, pp. 207-235 (29 pages)
Author: Sumio Fujita
Affiliation: Yahoo Japan Corporation, Mori-tower, Roppongi 6-10-1, Minato-ku, Tokyo 106-6182, Japan
Original format: PDF
Language: English
CLC classification: Computing technology, computer technology
Keywords: test collections; document length hypotheses; language modeling approach to IR
Date indexed: 2022-08-17 13:43:28

Similar documents

1. Technology survey and invalidity search: A comparative study of different tasks for Japanese patent document retrieval [J]. Sumio Fujita. Information Processing & Management. 2007, No. 5
2. An Empirical Study on Retrieval Models for Different Document Genres: Patents and Newspaper Articles [J]. Makoto Iwayama, Atsushi Fujii, Noriko Kando. ACM SIGIR Forum. 2003, Special Issue
3. Satisfying the needs of Japanese cancer patients: A comparative study of detailed and standard informed consent documents [J]. Sato K., Watanabe T., Katsumata N. Clinical Trials: Journal of the Society for Clinical Trials. 2014, No. 1
4. An Empirical Study on Retrieval Models for Different Document Genres: Patents and Newspaper Articles [C]. Makoto Iwayama, Atsushi Fujii, Noriko Kando. The Twenty-Sixth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Jul 28-Aug 1, 2003, Toronto, Canada. 2003
5. Newspaper work in a time of digital change: A comparative study of U.S. and Japanese journalists [D]. Minami, Hiroko. 2011
6. A study of the changes in how medically related events are reported in Japanese newspapers [O]. Yukiko Kishi, Naoko Murashige, Yuko Kodama. 2010
7. Evaluation of the Derwent World Patents Index (DWPI) abstracts quality using Japanese patent documents [O]. Takami Matsutani, Noriko Oka, Nobuyuki Kobayashi. 2013
