Using taxonomy, discriminants, and signatures for navigating in text databases

Abstract

We explore how to organize a text database hierarchically to aid better searching and browsing. We propose to exploit the natural hierarchy of topics, or taxonomy, that many corpora, such as internet directories, digital libraries, and patent databases, enjoy. In our system, the user navigates through the query response not as a flat unstructured list, but embedded in the familiar taxonomy, and annotated with document signatures computed dynamically with respect to where the user is located at any time. We show how to update such databases with new documents with high speed and accuracy. We use techniques from statistical pattern recognition to efficiently separate the feature words, or discriminants, from the noise words at each node of the taxonomy. Using these, we build a multi-level classifier. At each node, this classifier can ignore the large number of noise words in a document. Thus the classifier has a small model size and is very fast. Moreover, owing to the use of context-sensitive features, the classifier is very accurate. We report on experiences with the Reuters newswire benchmark, the US Patent database, and web document samples from Yahoo!.
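
The abstract does not say how discriminants are scored or which classifier sits at each node, so the following is only a minimal sketch under stated assumptions: terms are ranked at each taxonomy node by a Fisher-style ratio (between-class spread of mean term rate over within-class variance), and a naive Bayes model restricted to the selected terms routes a document one level down. All function names, the parameter `k`, and the toy corpus are hypothetical and chosen for illustration.

```python
import math
from collections import Counter

def term_rates(doc):
    """Relative frequency of each term in one tokenized document."""
    counts = Counter(doc)
    return {t: c / len(doc) for t, c in counts.items()}

def select_discriminants(docs_by_child, k, eps=1e-9):
    """Keep the k terms that best separate the children of one node.
    Assumed score: between-class spread / within-class variance."""
    rates = {c: [term_rates(d) for d in docs] for c, docs in docs_by_child.items()}
    vocab = {t for rs in rates.values() for r in rs for t in r}
    scores = {}
    for t in vocab:
        means = {c: sum(r.get(t, 0.0) for r in rs) / len(rs) for c, rs in rates.items()}
        classes = list(means)
        between = sum((means[a] - means[b]) ** 2
                      for i, a in enumerate(classes) for b in classes[i + 1:])
        within = sum(sum((r.get(t, 0.0) - means[c]) ** 2 for r in rs) / len(rs)
                     for c, rs in rates.items())
        scores[t] = between / (within + eps)
    # Sort by descending score; break ties alphabetically for determinism.
    return set(sorted(vocab, key=lambda t: (-scores[t], t))[:k])

def train_node(docs_by_child, k):
    """Naive Bayes over the node's discriminants only (Laplace smoothing)."""
    features = select_discriminants(docs_by_child, k)
    total_docs = sum(len(docs) for docs in docs_by_child.values())
    model = {}
    for child, docs in docs_by_child.items():
        counts = Counter(t for d in docs for t in d if t in features)
        denom = sum(counts.values()) + len(features)
        log_prior = math.log(len(docs) / total_docs)
        log_like = {t: math.log((counts[t] + 1) / denom) for t in features}
        model[child] = (log_prior, log_like)
    return features, model

def classify_at_node(node_model, doc):
    """Pick the most likely child, ignoring every non-discriminant (noise) word."""
    features, model = node_model
    kept = [t for t in doc if t in features]
    return max(sorted(model),
               key=lambda c: model[c][0] + sum(model[c][1][t] for t in kept))

def train_taxonomy(labeled, k):
    """labeled: list of (path, tokens); path is a tuple of topics, root to leaf."""
    groups = {}
    for path, doc in labeled:
        for depth in range(len(path)):
            groups.setdefault(path[:depth], {}).setdefault(path[depth], []).append(doc)
    return {node: train_node(children, k) for node, children in groups.items()}

def classify(models, doc):
    """Walk from the root, applying one small per-node classifier per level."""
    node = ()
    while node in models:
        node = node + (classify_at_node(models[node], doc),)
    return node

# Hypothetical toy corpus: two top-level topics with two subtopics each.
labeled = [
    (("science", "physics"), "the energy of the particle".split()),
    (("science", "physics"), "the quantum of the energy".split()),
    (("science", "biology"), "the gene of the cell".split()),
    (("science", "biology"), "the cell of the organism".split()),
    (("sports", "soccer"), "the goal of the match".split()),
    (("sports", "soccer"), "the match of the league".split()),
    (("sports", "tennis"), "the serve of the set".split()),
    (("sports", "tennis"), "the set of the racket".split()),
]
models = train_taxonomy(labeled, k=4)
print(classify(models, "a new gene in the cell organism".split()))
```

On this toy corpus the noise words "the" and "of" score zero at every node, because their rate is the same in every document, so each per-node model holds only four terms. The sample query is routed to science and then to biology. The feature set differs per node, which is one way to read the abstract's "context-sensitive features": a word can separate topics at one level and be noise at another.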

Bibliographic details

Source
Proceedings of the Twenty-third International Conference on Very Large Data Bases | 1997 | pp. 446-455 | 10 pages
Conference location: Athens (GR)
Authors
Soumen Chakrabarti; Byron Dom; Rakesh Agrawal; Prabhakar Raghavan
Author affiliations
IBM Almaden Research Center (listed for all four authors)
Original format: PDF
Text language: English
Chinese Library Classification: various special-purpose databases
