Deriving Multiple Topics to Label Small Document Regions

Abstract

Information retrieval can be greatly enhanced if the semantics of document contents are made explicit as labels that can be queried by markup-sensitive languages. We focus on labelling small text fragments, such as parts of sentences or paragraphs, with frequent topics. We propose WORDtrain, a sequence miner that builds topics for small document regions, such as sentences with many subsentences. WORDtrain splits regions in such a way that non-overlapping fragments are built and the topics derived for them are frequent. WORDtrain discovers frequent topics rather than choosing from a predefined reference list. This raises the issue of evaluating the quality of its results. To this end, we have designed two evaluation schemes: one that requires expert involvement and one that is automatic. Our first experiments with these schemes show that WORDtrain yields promising results.

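The abstract describes WORDtrain only at a high level, so the following Python sketch illustrates the general idea of mining frequent word sequences and using them to label non-overlapping fragments of a sentence. It is not WORDtrain's algorithm. It assumes that a region is a single sentence, that subsentences are delimited by commas, semicolons, or colons, that a "topic" is a contiguous word sequence after removing a small stopword list, and that a topic is frequent if it occurs in at least `min_support` fragments. The stopword list and all function names (`split_fragments`, `mine_frequent_topics`, `label_fragments`) are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list; WORDtrain's preprocessing is not described in the abstract.
STOPWORDS = {"a", "an", "the", "of", "and", "to", "in", "for", "is", "are", "at", "with"}


def split_fragments(sentence):
    """Split a sentence (the 'region') at , ; : into non-overlapping fragments.

    Each fragment is returned as a list of lowercase words with stopwords removed.
    """
    fragments = []
    for part in re.split(r"[,;:]", sentence.lower()):
        words = [w for w in part.split() if w not in STOPWORDS]
        if words:
            fragments.append(words)
    return fragments


def ngrams(words, n):
    """All contiguous word sequences of length n."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def mine_frequent_topics(sentences, min_support=2, max_len=3):
    """Return word sequences (length 1..max_len) that occur in >= min_support fragments."""
    support = Counter()
    for sentence in sentences:
        for frag in split_fragments(sentence):
            seen = set()  # count each sequence at most once per fragment
            for n in range(1, max_len + 1):
                seen.update(ngrams(frag, n))
            support.update(seen)
    return {seq for seq, count in support.items() if count >= min_support}


def label_fragments(sentence, topics):
    """Label each fragment of a sentence with its longest frequent sequence (None if none)."""
    labels = []
    for frag in split_fragments(sentence):
        best = None
        for n in range(len(frag), 0, -1):  # try the longest sequences first
            hits = [g for g in ngrams(frag, n) if g in topics]
            if hits:
                best = " ".join(hits[0])  # leftmost of the longest matches
                break
        labels.append((" ".join(frag), best))
    return labels


if __name__ == "__main__":
    corpus = [
        "data mining finds patterns, sequence mining finds frequent sequences",
        "sequence mining is popular; topic labels help retrieval",
        "topic labels for small regions, data mining at scale",
    ]
    topics = mine_frequent_topics(corpus, min_support=2)
    for sentence in corpus:
        print(label_fragments(sentence, topics))
```

Run as a script, the first printed line is `[('data mining finds patterns', 'data mining'), ('sequence mining finds frequent sequences', 'sequence mining')]`. Splitting at punctuation makes the fragments non-overlapping by construction; according to the abstract, WORDtrain instead chooses the split so that the topics derived for the fragments are frequent, which this sketch does not attempt.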

Bibliographic Record

Source: International Conference on Data Warehousing and Knowledge Discovery, 2004, 10 pages
Authors: Henner Graubitz; Myra Spiliopoulou
Series: Lecture Notes in Computer Science 3181
Original format: PDF
Chinese Library Classification: TP311.13-532

Similar Literature

1. Automatic topic labelling for text document using ontology of graph-based concepts and dependency graph [J]. Phu Pham, Phuc Do, Chien D.C. Ta. International Journal of Business Information Systems, 2021, no. 2
2. Controlling contents in data-to-document generation with human-designed topic labels [J]. Kasumi Aoki, Akira Miyazawa, Tatsuya Ishigaki. Computer Speech and Language, 2021, issue Mara
3. Exploiting the value of class labels on high-dimensional feature spaces: topic models for semi-supervised document classification [J]. Soleimani Hossein, Miller David J. Pattern Analysis and Applications, 2019, no. 2
4. Deriving Multiple Topics to Label Small Document Regions [C]. Henner Graubitz, Myra Spiliopoulou. Data Warehousing and Knowledge Discovery, 2004
5. Multiple Document Question Instruction: Promoting Multiple Document Thinking through Inquiry [D]. Cameron, Chelsea E., 2018
6. Radiation-induced meningiomas in multiple regions showing rapid recurrence and a high MIB 1 labeling index: a case report and review of the literature [O]. Yoshiaki Goto, So Yamada, Shoko M Yamada, 2014
7. Neural labeled LDA: a topic model for semi-supervised document classification [O]. Wei Wang, Bing Guo, Yan Shen, 2021