Corpus Based Unsupervised Labeling of Documents

Abstract

Text categorization involves mapping documents to a fixed set of labels. A related and equally important problem is assigning labels to large corpora. With a deluge of documents from sources like the World Wide Web, manual labeling by domain experts is prohibitively expensive. Reducing the effort of labeling documents has received considerable investigation, but most of that work involved some kind of supervised or semi-supervised learning. This motivates the need for automatic methods of annotating documents with labels. In this work we explore a novel method of assigning labels to documents without using any training data. The proposed method uses clustering to build semantically related sets that serve as candidate labels for documents. This technique could be used to label large corpora in an unattended fashion.
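
The abstract does not say which clustering algorithm or similarity measure the authors use, so the following is only a minimal sketch of the general idea: group terms that tend to occur in the same documents into semantically related sets, then give each document the set it overlaps most. The function names (`cluster_terms`, `label_documents`), the Jaccard threshold, the tiny stopword list, and the single-link grouping are all assumptions made for illustration, not the paper's method.

```python
import re
from collections import defaultdict
from itertools import combinations

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "of", "and", "in", "to", "is", "on", "for", "as"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def cluster_terms(docs, threshold=0.5):
    """Single-link grouping of terms whose document sets have
    Jaccard similarity >= threshold. Returns a list of frozensets."""
    postings = defaultdict(set)  # term -> indices of documents containing it
    for i, doc in enumerate(docs):
        for term in tokenize(doc):
            postings[term].add(i)

    parent = {t: t for t in postings}  # union-find forest over terms

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for a, b in combinations(sorted(postings), 2):
        inter = len(postings[a] & postings[b])
        union = len(postings[a] | postings[b])
        if inter / union >= threshold:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for t in postings:
        clusters[find(t)].add(t)
    # Singleton clusters carry no "related set" information, so skip them.
    return [frozenset(c) for c in clusters.values() if len(c) > 1]

def label_documents(docs, clusters):
    """Assign each document the term cluster it shares the most terms with,
    or None when no cluster overlaps the document."""
    labels = []
    for doc in docs:
        terms = set(tokenize(doc))
        scored = [(len(c & terms), c) for c in clusters]
        best_score, best = max(scored, key=lambda s: s[0], default=(0, None))
        labels.append(best if best_score > 0 else None)
    return labels

docs = [
    "stock market shares rise",
    "market shares fall as stock prices drop",
    "football team wins match",
    "team loses football match",
]
clusters = cluster_terms(docs)
for doc, label in zip(docs, label_documents(docs, clusters)):
    print(doc, "->", sorted(label) if label else None)
```

No labeled training data is involved: the candidate label sets come only from co-occurrence statistics of the corpus itself, which is the property the abstract emphasizes. A realistic system would need a better similarity measure and a way to pick a short, readable label from each set.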

Bibliographic details

Source
International Florida Artificial Intelligence Research Society Conference (FLAIRS 2006); May 11-13, 2006; Melbourne Beach, FL (US) | 2006 | pp. 321-326 | 6 pages
Conference location: Melbourne Beach, FL (US)
Authors
Delip Rao; Deepak P; Deepak Khemani
Author affiliation

Department of Computer Science and Engineering, Indian Institute of Technology Madras

Original format: PDF
Language: English
Chinese Library Classification: Artificial intelligence theory

