Characteristics and Uses of Labeled Datasets - ODP Case Study

Abstract

Labeled datasets are essential for text categorization. They are used to train a classifier, or as a benchmark collection to evaluate categorization algorithms. However, labeling a large-scale document set is extremely expensive because it requires a great deal of human labour, and the labeling process itself is subjective rather than objective. Labels assigned by a single human editor, as in some existing labeled document sets, may therefore be of limited use and may prove problematic for training a classifier or evaluating categorization algorithms. This research explores a socially constructed Web directory, the Open Directory Project (ODP), to generate a series of labeled document sets by extracting semantic characteristics from the ODP categories, each of which is annotated by a list of indexed Websites. The generated document sets are used to classify Web search results, and the results are encouraging.
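
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not say how the semantic characteristics are extracted or which classifier is used. The sketch assumes a simple bag-of-words profile per category and cosine similarity. The category names, site descriptions, and function names below are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy data: category -> descriptions of the Websites indexed
# under it. These entries are invented for illustration, not taken from ODP.
CATEGORY_SITES = {
    "Computers/Programming": [
        "Tutorials on Python programming and software development",
        "Compiler design, programming languages and code examples",
    ],
    "Recreation/Travel": [
        "Travel guides, hotel reviews and flight booking tips",
        "Backpacking routes and travel destinations in Asia",
    ],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_category_profiles(category_sites):
    """Merge the site descriptions of each category into one term-count profile."""
    return {
        category: Counter(tok for desc in descriptions for tok in tokenize(desc))
        for category, descriptions in category_sites.items()
    }


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(snippet, profiles):
    """Assign a search-result snippet to the most similar category profile."""
    vector = Counter(tokenize(snippet))
    return max(profiles, key=lambda category: cosine(vector, profiles[category]))


profiles = build_category_profiles(CATEGORY_SITES)
print(classify("cheap hotel and flight deals for travel", profiles))
```

Because every category profile is built from descriptions contributed by many editors, no single editor's judgment determines a label, which is the motivation the abstract gives for using a socially constructed directory.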

Bibliographic details

Source: Sixth International Conference on Semantics Knowledge and Grid, 2010, pp. 227-234 (8 pages)
Authors: Zhu Dengya; Dreher Heinz
Original format: PDF
Chinese Library Classification: Programming
