Automatic document metadata extraction using support vector machines


Abstract

Automatic metadata generation provides scalability and usability for digital libraries and their collections. Machine learning methods offer robust and adaptable automatic metadata extraction. We describe a Support Vector Machine classification-based method for extracting metadata from the header part of research papers and show that it outperforms other machine learning methods on the same task. The method first classifies each line of the header into one or more of 15 classes. An iterative convergence procedure then improves the classification of each line by using the class labels predicted for its neighboring lines in the previous round. Further metadata extraction is done by seeking the best chunk boundaries within each line. We found that discovering and using the structural patterns of the data, together with domain-based word clustering, can improve metadata extraction performance. Appropriate feature normalization also greatly improves classification performance. Our metadata extraction method was originally designed to improve the metadata extraction quality of the digital libraries Citeseer [17] and EbizSearch [24]. We believe it can be generalized to other digital libraries.
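The iterative convergence step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`iterative_line_labels`, `toy_classifier`), the classifier signature, the label names, and the sample header are all assumptions made for illustration. A small rule-based function stands in for the trained SVM, and each line receives a single label, whereas the paper allows one or more of 15 classes per line.

```python
from typing import Callable, List, Optional

# Hypothetical signature: (line text, previous line's label, next line's label) -> label.
Classifier = Callable[[str, Optional[str], Optional[str]], str]


def iterative_line_labels(lines: List[str], classify: Classifier,
                          max_rounds: int = 10) -> List[str]:
    """Relabel lines using neighbors' labels from the previous round until stable."""
    # Round 0: classify each line independently, with no neighbor context.
    labels = [classify(line, None, None) for line in lines]
    for _ in range(max_rounds):
        new_labels = []
        for i, line in enumerate(lines):
            # Context comes from the previous round's predictions only.
            prev_label = labels[i - 1] if i > 0 else None
            next_label = labels[i + 1] if i + 1 < len(lines) else None
            new_labels.append(classify(line, prev_label, next_label))
        if new_labels == labels:  # no label changed: converged
            break
        labels = new_labels
    return labels


def toy_classifier(line: str, prev_label: Optional[str],
                   next_label: Optional[str]) -> str:
    """Rule-based stand-in for a trained SVM; the rules are illustrative only."""
    if "@" in line:
        return "email"
    if "University" in line or "Department" in line:
        return "affiliation"
    # Contextual rule: a short line directly above an affiliation is
    # treated as an author line.
    if next_label == "affiliation" and len(line.split()) <= 4:
        return "author"
    return "title"


# Made-up header lines, not taken from the paper's data.
header = [
    "Automatic document metadata extraction",
    "Jane Doe",
    "Department of Examples, Sample University",
    "jane@example.org",
]
labels = iterative_line_labels(header, toy_classifier)
print(labels)
```

In this toy run the second line is first labeled "title" when classified in isolation and is corrected to "author" in the next round, once the label of the line below it is available as context. In a real system the stand-in would be replaced by classifiers trained on line features plus the neighbors' predicted labels.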


Bibliographic details

Source: ACM/IEEE-CS joint conference on Digital libraries, 2003, 12 pages
Authors: Hui Han; C. Lee Giles; Eren Manavoglu; Hongyuan Zha; Zhenyue Zhang; Edward A. Fox
Original format: PDF
Chinese Library Classification: electronic libraries, digital libraries

