Conference: International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management

Tag Recommendation for Open Government Data by Multi-label Classification and Particular Noun Phrase Extraction


Abstract

Open government data (OGD) is statistical data created and published by governments. Administrators often assign tags to the metadata of OGD. A tag, consisting of one or more words, describes the data. Tags help users understand a dataset without reading it and also help them search for OGD. However, administrators must understand the data in detail in order to assign tags. We take two different approaches to assigning appropriate tags to OGD. First, we use a multi-label classification technique to assign tags to OGD from the tags that appear in the training data. Second, we extract particular noun phrases from the metadata of OGD by calculating the difference between the frequency of a noun phrase and the frequencies of the single words within it. Experiments using 196,587 datasets on Data.gov show that the prediction accuracy of the multi-label classification method is sufficient for developing a tag recommendation system. The experiments also show that our method for extracting particular noun phrases recovers some infrequent tags of the datasets.
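The abstract does not give the exact scoring formula for the second approach, so the following is only a minimal sketch of the general idea under stated assumptions. The gap used here (mean frequency of the phrase's words minus the frequency of the phrase itself) is an assumption, not the authors' definition. The function names `tokenize`, `count_phrase`, and `frequency_gap` are hypothetical, and candidate noun phrases are passed in by hand rather than found with a part-of-speech tagger. Under this assumed score, a small gap suggests that the words rarely occur outside the phrase.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def count_phrase(tokens, phrase_tokens):
    """Count occurrences of phrase_tokens as a contiguous run in tokens."""
    n = len(phrase_tokens)
    return sum(
        1
        for i in range(len(tokens) - n + 1)
        if tokens[i:i + n] == phrase_tokens
    )


def frequency_gap(descriptions, candidates):
    """Return {phrase: gap} for candidates that occur in the descriptions.

    ASSUMPTION: gap = mean frequency of the phrase's single words
    minus the frequency of the whole phrase. The source only says
    that a difference between these frequencies is calculated.
    """
    token_lists = [tokenize(d) for d in descriptions]
    word_freq = Counter(t for tokens in token_lists for t in tokens)

    gaps = {}
    for phrase in candidates:
        words = tokenize(phrase)
        if not words:
            continue  # nothing to score
        phrase_freq = sum(count_phrase(tokens, words) for tokens in token_lists)
        if phrase_freq == 0:
            continue  # phrase never occurs in the metadata
        mean_word_freq = sum(word_freq[w] for w in words) / len(words)
        gaps[phrase] = mean_word_freq - phrase_freq
    return gaps


# Hypothetical metadata descriptions, invented for illustration only.
descriptions = [
    "Air quality index by county.",
    "Air quality monitoring data.",
    "Water quality report.",
    "Traffic data by county.",
]
gaps = frequency_gap(descriptions, ["air quality", "quality report", "road map"])
for phrase, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(phrase, gap)
```

Sorting by ascending gap lists first the phrases whose words are most tightly bound to the phrase. A real pipeline would also need a noun phrase chunker and a threshold, neither of which the abstract specifies.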
