2012 IIAI International Conference on Advanced Applied Informatics

Gathering Public Concerns from Web Towards Building Corpus of Japanese Regional Concerns


Abstract

Concern assessment has become increasingly important in Japanese regional communities. We have developed an e-Participation web platform based on a Linked Open Data set called SOCIA (Social Opinions and Concerns for Ideal Argumentation). Building a corpus of public concerns is an urgent task for advancing the text mining technologies that support concern assessment. Two issues arise in using the SOCIA dataset as a corpus: (1) the reliability of annotations must be managed, and (2) noisy text that is not relevant to public concerns must be filtered out. To address the first issue, we incorporate a schema for describing the meta-context of each annotation: who the annotator is, whether the annotator is a human or a software agent, and how reliable the annotation is. To address the second, we investigate how the features of concerns differ from those of non-concerns in Japanese microblog posts (i.e., tweets). In this investigation, we address sample selection bias by formulating a novel metric for ranking features, bias-penalized information gain (BPIG).
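The abstract does not define BPIG, so its bias penalty is not reproduced here. As background only, the following minimal sketch shows the standard (unpenalized) information gain that such a metric presumably builds on, applied to ranking token features for a binary concern / non-concern label. The function names, the set-of-tokens representation, and the toy English tokens are all assumptions made for illustration; they do not come from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(docs, labels, feature):
    """Standard information gain of a binary 'token present' feature.

    docs   : list of sets of tokens (hypothetical representation)
    labels : list of class labels, e.g. 1 = concern, 0 = non-concern
    This is plain information gain, NOT the paper's BPIG.
    """
    with_f = [y for d, y in zip(docs, labels) if feature in d]
    without_f = [y for d, y in zip(docs, labels) if feature not in d]
    n = len(labels)
    conditional = (len(with_f) / n) * entropy(with_f) \
        + (len(without_f) / n) * entropy(without_f)
    return entropy(labels) - conditional


def rank_features(docs, labels):
    """Rank every token by information gain, highest first (ties by name)."""
    vocab = set().union(*docs)
    return sorted(vocab, key=lambda f: (-information_gain(docs, labels, f), f))


# Toy data, invented for illustration: two "concern" posts, two others.
docs = [{"road", "repair"}, {"road", "lunch"}, {"lunch", "cafe"}, {"repair", "lunch"}]
labels = [1, 1, 0, 0]
ranking = rank_features(docs, labels)
```

In this toy data the token "road" separates the two classes perfectly, so it ranks first. Plain information gain of this kind can favor features that are over-represented only because of how the sample was collected, which appears to be the sample selection bias that BPIG is formulated to penalize.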