Mining User-Generated Comments

Abstract

Social-media websites, such as newspapers, blogs, and forums, are the main places where user-generated comments are produced and exchanged. These comments are viable sources for opinion mining, descriptive annotation, and information extraction. User-generated comments are formatted using an HTML template and are therefore entwined with the other information in the HTML document. Their unsupervised extraction is thus a challenging problem, and even more so when nested replies by different users must be extracted. This paper presents a novel technique, CommentsMiner, for unsupervised extraction of user comments. Our approach combines the theoretical framework of frequent subtree mining with data extraction techniques. We demonstrate that the comment mining task can be modelled as a constrained closed induced subtree mining problem followed by a learning-to-rank problem. Our experimental evaluations show that CommentsMiner solves the plain and nested comment extraction problems for 84% of a representative and accessible dataset, while outperforming existing baseline techniques.
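
To make the idea of mining repeated subtrees in an HTML document more concrete, here is a minimal sketch in Python. It is not the CommentsMiner algorithm: it only counts exactly identical tag structures, which is a much cruder stand-in for constrained closed induced subtree mining, and it omits the learning-to-rank stage entirely. The names `Node`, `TreeBuilder`, `signature`, and `most_repeated_structure`, as well as the sample page, are assumptions introduced for illustration.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag, so the parser must not descend into them.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    """A bare-bones DOM node that keeps only the tag name and the children."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a tag-only tree from HTML (attributes and text are ignored)."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this tag, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def signature(node, counts):
    """Return a string encoding the subtree shape; count every non-leaf shape."""
    inner = ",".join(signature(child, counts) for child in node.children)
    sig = f"{node.tag}({inner})"
    if node.children:
        counts[sig] += 1
    return sig


def most_repeated_structure(html):
    """Return (signature, count) of the most frequent repeated subtree, or None."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    counts = Counter()
    for child in builder.root.children:
        signature(child, counts)
    repeated = {sig: n for sig, n in counts.items() if n >= 2}
    if not repeated:
        return None
    # Prefer the most frequent structure; break ties in favour of the larger one.
    return max(repeated.items(), key=lambda item: (item[1], len(item[0])))


SAMPLE = """
<html><body>
<div id="nav"><a>Home</a></div>
<div class="comment"><span>alice</span><p>First!</p></div>
<div class="comment"><span>bob</span><p>Nice post.</p></div>
<div class="comment"><span>carol</span><p>Thanks.</p></div>
</body></html>
"""

print(most_repeated_structure(SAMPLE))
```

On the sample page, the three comment blocks share the same tag structure, so they surface as the most frequent repeated subtree, while the navigation block occurs only once. This exact-match shortcut would miss comments whose structure varies, for example when replies are nested inside them.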