Forum Data Extraction without Explicit Rules

Abstract

Web forum data contributed by millions of users are a mixture of well-formed user information and free-format user-generated content. Though easy for users to read, forum data are difficult for computer systems to analyze because of the various HTML tags surrounding them. Extracting forum data automatically from a large number of Web sites is challenging, since these sites may have different styles. In this paper, we propose an approach that extracts user information and user-generated content from multiple forum sites by using both the structural and the textual characteristics of forums. A structural induction process and a term combination computation process are introduced to ensure extraction accuracy and automation. Extensive experiments on real-life data sets show the effectiveness of the proposed method.

Bibliographic details

Source
International Conference on Social Computing and Its Applications; International Symposium on Big Data and MapReduce; International Symposium on Privacy and Security in Cloud and Social; International Workshop on Web Wisdom; International Workshop on Society Network Analysis and Information Diffusion Modeling; International Workshop on Social Network Service on Databases, 2012, 6 pages
Authors
Zhang Jingwei; Jin Cheqing; Lin Yuming; Gong Xueqing
Original format: PDF
Chinese Library Classification: TP301-53
Keywords
forum data extraction; user-generated content

Similar literature

1. Rule extraction using Recursive-Rule extraction algorithm with J48graft combined with sampling selection techniques for the diagnosis of type 2 diabetes mellitus in the Pima Indian dataset [J]. Yoichi Hayashi, Shonosuke Yukita. Informatics in Medicine Unlocked, 2016, no. 1
2. Use of a Recursive-Rule eXtraction algorithm with J48graft to achieve highly accurate and concise rule extraction from a large breast cancer dataset [J]. Yoichi Hayashi, Satoshi Nakano. Informatics in Medicine Unlocked, 2015, no. 1
3. Explicit aspects extraction in sentiment analysis using optimal rules combination [J]. Mohammad Tubishat, Norisma Idris, Mohammad Abushariah. Future Generation Computer Systems, 2021, no. Jana
4. Forum Data Extraction without Explicit Rules [C]. Zhang Jingwei, Jin Cheqing, Lin Yuming. The Second International Conference on Cloud and Green Computing, 2012
5. Sequential pattern classification without explicit feature extraction [D]. Lei, Hansheng. 2005
6. Vigi4Med Scraper: A Framework for Web Forum Structured Data Extraction and Semantic Representation [O]. Bissan Audeh, Michel Beigbeder, Antoine Zimmermann
7. RULE EXTRACTION FROM MEDICAL DATA WITHOUT DISCRETIZATION OF NUMERICAL ATTRIBUTES [O]. 2012
