Predicting categories of news articles using meta-data from the Web



Abstract

Text mining, a field of machine learning that deals with the discovery of knowledge from text, is evolving rapidly. This has been recognized by the Artificial Intelligence Laboratory of the Jožef Stefan Institute, which is developing Event Registry, a system that collects news articles from the Web in real time, detects the events they describe and extracts relevant information. The system component that classifies articles into categories has not yet been fully developed. In response, in our diploma thesis we attempted to improve on a reference model. The results were positive: we improved the predictive accuracy of classifying arbitrary news articles into one of the categories of our predefined taxonomy. During the learning phase, we examined how various forms of meta-data affect the predictive accuracy of the model, focusing mainly on meta-data obtained from the Never-Ending Language Learner developed at Carnegie Mellon University. We found that these meta-data improve the performance of the model when they are used in combination with other meta-data. For learning we used several algorithms: logistic regression, support vector machines, random forests and k-nearest neighbors. The first two proved to be the most appropriate for building the optimal predictive model. We also tested several approaches to active learning, which can simplify and speed up the manual labeling of new articles. All of them produced positive results, and the approach that combines prediction uncertainty with the correlation between learning instances proved to be the best.
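The abstract does not state exactly how prediction uncertainty and the correlation between instances were combined. One common formulation in the active-learning literature is information-density weighting: the uncertainty of each unlabeled article is multiplied by its average similarity to the rest of the unlabeled pool, so that articles which are both uncertain and representative are sent to the annotator first. The sketch below assumes entropy as the uncertainty measure and cosine similarity over non-negative feature vectors (e.g. TF-IDF) as the correlation measure; the helper names (`entropy`, `cosine`, `information_density_scores`, `select_for_labeling`) and the `beta` weight are hypothetical. It illustrates the general idea and is not the thesis's implementation.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a predicted class distribution (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def information_density_scores(
    pool_features: List[List[float]],
    pool_probs: List[List[float]],
    beta: float = 1.0,
) -> List[float]:
    """Score each unlabeled instance as uncertainty * (mean similarity to the pool) ** beta.

    Assumption: features are non-negative, so cosine similarity lies in [0, 1];
    the density is clamped at 0 anyway to keep the power well defined.
    """
    n = len(pool_features)
    scores = []
    for i in range(n):
        if n > 1:
            # Average similarity to the other unlabeled instances: a proxy for
            # how representative (densely surrounded) instance i is.
            density = sum(
                cosine(pool_features[i], pool_features[j]) for j in range(n) if j != i
            ) / (n - 1)
        else:
            density = 1.0
        density = max(density, 0.0)
        scores.append(entropy(pool_probs[i]) * density ** beta)
    return scores


def select_for_labeling(
    pool_features: List[List[float]],
    pool_probs: List[List[float]],
    k: int = 1,
    beta: float = 1.0,
) -> List[int]:
    """Return the indices of the k highest-scoring instances, best first."""
    scores = information_density_scores(pool_features, pool_probs, beta)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    features = [
        [1.0, 0.0, 0.0],  # article 0
        [0.9, 0.1, 0.0],  # article 1, similar to article 0
        [0.0, 0.0, 1.0],  # article 2, an isolated outlier
    ]
    probs = [
        [0.5, 0.5],  # maximally uncertain
        [0.6, 0.4],
        [0.5, 0.5],  # equally uncertain, but unlike the rest of the pool
    ]
    print(select_for_labeling(features, probs, k=1))  # → [0]
```

In this toy pool, pure uncertainty sampling would tie articles 0 and 2; the density term breaks the tie in favour of article 0 because it resembles another unlabeled article, whereas article 2 is an outlier whose label would tell the model little about the rest of the pool.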


Bibliographic record

Author: Vučko Žiga
Year: 2015
Original format: PDF

Similar literature


1. Combining Online News Articles and Web Search to Predict the Fluctuation of Real Estate Market in Big Data Context [J]. Daoyuan Sun, Yudie Du, Wei Xu, Mei Yun Zuo. Pacific Asia Journal of the Association for Information Systems, 2015, no. 4.

2. Hoax news-inspector: a real-time prediction of fake news using content resemblance over web search results for authenticating the credibility of news articles [J]. Varshney Deepika, Vishwakarma Dinesh Kumar. Journal of Ambient Intelligence and Humanized Computing, 2021, no. 9.

3. Web video categorization using category-predictive classifiers and category-specific concept classifiers [J]. Afzal Mehtab, Wu Xiao, Chen Honghan. Neurocomputing, 2016, issue nova19.

4. Predicting Stock Price Movements Based on Different Categories of News Articles [C]. Yauheniya Shynkevich, T. M. McGinnity, Sonya Coleman. IEEE Symposium Series on Computational Intelligence, 2015.

5. A framing analysis of online newspaper articles and weblog articles [D]. Janssen, Maria Carolina Gabriele. 2010.

6. Response to a news article on the RCGP website [O]. Faraz Razi. 2017.

7. Combining Online News Articles and Web Search to Predict the Fluctuation of Real Estate Market in Big Data Context [O]. Daoyuan Sun, Yudie Du, Wei Xu. 2014.

