Identification of Tweets that Mention Books: An Experimental Comparison of Machine Learning Methods

Abstract

In this paper, we address the task of identifying tweets that mention books (TMB) among tweets that contain the same strings as full book titles. Although this task can be treated as a kind of Named Entity Recognition, it is made harder by the fact that book titles often consist of ordinary expressions (such as "The Girl on the Train"). Furthermore, when tweets are gathered through a dictionary-based search, those that contain the same strings as full book titles are often spam. However, given a complete list of book titles (e.g. a union catalogue from a library or commercial bibliographic data from a book store), the task can be solved by text classification. We therefore propose a two-step pipeline consisting of spam filtering and TMB classification based on supervised learning with a small amount of labelled data. We constructed the best-performing classifiers by comparing combinations of four established supervised learning methods with different features. Given the difficulty of the task, our pipeline performed well, with an F-score of about 0.7.
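
The two-step pipeline described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not name the four learning methods or the features compared, so a small hand-written Naive Bayes classifier over whitespace tokens stands in for them, and the English toy tweets, labels, and function names (`find_candidates`, `run_pipeline`, `f_score`) are hypothetical. The keywords indicate the paper works on Japanese text, which would need a proper tokenizer in place of `tokenize`.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (stand-in for a real tokenizer)."""
    return text.lower().split()


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_class, best_score = None, -math.inf
        for c in self.classes:
            score = math.log(self.doc_counts[c] / total_docs)
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for w in tokenize(text):
                if w in self.vocab:  # skip words never seen in training
                    score += math.log((self.word_counts[c][w] + 1) / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


def find_candidates(tweets, titles):
    """Dictionary-based search: keep tweets containing a full title string."""
    return [t for t in tweets if any(ti.lower() in t.lower() for ti in titles)]


def run_pipeline(tweets, titles, spam_clf, tmb_clf):
    """Step 1 drops spam; step 2 keeps tweets classified as book mentions."""
    candidates = find_candidates(tweets, titles)
    non_spam = [t for t in candidates if spam_clf.predict(t) == "ham"]
    return [t for t in non_spam if tmb_clf.predict(t) == "book"]


def f_score(predicted, gold):
    """F1: harmonic mean of precision and recall over two collections."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data; a real experiment would use labelled tweets.
titles = ["The Girl on the Train"]
spam_clf = NaiveBayes().fit(
    ["win free followers click here the girl on the train",
     "click here free download now",
     "just finished reading the girl on the train",
     "saw the girl on the train today"],
    ["spam", "spam", "ham", "ham"],
)
tmb_clf = NaiveBayes().fit(
    ["just finished reading the girl on the train great novel",
     "the author of the girl on the train wrote a great novel",
     "saw the girl on the train today she smiled",
     "the girl on the train platform dropped her bag"],
    ["book", "book", "other", "other"],
)
tweets = [
    "reading the girl on the train what a novel",
    "free followers click here the girl on the train",
    "the girl on the train dropped her phone today",
    "nice weather today",
]
mentions = run_pipeline(tweets, titles, spam_clf, tmb_clf)
print(mentions)
```

Running the spam filter first means the TMB classifier only has to separate genuine book mentions from ordinary uses of the same words, which is the harder distinction the abstract highlights.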

Bibliographic details

Source
International Conference on Asian-Pacific Digital Libraries | 2015 | pp. 278-288 | 11 pages
Authors
Shuntaro Yada; Kyo Kageura
Original format: PDF
Keywords
Machine learning; Japanese text classification; Named entity recognition on Twitter; Book title identification
