Leveraging topic models to develop metrics for evaluating the quality of narrative threads extracted from news stories

Abstract

Analysts and software systems are increasingly tasked with making sense of a growing volume of data to help their organizations make decisions involving risk and uncertainty. A key enabler of this work is the ability to quickly discover structure in large amounts of text, such as news stories and blogs. Recent work in this area has shown that it is possible to automatically link documents from a corpus to build a narrative structure, called a story chain, without prior domain knowledge [1]. This approach is an unsupervised method that discovers large numbers of story chains of variable quality. In this paper, we describe and evaluate methods to identify the most coherent and informative story chains. We explore two types of topic-model-based analytics. The first is a measure of representativeness that captures how well a story chain represents the corpus from which it was generated. This is done by comparing the topics found over time in a story chain against those expressed in the corpus during the same time periods. Our hypothesis is that story chains whose topic expression is similar to that of the corpus will convey narratives that are central to the corpus. This type of analytic could help an analyst quickly focus on the key narratives in a large corpus of documents. The second is a measure of story chain quality, composed of topic consistency and topic persistence measures. Our hypothesis is that high-quality chains are composed of sequences of stories that have clearly defined primary topics, and that these topics persist across significant portions of the chain. We used these analytics to classify the clarity of story chains into one of four categories: (1) very clear narrative, (2) somewhat clear narrative, (3) somewhat unclear narrative, or (4) very unclear narrative. We were able to train a model that assigned story chains the same label as human coders 77% of the time.
Our dataset was composed of 7,074 English-language news stories published during the 2013 Brazil protests, from which 5,606 story chains were generated. We randomly selected 60 story chains for hand scoring to serve as our gold-standard dataset for experimentation.
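The abstract does not give formulas for these measures. The following is only a rough illustration under stated assumptions, not the authors' method. It assumes each story already has a document-topic distribution (for example from LDA) and an integer time-window index. It then uses three hypothetical definitions:

- Representativeness is the mean cosine similarity, taken over the time windows a chain covers, between the chain's average topic distribution and the corpus's average topic distribution in the same window.
- Consistency is the mean weight of each story's primary (highest-probability) topic.
- Persistence is the longest run of consecutive stories sharing a primary topic, divided by chain length.

The function names are also hypothetical.

```python
import numpy as np


def representativeness(chain_theta, chain_times, corpus_theta, corpus_times):
    """HYPOTHETICAL representativeness score (assumed definition, not the paper's).

    chain_theta / corpus_theta: (n_docs, K) document-topic distributions.
    chain_times / corpus_times: (n_docs,) integer time-window index per document.
    Returns the mean cosine similarity between the chain's and the corpus's
    average topic distribution, over windows present in both; 0.0 if none overlap.
    """
    chain_theta = np.asarray(chain_theta, dtype=float)
    corpus_theta = np.asarray(corpus_theta, dtype=float)
    chain_times = np.asarray(chain_times)
    corpus_times = np.asarray(corpus_times)
    sims = []
    for t in np.unique(chain_times):
        corpus_mask = corpus_times == t
        if not corpus_mask.any():
            continue  # no corpus documents in this window to compare against
        chain_profile = chain_theta[chain_times == t].mean(axis=0)
        corpus_profile = corpus_theta[corpus_mask].mean(axis=0)
        denom = np.linalg.norm(chain_profile) * np.linalg.norm(corpus_profile)
        sims.append(float(chain_profile @ corpus_profile / denom) if denom > 0 else 0.0)
    return float(np.mean(sims)) if sims else 0.0


def topic_consistency(chain_theta):
    """HYPOTHETICAL consistency: average weight of each story's primary topic."""
    theta = np.asarray(chain_theta, dtype=float)
    if theta.shape[0] == 0:
        return 0.0
    return float(theta.max(axis=1).mean())


def topic_persistence(chain_theta):
    """HYPOTHETICAL persistence: longest run of consecutive stories with the
    same primary topic, as a fraction of the chain length."""
    theta = np.asarray(chain_theta, dtype=float)
    if theta.shape[0] == 0:
        return 0.0
    primary = theta.argmax(axis=1)
    longest = current = 1
    for prev, cur in zip(primary[:-1], primary[1:]):
        current = current + 1 if cur == prev else 1
        longest = max(longest, current)
    return longest / len(primary)
```

For example, consider a three-story chain with distributions `[[0.8, 0.2], [0.7, 0.3], [0.1, 0.9]]`. It has a consistency of about 0.8 and a persistence of 2/3, because the first two stories share topic 0. Scores like these could serve as features for a classifier over the four clarity labels. The abstract does not specify which features or model the authors used.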

Bibliographic Record

Source: International Conference on Applied Human Factors and Ergonomics | 2015 | pp. 3317-4151 | 8 pages
Authors: Jason Schlachter; Alicia Ruvinsky; Luis Asencios Reynoso; Sathappan Muthiah; Naren Ramakrishnan
Original format: PDF
Chinese Library Classification: TB18-53
Keywords: Sensemaking; Data analytics; Text analytics; Narrative; Machine learning; Topic modeling

Similar Documents

1. Leveraging Topic Models to Develop Metrics for Evaluating the Quality of Narrative Threads Extracted from News Stories [J]. Jason Schlachter, Alicia Ruvinsky, Luis Asencios Reynoso. Procedia Manufacturing, 2015, no. 4.

2. Story Segmentation and Topic Classification of Broadcast News via a Topic-Based Segmental Model and a Genetic Algorithm [J]. Wu C.-H., Hsieh C.-H. IEEE Transactions on Audio, Speech, and Language Processing, 2009, no. 8.

3. Metric-based data quality assessment - Developing and evaluating a probability-based currency metric [J]. Bernd Heinrich, Mathias Klier. Decision Support Systems, 2015, no. apra.

4. Leveraging topic models to develop metrics for evaluating the quality of narrative threads extracted from news stories [C]. Jason Schlachter, Alicia Ruvinsky, Luis Asencios Reynoso. International Conference on Applied Human Factors and Ergonomics, 2015.

5. Developing Quantitative Validation Metrics to Assess Quality of Computational Mechanics Models Relative to Reality [D]. Dvurecenska, Ksenija. 2019.

6. The Story as a Quality Instrument: Developing an Instrument for Quality Improvement Based on Narratives of Older Adults Receiving Long-Term Care [O]. Aukelien Scheffelaar, Meriam Janssen, Katrien Luijkx. 2021.

7. Leveraging Topic Models to Develop Metrics for Evaluating the Quality of Narrative Threads Extracted from News Stories [O]. Schlachter Jason, Ruvinsky Alicia, Reynoso Luis Asencios. 2015.