Journal: KI - Künstliche Intelligenz

From Texts to Networks: Detecting and Managing the Impact of Methodological Choices for Extracting Network Data from Text Data

Abstract

This thesis (Diesner in Technical Report CMU-ISR-12-101, 2012) addresses a series of methodological problems related to extracting information on socio-technical networks from natural language text data. Theories and models from the social sciences are leveraged and combined with computational approaches to (a) construct, analyze and compare network data and (b) combine text data and network data for analysis. This thesis entails various projects that serve three purposes: First, the impact of various common coding choices, including reference resolution and co-occurrence-based link formation, on network data and analysis results is empirically identified across multiple types of text data and domains. Second, different relation extraction methods are compared across various over-time, open-source, large-scale datasets with respect to the resulting network data and analysis results. This study offers a complement to traditional strategies for accuracy assessment. The relation extraction methods considered include network data construction based on (a) manually versus automatically built thesauri, (b) meta-data, and (c) collaboration with subject matter experts. Third, the concepts of grouping and roles from network analysis are integrated with text mining methods to enable the theoretically grounded, joint consideration of text data and network data for real-world applications.
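As a rough illustration of what a co-occurrence-based coding choice can look like in practice, the sketch below links two concepts whenever they appear within a fixed token distance of each other in the same sentence. It is not the thesis's implementation: the function name `cooccurrence_edges`, the `max_distance` parameter, the toy sentences, and the use of a plain set of concepts as a stand-in for a thesaurus are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(sentences, concepts, max_distance):
    """Count undirected links between concepts that co-occur in a sentence.

    sentences:    list of token lists (assumed already tokenized and lower-cased)
    concepts:     set of tokens treated as nodes (hypothetical stand-in for a thesaurus)
    max_distance: largest token distance at which two concepts are still linked
    """
    edges = Counter()
    for tokens in sentences:
        # Keep only tokens that map to a node, together with their positions.
        hits = [(i, tok) for i, tok in enumerate(tokens) if tok in concepts]
        for (i, a), (j, b) in combinations(hits, 2):
            # combinations preserves order, so j > i always holds here.
            if a != b and j - i <= max_distance:
                edges[tuple(sorted((a, b)))] += 1
    return edges


# Toy data, invented for illustration only.
sentences = [
    ["alice", "met", "bob", "and", "carol"],
    ["bob", "emailed", "alice"],
]
concepts = {"alice", "bob", "carol"}

narrow = cooccurrence_edges(sentences, concepts, max_distance=2)
wide = cooccurrence_edges(sentences, concepts, max_distance=4)
print(dict(narrow))
print(dict(wide))
```

On this toy input, the narrow window leaves alice and carol unconnected, while the wider window adds a link between them. This is the kind of difference in resulting network data that a window-size choice can produce.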
