A methodology of machine learning in automated entity summarization.

Abstract

Conducting background research is a time-consuming yet important part of every research endeavor. It involves compiling relevant sources, reading them, and comprehending the information they contain. We find that the amount of this information grows rapidly in the current information age. Automated text summarization, alongside other techniques such as search engines, improves the efficiency of exploring this data by distilling the large amounts of information that are becoming available.

To summarize entity and topic interactions in large information stores, this dissertation presents a methodology for automated entity summarization. The methodology is divided into three steps: Reading, Assembly, and Interpretation. In the Reading step, the appropriate information sources are identified, and the interrelated entities within each source are then extracted. This step requires four inputs: a topic extraction algorithm, a named entity recognition algorithm, information sources, and property information for the entities. In the Assembly step, the relationships between entities across sources are represented as knowledge networks: a trimodal weighted co-occurrence hypergraph is constructed and then projected into unimodal and bimodal graphs. Finally, in the Interpretation step, graph analytics are applied to summarize these graphs. A novel diversity heuristic, derived from information entropy, is introduced to compare the diversity of information in different streams of literature over time.

To test the methodology, three experiments were conducted on data from the PubMed Central Open Access Subset, downloaded on July 14, 2014, which consisted of 740,418 journal citations in 4,404 journals. The first experiment examined the relationship between the size of the information network and the number of files input into the methodology, and found a power-law relationship, consistent with linguistic theory.
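A power-law relationship of the kind reported in the first experiment can be checked by fitting y = k·x^β with ordinary least squares on log-transformed data. The following is a minimal Python sketch, not the dissertation's procedure. It assumes the relevant linguistic result is a Heaps'-law-style relationship (network size growing as a power of corpus size), and the data points are synthetic, generated from k = 50 and β = 0.6 purely for illustration.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Fit y = k * x**beta by least squares on the log-log scale.

    Here x is assumed to be the number of input files and y the size of the
    resulting information network (e.g. node count). Returns (k, beta).
    All values must be positive.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    beta = sxy / sxx                  # slope of the log-log line
    log_k = mean_y - beta * mean_x    # intercept of the log-log line
    return math.exp(log_k), beta


# Hypothetical illustration only: synthetic (files, nodes) pairs generated
# from nodes = 50 * files**0.6, NOT data from the dissertation.
files = [10, 100, 1000, 10000]
nodes = [50 * f ** 0.6 for f in files]
k, beta = fit_power_law(files, nodes)
print(f"k={k:.1f}, beta={beta:.2f}")  # → k=50.0, beta=0.60
```

An exponent β below 1 indicates sublinear growth: each additional file adds progressively fewer new nodes to the network.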

The second experiment assessed the validity of the methodology in extracting meaningful connections and predicting the top chemicals, using two gold standards. The results indicate that the methodology can identify the top chemicals and that the meaningful connections are those with the highest weights in the network. Finally, the third experiment used the diversity heuristic to empirically compare the diversity of information in a stream of articles on honeybee research with that in a stream of articles on diabetes research. The existing heuristic produced quite noisy results when applied to information networks, whereas the new heuristic has better asymptotic properties. This research is among the first efforts toward building improved literature-based discovery algorithms capable of automating hypothesis generation in large literature sets.
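The Assembly and Interpretation steps can be illustrated with a small Python sketch. The abstract does not specify the three modes, the hyperedge weighting, or the form of the diversity heuristic, so everything below is an assumption: each document contributes one hyperedge over its tagged nodes, a hyperedge is weighted by the number of documents sharing that exact node set, projected edge weights sum the weights of the hyperedges containing both endpoints, and plain Shannon entropy of the edge-weight distribution stands in for the dissertation's own diversity heuristic. The mode names (`topic`, `entity`, `property`) and labels are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations, product
from typing import FrozenSet, Iterable, Tuple

# A node is (mode, label); mode names here are hypothetical.
Node = Tuple[str, str]


def build_hypergraph(docs: Iterable[Iterable[Node]]) -> Counter:
    """Weighted co-occurrence hypergraph: one hyperedge per distinct node set,
    weighted by the number of documents in which that set co-occurs."""
    hyperedges: Counter = Counter()
    for nodes in docs:
        edge: FrozenSet[Node] = frozenset(nodes)
        if len(edge) > 1:
            hyperedges[edge] += 1
    return hyperedges


def project(hyperedges: Counter, modes: Tuple[str, ...]) -> Counter:
    """Project the hypergraph onto pairwise edges.

    One mode  -> unimodal graph (pairs of nodes within that mode).
    Two modes -> bimodal graph (pairs spanning the two modes only).
    """
    graph: Counter = Counter()
    for edge, w in hyperedges.items():
        if len(modes) == 1:
            members = sorted(n for n in edge if n[0] == modes[0])
            pairs = combinations(members, 2)
        else:
            left = sorted(n for n in edge if n[0] == modes[0])
            right = sorted(n for n in edge if n[0] == modes[1])
            pairs = product(left, right)
        for a, b in pairs:
            graph[frozenset((a, b))] += w
    return graph


def weight_entropy(graph: Counter) -> float:
    """Shannon entropy (bits) of the normalised edge-weight distribution.
    A stand-in diversity measure, not the dissertation's heuristic."""
    total = sum(graph.values())
    if total == 0:
        return 0.0
    probs = (w / total for w in graph.values() if w > 0)
    return -sum(p * math.log2(p) for p in probs)


# Hypothetical toy corpus: three documents, three modes.
docs = [
    [("topic", "t1"), ("entity", "glucose"), ("entity", "insulin"), ("property", "p1")],
    [("topic", "t1"), ("entity", "glucose"), ("entity", "insulin"), ("property", "p1")],
    [("topic", "t2"), ("entity", "glucose"), ("entity", "metformin"), ("property", "p2")],
]
H = build_hypergraph(docs)
uni = project(H, ("entity",))          # entity-entity graph
bi = project(H, ("topic", "entity"))   # topic-entity graph
print(round(weight_entropy(uni), 3))   # → 0.918
print(round(weight_entropy(bi), 3))    # → 1.918
```

Computing `weight_entropy` over graphs built from successive time slices of one literature stream would yield a diversity curve over time, loosely in the spirit of the third experiment; the dissertation's heuristic is a different, entropy-derived measure.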

Bibliographic record

Author: Chonde, Seifu
Author affiliation: The Pennsylvania State University
Degree-granting institution: The Pennsylvania State University
Subjects: Industrial engineering; Information science; Information technology; Management
Degree: Ph.D.
Year: 2016
Pages: 201
Original format: PDF
Language: English

Similar documents

1. Predictive Analysis of First Abbreviated New Drug Application Submission for New Chemical Entities Based on Machine Learning Methodology [J]. Hu Meng, Babiskin Andrew, Wittayanukorn Saranrat. Clinical Pharmacology and Therapeutics. 2019, no. 1.

2. A Machine Learning-based Triage methodology for automated categorization of digital media [J]. Fabio Marturana, Simone Tacconi. Digital Investigation. 2013, no. 2.

3. SoK: Machine vs. machine - A systematic classification of automated machine learning-based CAPTCHA solvers [J]. Antreas Dionysiou, Elias Athanasopoulos. Computers & Security. 2020, Oct.

4. Predicting the Tear Strength of Woven Fabrics Via Automated Machine Learning: An Application of the CRISP-DM Methodology [C]. Rui Ribeiro, Andre Pilastri, Carla Moura. International Conference on Enterprise Information Systems. 2020.

5. Machine Learning and Deep Learning Based Entity Resolution Approaches for Unstructured References [D]. Li, Xinming. 2020.

6. Man and the machine rise to the spike‐wave. Commentary on An automated machine learning‐based detection algorithm for spike‐wave discharges (SWDs) in a mouse model of absence epilepsy [O]. Kevin M. Kelly. 2020.

7. Machine Learning Approach to Multi-Document Summarization [O]. Tsutomu Hirao, Hideto Kazawa, Hideki Isozaki. 2003.
