Knowledge discovery by fusion of information.

Abstract

Information fusion aims to develop intelligent approaches for integrating information from different complementary sources, so that a more comprehensive basis is obtained for data analysis and knowledge discovery. These approaches are of particular interest in geo-informatics and in text mining for bioinformatics, where large volumes of high-dimensional data from multiple correlated sources are available but their inherent relationships are not yet well understood.

In the first part of the thesis, we study data-driven approaches to improving aerosol optical thickness (AOT) retrieval from satellite-based and ground-based observations. We explore statistical models that complement deterministic retrieval algorithms in order to reduce the computational cost of processing very large remote sensing datasets. The experiments show that, given a small fraction of deterministic retrievals for training, statistical models can significantly speed up AOT retrieval at the cost of only a slight loss of accuracy. Next, to construct an accurate and understandable AOT predictor, we combine spatially and temporally co-located data from multiple sources into a uniform dataset. Artificial neural networks (ANNs) are applied to derive optimal regression models. The results suggest that ANN models achieve overall accuracy superior or comparable to that of deterministic AOT retrievals. Decision tree analysis reveals that ANN predictions effectively enhance deterministic AOT retrievals under some surface or climate conditions.

The second part of the thesis addresses the problem of identifying biomedical publications that contain desired experimental evidence in MEDLINE, a major biomedical repository that collects millions of papers from journals and conference proceedings. The learning task is made challenging by the richness of biomedical terminology sources, the diversity of expressions of experimental evidence, and the small number of labeled examples available for training. We propose a novel substring construction algorithm that derives attributes from semantically related terms with shared stems or morphemes. On five post-translational modification (PTM) test datasets, curators confirm that the selected substrings significantly improve classification performance.

Finally, we summarize our work and propose future research directions. Specifically, we describe a framework that exploits online text to explain some aerosol results of satellite image analysis.
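The abstract does not spell out the substring construction algorithm, so the following is only a minimal illustrative sketch of the general idea of deriving attributes from terms that share stems or morphemes. It is not the thesis's method. The function names (`shared_substrings`, `maximal_only`, `featurize`), the length threshold, and the sample term list are all assumptions made for illustration.

```python
from collections import defaultdict

def shared_substrings(terms, min_len=5, min_terms=2):
    """Map each substring of length >= min_len to the set of terms containing it,
    keeping only substrings that occur in at least min_terms distinct terms."""
    owners = defaultdict(set)
    for term in terms:
        t = term.lower()
        for i in range(len(t)):
            for j in range(i + min_len, len(t) + 1):
                owners[t[i:j]].add(term)
    return {s: ts for s, ts in owners.items() if len(ts) >= min_terms}

def maximal_only(shared):
    """Drop a substring when a longer shared substring contains it
    and is shared by exactly the same set of terms."""
    keep = {}
    for s, ts in shared.items():
        covered = any(
            s != other and s in other and ts == other_ts
            for other, other_ts in shared.items()
        )
        if not covered:
            keep[s] = ts
    return keep

def featurize(text, substrings):
    """Binary attribute vector: 1 if the substring occurs in the text, else 0."""
    text = text.lower()
    return [1 if s in text else 0 for s in substrings]

# Hypothetical PTM-related vocabulary, for illustration only.
terms = ["phosphorylation", "phosphorylated", "phosphorylates",
         "acetylation", "acetylated"]
features = sorted(maximal_only(shared_substrings(terms)))
vector = featurize("The protein was phosphorylated at Ser10.", features)
```

In this sketch a stem such as `phosphorylat` becomes a single attribute that matches every inflected form, which is one way a small labeled set could still generalize across varied expressions of the same evidence.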

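Bibliographic record

  • Author

    Han, Bo

  • Author affiliation

    Temple University

  • Degree-granting institution: Temple University
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2007
  • Pages: 118 p.
  • Total pages: 118
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Automation technology, computer technology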