Conference: IEEE Applied Imagery Pattern Recognition Workshop
Automated variability selection in time-domain imaging surveys using sparse representations with learned dictionaries

Abstract

Exponential growth in data streams and discovery power delivered by modern time-domain imaging surveys creates a pressing need for variability extraction algorithms that are both fully automated and highly reliable. Current state-of-the-art methods based on image differencing are limited by the fact that, for every real variable source, the algorithm returns a large number of bogus “detections” caused by atmospheric effects and instrumental signatures coupled with imperfect image processing. Here we present a new approach to this problem, inspired by recent advances in computer vision, and train the machine to learn new features directly from pixel data. The training data set comes from the Palomar Transient Factory survey and consists of small images centered on transient candidates with known real/bogus classification. This set of high-dimensional vectors (~1000 features) is then transformed into a linear representation using a so-called dictionary, an overcomplete feature set constructed separately for each class. The data vectors are well approximated with a small number of dictionary elements, i.e., the dictionary representation is sparse. We show how sparse representations can be used to construct informative features for any suitable machine learning classifier. Our top-level classifier is based on the random forest algorithm (collections of decision trees), with input data vectors consisting of up to 6 computer vision features and 20 additional context features designed by subject domain experts. Machine-learned features alone provide only an approximate classification, with a 20% missed detection rate at a fixed false positive rate of 1%. When automatically extracted features are appended to those constructed by humans, the rate of missed detections is reduced from 8% to about 4% at a 1% false positive rate.
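
The following is a minimal sketch of how per-class sparse representations might be turned into classifier features. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not say which sparse coding algorithm or which derived features were used. Here greedy matching pursuit stands in for the coder, and the relative reconstruction residual under each class dictionary stands in for a "computer vision feature". The dictionaries are tiny hand-written 5-pixel toys rather than learned ones, and the names `matching_pursuit`, `residual_features`, and `DICTIONARIES` are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]

def matching_pursuit(x, atoms, n_nonzero):
    """Greedy sparse coding: approximate x using n_nonzero picks of unit-norm atoms."""
    residual = list(x)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_nonzero):
        # Pick the atom most correlated with what is still unexplained.
        corr = [dot(residual, a) for a in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(corr[i]))
        coeffs[k] += corr[k]
        residual = [r - corr[k] * a for r, a in zip(residual, atoms[k])]
    return coeffs, residual

def residual_features(x, dictionaries, n_nonzero=2):
    """One feature per class: relative reconstruction error under that class's dictionary."""
    norm_x = math.sqrt(dot(x, x))
    feats = {}
    for label, atoms in dictionaries.items():
        _, r = matching_pursuit(x, atoms, n_nonzero)
        feats[label] = math.sqrt(dot(r, r)) / norm_x
    return feats

# Toy stand-ins for learned dictionaries (assumption: real ones would be
# learned from labeled cutouts and be overcomplete in ~1000 dimensions).
DICTIONARIES = {
    "real": [normalize([0, 1, 2, 1, 0]),    # centered, PSF-like blob
             normalize([1, 1, 1, 1, 1])],   # flat background
    "bogus": [normalize([0, 0, 1, 0, 0]),   # single hot pixel
              normalize([0, 1, -1, 0, 0])], # subtraction dipole
}

# A blob-like candidate cutout, flattened to a vector.
candidate = [0.1, 1.0, 2.1, 0.9, 0.1]
feats = residual_features(candidate, DICTIONARIES)
print(feats)
```

In this toy case the candidate is reconstructed with a smaller residual by the "real" dictionary than by the "bogus" one. Features of this kind could then be appended to expert-designed context features and passed to a random forest or any other classifier.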
