Neural Networks: The Official Journal of the International Neural Network Society

Analysis of the IJCNN 2007 agnostic learning vs. prior knowledge challenge.



Abstract

We organized a challenge for IJCNN 2007 to assess the added value of prior domain knowledge in machine learning. Most commercial data mining programs accept data pre-formatted in the form of a table, with each example being encoded as a linear feature vector. Is it worth spending time incorporating domain knowledge in feature construction or algorithm design, or can off-the-shelf programs working directly on simple low-level features do better than skilled data analysts? To answer these questions, we formatted five datasets using two data representations. The participants in the "prior knowledge" track used the raw data, with full knowledge of the meaning of the data representation. Conversely, the participants in the "agnostic learning" track used a pre-formatted data table, with no knowledge of the identity of the features. The results indicate that black-box methods using relatively unsophisticated features work quite well and rapidly approach the best attainable performance. The winners on the prior knowledge track used feature extraction strategies yielding a large number of low-level features. Incorporating prior knowledge in the form of generic coding/smoothing methods to exploit regularities in data is beneficial, but incorporating actual domain knowledge in feature construction is very time consuming and seldom leads to significant improvements. The agnostic learning (AL) vs. prior knowledge (PK) challenge website remains open for post-challenge submissions: http://www.agnostic.inf.ethz.ch/.
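To make the "agnostic learning" setting concrete, the following is a minimal, hypothetical sketch of what an entry in that track works with: a numeric table with one row per example and anonymous columns, plus class labels. The nearest-centroid classifier, the helper names `standardize` and `fit_nearest_centroid`, and the toy data are illustrative assumptions only. They are not the challenge datasets and not any participant's method.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each column; the learner never needs to know what a column means."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns

    def scale(row):
        return [(x - m) / s for x, m, s in zip(row, mus, sds)]

    return scale, [scale(r) for r in rows]

def fit_nearest_centroid(X, y):
    """Fit a generic nearest-centroid classifier on an anonymous feature table.

    X: list of rows (examples), each a list of numbers (features with no names).
    y: list of class labels, one per row.
    Returns a predict(row) function.
    """
    scale, Xs = standardize(X)
    centroids = {}
    for label in set(y):
        members = [r for r, t in zip(Xs, y) if t == label]
        centroids[label] = [mean(c) for c in zip(*members)]

    def predict(row):
        z = scale(row)
        # Assign the label whose centroid is closest in squared Euclidean distance.
        return min(centroids,
                   key=lambda lab: sum((a - b) ** 2 for a, b in zip(z, centroids[lab])))

    return predict

# Hypothetical toy table: 4 examples, 3 unnamed features, labels in {-1, +1}.
X = [[0.1, 5.0, 1.0], [0.2, 4.8, 0.9], [0.9, 1.0, 3.0], [1.0, 1.2, 3.1]]
y = [-1, -1, 1, 1]
predict = fit_nearest_centroid(X, y)
print(predict([0.15, 4.9, 1.0]), predict([0.95, 1.1, 3.05]))
```

Because every column is treated identically, permuting the columns (consistently for training and test rows) leaves the predictions unchanged. This is one way to see that such a learner uses no knowledge of feature identity, which is the situation the agnostic track was designed to test.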