Conference: IEEE International Conference on Pervasive Computing and Communications Workshops

Collection of a Diverse, Realistic and Annotated Dataset for Wearable Activity Recognition

Abstract

This paper discusses the opportunities and challenges associated with collecting a large-scale, diverse dataset for Activity Recognition. The dataset was collected by 141 undergraduate students in a controlled environment. Each student collected triaxial data from a wearable accelerometer while carrying out 3 of the 18 investigated activities, which were grouped into 6 scenarios of daily living. The data were subsequently labelled, anonymized and uploaded to a shared repository. The paper presents an analysis of data quality through outlier detection and assesses the suitability of the dataset for creating and validating Activity Recognition models by applying a range of common data-driven machine learning approaches. Finally, the paper describes challenges identified during the data collection process and discusses how they could be addressed. Issues surrounding data quality were identified, in particular the detection and correction of poorly calibrated data. The results highlight the potential of harnessing these diverse data for Activity Recognition. In a comparison of six classification approaches, a Random Forest provided the best classification (F-measure: 0.88). In future data collection cycles, participants will be encouraged to collect a set of “common” activities, to support generation of a larger homogeneous dataset. Future work will seek to refine the methodology further and to evaluate the model on new, unseen data.
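
For readers unfamiliar with the F-measure reported above, the following is a minimal sketch of how such a score can be computed for a multi-class activity classifier. It is not the authors' code: the abstract does not state whether the 0.88 figure is macro-averaged or weighted, so macro averaging is an assumption here, and the function names and activity labels are hypothetical, made up purely for illustration.

```python
from collections import Counter


def f_measure_per_class(y_true, y_pred):
    """Return {label: F1 score} from parallel lists of true and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1      # correct prediction for class t
        else:
            fp[p] += 1      # p was predicted but is wrong
            fn[t] += 1      # t was the truth but was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f_measure(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    scores = f_measure_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Made-up labels for illustration only; these are not data from the paper.
y_true = ["walk", "walk", "sit", "sit", "stand", "stand"]
y_pred = ["walk", "sit", "sit", "sit", "stand", "walk"]

per_class = f_measure_per_class(y_true, y_pred)
macro = macro_f_measure(y_true, y_pred)
print(per_class, round(macro, 3))
```

Macro averaging gives every activity equal weight regardless of how many samples it has, which matters in a dataset like this one where each participant recorded only 3 of the 18 activities and class sizes may therefore be uneven.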