The analysis of video for the recognition of Instrumental Activities of Daily Living (IADL) through object detection and context analysis, applied to assess the capacity of patients with Alzheimer's disease and age-related dementia, has recently attracted considerable interest. The incorporation of human perception into recognition, search, detection, and visual content understanding tasks has become one of the main tools for developing systems and technologies that support people in their daily life activities. In this paper we propose a model for the automatic segmentation of the saliency region where objects of interest are found in egocentric video, using fully convolutional networks (FCN). The segmentation is performed using information about human perception, yielding better segmentation at the pixel level. This segmentation covers both the objects of interest and the salient region in egocentric videos, providing precise information to object detection and automatic video-indexing systems, which in turn improves their performance in IADL recognition. To measure the model's segmentation performance on the salient region, we benchmark on two databases: the Georgia Tech Egocentric Activity database and our own database. Results show that the method achieves significantly better precision in the semantic segmentation of the region where the objects of interest are located than the GBVS (Graph-Based Visual Saliency) method.
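The pixel-wise prediction that an FCN produces can be illustrated with a minimal sketch. This is a toy forward pass with random weights, not the paper's architecture: every shape, layer, and function name below is an assumption chosen only to show how a fully convolutional pipeline maps an input image to a same-sized per-pixel label mask (salient vs. background).

```python
import numpy as np

def conv2d(x, w, b):
    # Naive 'same' convolution, stride 1.
    # x: (H, W, Cin), w: (k, k, Cin, Cout), b: (Cout,)
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    H, W, _ = x.shape
    out = np.zeros((H, W, w.shape[3]))
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + k, j:j + k, :]
            out[i, j] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2])) + b
    return out

def fcn_saliency(img, w1, b1, w2, b2):
    # Toy FCN: 3x3 conv + ReLU feature layer, then a 1x1 conv
    # scoring 2 classes (0 = background, 1 = salient) at every pixel.
    h = np.maximum(conv2d(img, w1, b1), 0.0)
    logits = conv2d(h, w2, b2)
    # Per-pixel softmax over the class channel, then argmax -> label mask.
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = e / e.sum(axis=-1, keepdims=True)
    return probs.argmax(axis=-1)  # (H, W) mask, same spatial size as input

# Random weights and a random 8x8 RGB "image" for demonstration only.
rng = np.random.default_rng(0)
img = rng.random((8, 8, 3))
w1, b1 = rng.standard_normal((3, 3, 3, 4)), np.zeros(4)
w2, b2 = rng.standard_normal((1, 1, 4, 2)), np.zeros(2)
mask = fcn_saliency(img, w1, b1, w2, b2)
print(mask.shape)
```

The key property, and the reason FCNs suit this task, is that the network contains only convolutions, so the output mask has the same spatial dimensions as the input and every pixel receives a class label; a real system would learn the weights from gaze- or saliency-annotated egocentric frames rather than sample them randomly.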