In this work, we aim to address the needs of human analysts to consume and exploit data given the proliferation of overhead imaging sensors. We have investigated automatic captioning methods capable of describing and summarizing scenes and activities by providing textual descriptions using natural language for overhead full motion video (FMV). We have integrated methods to provide three types of outputs: (1) summaries of short video clips; (2) semantic maps, where each pixel is labeled with a semantic category; and (3) dense object descriptions that capture object attributes and activities. We show results obtained on the publicly available VIRAT and Aeroscapes datasets.