Embodiments described herein relate generally to a methodology of efficient object classification within a visual medium. The methodology utilizes a first neural network to perform an attention based object localization within a visual medium to generate a visual mask. The visual mask is applied to the visual medium to generate a masked visual medium. The masked visual medium may be then fed into a second neural network to detect and classify objects within the visual medium.
展开▼