Implementations generally relate to surgical scene assessment based on computer vision. In some implementations, a method includes receiving a first image frame of a plurality of image frames associated with a surgical scene. The method further includes detecting one or more objects in the first image frame. The method further includes determining one or more positions corresponding to the one or more objects. The method further includes tracking each position of the one or more objects in other image frames of the plurality of image frames.
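The detect, locate, and track steps described above can be illustrated with a minimal sketch. It is not the claimed method: the abstract does not specify a detector or a tracking algorithm, so this example assumes a greedy nearest-neighbor match on object centroids. The names `detect_objects`, `track_positions`, and `max_distance` are hypothetical, and each frame is assumed to be already reduced to a list of (x, y) centroids in place of a real computer-vision detector.

```python
from math import hypot


def detect_objects(frame):
    """Hypothetical detector stand-in (assumption): the frame is already a
    list of (x, y) object centroids, so detection just copies that list."""
    return list(frame)


def track_positions(frames, max_distance=50.0):
    """Track objects detected in the first frame across the other frames.

    Returns {track_id: [(frame_index, (x, y)), ...]}.
    Matching is greedy nearest-neighbor, an illustrative choice only.
    """
    if not frames:
        return {}

    # Detect objects in the first frame and record their positions.
    first_positions = detect_objects(frames[0])
    tracks = {i: [(0, pos)] for i, pos in enumerate(first_positions)}

    # Follow each recorded position through the remaining frames.
    for frame_index in range(1, len(frames)):
        candidates = detect_objects(frames[frame_index])
        for history in tracks.values():
            if not candidates:
                break
            last_x, last_y = history[-1][1]
            nearest = min(
                candidates,
                key=lambda p: hypot(p[0] - last_x, p[1] - last_y),
            )
            # Accept the match only if the object moved a plausible distance.
            if hypot(nearest[0] - last_x, nearest[1] - last_y) <= max_distance:
                history.append((frame_index, nearest))
                candidates.remove(nearest)
    return tracks
```

Calling `track_positions` on a list of frames yields one position history per object found in the first frame. An object with no detection within `max_distance` in a given frame simply gets no entry for that frame.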