The probability of a computer file being malware is inferred by iteratively propagating domain knowledge among computer files, related clients, and/or related source domains. A graph is generated to include machine nodes representing clients, file nodes representing files residing on the clients, and optionally domain nodes representing source domains hosting the files. The graph also includes edges connecting the machine nodes with the related file nodes, and optionally edges connecting the domain nodes with the related file nodes. Priors and edge potentials are set for the nodes and the edges based on related domain knowledge. The domain knowledge is iteratively propagated and aggregated among the connected nodes through exchanging messages among the connected nodes. The iteration process ends when a stopping criterion is met. The classification and associated marginal probability for each file node are calculated based on the priors, the received messages, and the edge potentials associated with the edges through which the messages were received.
展开▼