Malware analysts, besides creating signatures, are also called upon to generate indicators of compromise, disrupt botnets, attribute attacks to actors, and understand an adversary's intent. These tasks require extracting a variety of secrets from malware, collectively known as threat intelligence. After studying a few samples from a malware family and locating where its secrets are embedded, analysts create rules that can automatically extract threat intelligence from future variants of that family. Today, such rules are written as regular expressions over byte sequences, for example with YARA. These rules are easily invalidated by polymorphic variants or evolutionary versions, so keeping them up to date is a maintenance challenge for malware analysts. Instead of matching raw bytes, we present the use of code semantics to create rules that extract malware secrets. The semantics of code captures the effect of instructions on registers and memory. Rules written over the structure of the symbolic content of registers and memory, rather than over byte sequences, are more resilient to code transformations and evolutionary changes, and are thus less brittle and easier to maintain.
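The contrast between byte-level and semantics-level rules can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the system described here: the two-instruction toy machine, its byte encoding, the register names, and the helpers `assemble`, `symbolic_state`, `byte_rule`, and `semantic_rule` are all hypothetical. The secret is an XOR key applied to the input in register `r1`; one variant applies the key in a single instruction, the other splits it across two instructions separated by a `nop`.

```python
import re

# Hypothetical toy machine: each instruction is an opcode byte, optionally
# followed by a register byte and a one-byte immediate.
OPCODES = {"nop": 0x00, "xor": 0x02}
REGS = {"r0": 0, "r1": 1}


def assemble(program):
    """Encode a list of (op, reg, imm) / (op,) tuples into toy machine bytes."""
    out = bytearray()
    for op, *args in program:
        out.append(OPCODES[op])
        if args:
            reg, imm = args
            out += bytes([REGS[reg], imm & 0xFF])
    return bytes(out)


def symbolic_state(program):
    """Track the symbolic content of each register after running the program.

    Every register starts as an unknown input ("in", name). Nested XORs with
    constants are folded: xor(xor(x, a), b) becomes xor(x, a ^ b).
    """
    regs = {name: ("in", name) for name in REGS}
    for op, *args in program:
        if op == "xor":
            reg, imm = args
            cur = regs[reg]
            if cur[0] == "xor":
                regs[reg] = ("xor", cur[1], cur[2] ^ imm)
            else:
                regs[reg] = ("xor", cur, imm)
    return regs


def byte_rule(code):
    """Byte-level rule: take the immediate of the first 'xor r1, imm'."""
    match = re.search(rb"\x02\x01(.)", code, re.DOTALL)
    return match.group(1)[0] if match else None


def semantic_rule(program):
    """Semantic rule: r1 must have the shape xor(input r1, K); extract K."""
    expr = symbolic_state(program)["r1"]
    if expr[0] == "xor" and expr[1] == ("in", "r1"):
        return expr[2]
    return None


# Two variants that apply the same effective key, 0x5A, to r1.
variant_a = [("xor", "r1", 0x5A)]
variant_b = [("xor", "r1", 0x0F), ("nop",), ("xor", "r1", 0x55)]

for name, program in (("a", variant_a), ("b", variant_b)):
    print(name, byte_rule(assemble(program)), semantic_rule(program))
```

In this toy setting the byte-level rule recovers the key only from the first variant and returns a partial constant from the second, while the rule over the symbolic register content recovers the same key from both. A real implementation would need a full instruction semantics and a more general expression simplifier; the single folding step here is a deliberate simplification.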