Android markets have grown in both size and diversity, offering apps that are localized or curated for specific use cases. Users are often unsure of the exact name or version of the app they should install. This has given rise to the threat of app cloning, in which adversaries copy an app's package, minimally modify its code, and redistribute the clone on the market for monetary gain or to distribute malicious payloads. Existing clone detection methods rely on static signatures that can be evaded using control- and data-flow obfuscation. Moreover, many approaches do not scale with the number of apps, code size, and complexity, leading to prohibitive detection times. In this paper, we introduce Dexsim, a dynamic-analysis-based system that accurately indexes apps and identifies bytecode similarities. We propose a novel bytecode indexing and matching algorithm that employs concepts from forced execution and LZ78 compression trees, and that scales linearly with the number and size of apps. Our experiments on 28k cloned benign and malicious apps show that Dexsim is both scalable and resilient to obfuscation, ferreting out clones within 8 ms per pairwise comparison on average with at least 90% accuracy.
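To illustrate the LZ78 idea mentioned above, the following is a minimal sketch and not Dexsim's actual algorithm. It assumes each app has already been reduced to a sequence of opcode names (for instance, from an execution trace), parses that sequence into LZ78 phrases, and compares two apps by the Jaccard overlap of their phrase sets. The function names, the Jaccard measure, and the sample traces are all assumptions made for illustration.

```python
from typing import Sequence, Set, Tuple

Phrase = Tuple[str, ...]


def lz78_phrases(symbols: Sequence[str]) -> Set[Phrase]:
    """Parse a symbol sequence into its set of LZ78 phrases.

    Each new phrase is the longest previously seen phrase extended by one
    symbol, which is the same rule that grows an LZ78 dictionary tree.
    """
    phrases: Set[Phrase] = set()
    current: Phrase = ()
    for sym in symbols:
        candidate = current + (sym,)
        if candidate in phrases:
            # Known phrase: keep extending it with the next symbol.
            current = candidate
        else:
            # New phrase: record it and restart from the tree root.
            phrases.add(candidate)
            current = ()
    return phrases


def similarity(trace_a: Sequence[str], trace_b: Sequence[str]) -> float:
    """Jaccard similarity of the two traces' LZ78 phrase sets (assumed metric)."""
    pa, pb = lz78_phrases(trace_a), lz78_phrases(trace_b)
    if not pa and not pb:
        return 1.0
    return len(pa & pb) / len(pa | pb)


# Hypothetical opcode traces, for illustration only.
original = ["const", "invoke", "move-result", "const", "invoke", "return"]
clone = ["const", "invoke", "move-result", "const", "invoke", "nop", "return"]
unrelated = ["add", "sub", "mul", "add", "sub", "div"]

score_clone = similarity(original, clone)
score_unrelated = similarity(original, unrelated)
```

In this sketch a lightly modified clone shares most phrases with the original and scores higher than an unrelated trace. Tuples keep the example short; a trie keyed by (parent node, symbol) would avoid re-hashing whole phrases and is closer to how a compression tree is usually stored.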