Conference: IEEE Symposium on Security and Privacy

How to Learn Klingon without a Dictionary: Detection and Measurement of Black Keywords Used by the Underground Economy

Abstract

The online underground economy is an important channel connecting merchants of illegal products with their buyers, and it is constantly monitored by legal authorities. As a common way to evade this monitoring, merchants and buyers together create a vocabulary of jargon (called "black keywords" in this paper) to disguise their transactions (e.g., "smack" is a street name for "heroin" [1]). Black keywords, which are created either by distorting the original meaning of common words or by tweaking other black keywords, are often "unfriendly" to outsiders. Understanding black keywords is of great importance for tracking and disrupting the underground economy, but it is also prohibitively difficult: investigators have to infiltrate the inner circle of criminals to learn their meanings, a task both risky and time-consuming. In this paper, we make the first attempt at capturing and understanding the ever-changing black keywords. We investigated the underground business promoted through blackhat SEO (search engine optimization) and demonstrate that the black keywords targeted by the SEOers can be discovered through a fully automated approach. Our insights are two-fold: first, pages indexed under black keywords are more likely to contain malicious or fraudulent content (e.g., SEO pages) and to be flagged by off-the-shelf detectors; second, people tend to query multiple similar black keywords to find the merchandise. Therefore, we can infer whether a search keyword is "black" by inspecting the associated search results, and then use the related search queries to extend our findings. To this end, we built a system called KDES (Keywords Detection and Expansion System) and applied it to the search results of Baidu, China's top search engine. So far, we have identified 478,879 black keywords, which were clustered under 1,522 core words based on text similarity.
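The detect-then-expand idea described above can be illustrated with a minimal sketch. This is not the KDES implementation: the callables `fetch_results`, `is_flagged`, and `related_queries`, the 0.5 threshold, and the toy data are all hypothetical stand-ins, since the abstract does not specify the detectors, the cut-off, or the search interface.

```python
from collections import deque
from typing import Callable, Iterable, List, Set


def is_black(keyword: str,
             fetch_results: Callable[[str], List[str]],
             is_flagged: Callable[[str], bool],
             threshold: float = 0.5) -> bool:
    """Label a keyword "black" when enough of its search results are flagged.

    fetch_results, is_flagged, and threshold are assumptions for illustration.
    """
    urls = fetch_results(keyword)
    if not urls:
        return False
    flagged = sum(1 for url in urls if is_flagged(url))
    return flagged / len(urls) >= threshold


def expand(seeds: Iterable[str],
           fetch_results: Callable[[str], List[str]],
           is_flagged: Callable[[str], bool],
           related_queries: Callable[[str], List[str]],
           threshold: float = 0.5,
           max_keywords: int = 1000) -> Set[str]:
    """Breadth-first expansion that follows related queries only from black keywords."""
    black: Set[str] = set()
    seen: Set[str] = set()
    queue = deque(seeds)
    while queue and len(black) < max_keywords:
        kw = queue.popleft()
        if kw in seen:
            continue
        seen.add(kw)
        if is_black(kw, fetch_results, is_flagged, threshold):
            black.add(kw)
            # Related queries of a black keyword become new candidates.
            queue.extend(q for q in related_queries(kw) if q not in seen)
    return black


# Toy in-memory data standing in for a search engine and a page detector.
RESULTS = {
    "kw_a": ["bad1", "bad2", "ok1"],
    "kw_b": ["bad1", "ok1", "ok2"],
    "kw_c": ["bad2", "bad3"],
    "kw_d": ["ok1"],
}
FLAGGED = {"bad1", "bad2", "bad3"}
RELATED = {"kw_a": ["kw_b", "kw_c"], "kw_c": ["kw_a", "kw_d"]}

found = expand(
    ["kw_a"],
    fetch_results=lambda kw: RESULTS.get(kw, []),
    is_flagged=lambda url: url in FLAGGED,
    related_queries=lambda kw: RELATED.get(kw, []),
)
print(sorted(found))  # ['kw_a', 'kw_c']
```

In this toy run, `kw_b` and `kw_d` are rejected because fewer than half of their results are flagged, so their related queries are never followed. Expanding only from keywords already judged black is a design choice of this sketch that keeps the crawl focused.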
We further extracted information such as emails, mobile phone numbers, and instant messenger IDs from the pages and domains relevant to the underground business. Such information helps us gain a better understanding of the underground economy, that of China in particular. In addition, our work could help search engine vendors purify their search results and disrupt the channel of the underground market. Our co-authors from Baidu compared our results with their blacklist, found that many of them (e.g., long-tail and obfuscated keywords) were not in it, and then added them to Baidu's internal blacklist.
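As a rough illustration of the contact extraction mentioned above, the following sketch pulls emails, mobile numbers, and instant messenger IDs out of page text with regular expressions. The patterns are assumptions, not the authors' rules: the mobile pattern assumes 11-digit mainland China numbers, and the messenger pattern assumes IDs are written after a literal "QQ" label, which the abstract does not state.

```python
import re
from typing import Dict, List

# Illustrative patterns only; real pages would need more robust handling.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Assumed format: 11 digits starting with 1, second digit 3-9.
MOBILE_RE = re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")
# Assumed format: "QQ", optional ASCII or full-width colon, then 5-11 digits.
QQ_RE = re.compile(r"QQ\s*[::]?\s*([1-9]\d{4,10})(?!\d)", re.IGNORECASE)


def extract_contacts(text: str) -> Dict[str, List[str]]:
    """Return de-duplicated, sorted contact identifiers found in text."""
    return {
        "emails": sorted(set(EMAIL_RE.findall(text))),
        "mobiles": sorted(set(MOBILE_RE.findall(text))),
        "im_ids": sorted(set(QQ_RE.findall(text))),
    }


sample = "Contact seller@example.com or call 13800138000, QQ: 123456789"
contacts = extract_contacts(sample)
print(contacts)
```

Grouping pages by shared identifiers of this kind is one plausible way to link keywords and domains to the same operator, though the abstract does not describe how the extracted data was used beyond gaining a better understanding of the market.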
