Deep PDF Parsing to Extract Features for Detecting Embedded Malware

Abstract

The number of PDF files with embedded malicious code has risen significantly in the past few years. This is due to the portability of the file format, the ways Adobe Reader recovers from corrupt PDF files, the addition of many multimedia and scripting extensions to the file format, and many format properties the malware author may use to disguise the presence of malware. Current research focuses on executable, MS Office, and HTML formats. In this paper, several features and properties of PDF files are identified. Features are extracted using an instrumented open source PDF viewer. The feature descriptions of benign and malicious PDFs can be used to construct a machine learning model for detecting possible malware in future PDF files.

The detection rate of PDF malware by current antivirus software is very low. A PDF file is easy to edit and manipulate because it is a text format, providing a low barrier to malware authors. Analyzing PDF files for malware is nonetheless difficult because of (a) the complexity of the formatting language, (b) the parsing idiosyncrasies in Adobe Reader, and (c) undocumented correction techniques employed in Adobe Reader. In May 2011, Esparza demonstrated that PDF malware could be hidden from 42 of 43 antivirus packages by combining multiple obfuscation techniques. One reason current antivirus software fails is the ease of varying byte sequences in PDF malware, thereby rendering conventional signature-based virus detection useless. The compression and encryption functions produce sequences of bytes that are each functions of multiple input bytes. As a result, padding the malware payload with some whitespace before compression/encryption can change many of the bytes in the final payload.

In this study we analyzed a corpus of 2591 benign and 87 malicious PDF files. While this corpus is admittedly small, it allowed us to test a system for collecting indicators of embedded PDF malware. We will call these indicators "features" throughout the rest of this report. The features are extracted using an instrumented PDF viewer, and are the inputs to a prediction model that scores the likelihood of a PDF file containing malware. The prediction model is constructed from a sample of labeled data by a machine learning algorithm (specifically, decision tree ensemble learning). Preliminary experiments show that the model is able to detect half of the PDF malware in the corpus with zero false alarms. We conclude the report with suggestions for extending this work to detect a greater variety of PDF malware.
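
The abstract's point that whitespace padding before compression changes the final bytes can be illustrated with a minimal sketch. It uses Python's standard `zlib` module as a stand-in for PDF stream compression (the FlateDecode filter is zlib/deflate-based). The payload string and the helper names `compressed_variants` and `differing_bytes` are hypothetical and do not come from the report.

```python
import zlib


def compressed_variants(payload, max_pad=3):
    """Compress the payload with 0..max_pad leading spaces prepended."""
    return [zlib.compress(b" " * n + payload) for n in range(max_pad + 1)]


def differing_bytes(a, b):
    """Count differing positions, plus any difference in length."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


# Hypothetical, harmless stand-in for script text stored in a PDF stream.
payload = b"function greet() { app.alert('hello'); }\n" * 4

if __name__ == "__main__":
    variants = compressed_variants(payload)
    baseline = variants[0]
    for pad, blob in enumerate(variants):
        # Compare each padded variant against the unpadded compressed stream.
        print(pad, len(blob), differing_bytes(baseline, blob))
```

Each variant decompresses to the same script text apart from the leading spaces, yet the compressed streams are not byte-identical, so a signature taken from one compressed stream is not guaranteed to match another.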

Bibliographic record

Authors
Munson, M. A.; Cross, J. S.
Year: 2011
Pages: 1-20
Total pages: 20
Original format: PDF
Language: English
Chinese Library Classification: Industrial technology
Keywords
Malicious codes; Computer programming; Security; Algorithms; Compression; Parsing; Feature extraction; Detection; Learning; Decision tree analysis; Computer security; Computer codes

Similar literature

1. Digital Investigation of PDF Files: Unveiling Traces of Embedded Malware [J]. Maiorca Davide, Biggio Battista. IEEE Security & Privacy. 2019, No. 1
2. A feature-vector generative adversarial network for evading PDF malware classifiers [J]. Information Sciences: An International Journal. 2020
3. A methodology to detect and extract tables from born-digital PDF documents using deep learning [C]. C. Shichin, A.C. Vinay Chandran, V.S. Unnikrishnan. International Conference on Materials, Mechanics and Management. 2019
4. Detecting Stealthy Malware Using Behavioral Features in Network Traffic [D]. Yen, Ting-Fang. 2011
5. Deep Feature Extraction and Classification of Android Malware Images [O]. Jaiteg Singh, Deepak Thakur, Farman Ali. 2020
6. Deep PDF parsing to extract features for detecting embedded malware [O]. Munson, Miles Arthur; Cross, Jesse S. (Missouri University of Science and Technology, Rolla, MO). 2011
