feat: OLE CF and VBA modules implemented #274
base: main
- Added support for parsing OLE CF and VBA (macro-enabled Office) files.
Good work. This goes in the right direction, but I think it requires a bit of a rewrite to make it more similar to other existing modules that use the `nom` crate for parsing binary files. The `nom` crate makes parsing complex data structures easier, and removes boilerplate code like the `read_u32` and `read_u16` functions.
impl VbaExtractor {
    fn new(data: &[u8]) -> Self {
        Self {
            data: data.to_vec(),
This makes a copy of `data`, and `data` in this case is the file being scanned. For a 1GB file we are making a copy of a 1GB buffer. There must be some way of parsing this format that doesn't require making a copy of the data.
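One way to avoid the copy is to have the extractor borrow the scanned buffer rather than own it. The following is a minimal sketch, assuming the extractor only needs read access; the lifetime parameter and the `slice` helper are illustrative and not part of the PR.

```rust
/// Hypothetical zero-copy variant of the extractor: it borrows the scanned
/// buffer for the lifetime 'a instead of owning a copy of it.
struct VbaExtractor<'a> {
    data: &'a [u8],
}

impl<'a> VbaExtractor<'a> {
    /// Stores the reference only; no bytes are copied.
    fn new(data: &'a [u8]) -> Self {
        Self { data }
    }

    /// Illustrative helper: returns a sub-slice of the original buffer,
    /// or None if the requested range is out of bounds or overflows.
    fn slice(&self, offset: usize, len: usize) -> Option<&'a [u8]> {
        let end = offset.checked_add(len)?;
        self.data.get(offset..end)
    }
}
```

With this shape, constructing the extractor costs the same for a 1GB file as for a 1KB file, and every slice it hands out points into the original buffer.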
return Err("Invalid byte order mark");
}

let num_fat_sectors = Self::read_u32(self.data, 44)?;
All these hard-coded offsets make the code harder to maintain and understand. This looks like you are parsing a struct, and the `nom` crate makes the parsing of structures very easy. Use the `lnk` module as an example: https://github.com/VirusTotal/yara-x/blob/main/lib/src/modules/lnk/parser.rs
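To illustrate the point about offsets, here is a standard-library-only sketch that reads the compound file header fields sequentially into a named struct, so that no absolute offset such as `44` appears in the code. It is a stand-in for the `nom`-based approach, not an example of `nom` itself; the `Reader` helper, the `CfHeader` struct and its field names are hypothetical, and only a subset of the header is shown. The field order follows the compound file header layout (signature, CLSID, versions, byte order, sector shifts, reserved bytes, sector counts).

```rust
/// Subset of the compound file header, with fields in on-disk order.
#[derive(Debug, PartialEq)]
struct CfHeader {
    minor_version: u16,
    major_version: u16,
    byte_order: u16,
    sector_shift: u16,
    mini_sector_shift: u16,
    num_dir_sectors: u32,
    num_fat_sectors: u32,
    first_dir_sector: u32,
}

/// Magic bytes at the start of every compound file.
const SIGNATURE: [u8; 8] = [0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1];

/// Hypothetical helper: a forward-only reader over a borrowed slice.
struct Reader<'a> {
    rest: &'a [u8],
}

impl<'a> Reader<'a> {
    /// Consumes and returns the next `n` bytes, or fails if too few remain.
    fn take(&mut self, n: usize) -> Result<&'a [u8], &'static str> {
        let data: &'a [u8] = self.rest;
        if data.len() < n {
            return Err("Unexpected end of data");
        }
        let (head, tail) = data.split_at(n);
        self.rest = tail;
        Ok(head)
    }

    /// Reads a little-endian u16.
    fn u16_le(&mut self) -> Result<u16, &'static str> {
        let b = self.take(2)?;
        Ok(u16::from_le_bytes([b[0], b[1]]))
    }

    /// Reads a little-endian u32.
    fn u32_le(&mut self) -> Result<u32, &'static str> {
        let b = self.take(4)?;
        Ok(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    }
}

/// Parses the leading header fields in order; positions are implied by
/// the sequence of reads rather than by hard-coded offsets.
fn parse_header(data: &[u8]) -> Result<CfHeader, &'static str> {
    let mut r = Reader { rest: data };
    if r.take(8)? != &SIGNATURE[..] {
        return Err("Invalid signature");
    }
    r.take(16)?; // CLSID, not used here
    let minor_version = r.u16_le()?;
    let major_version = r.u16_le()?;
    let byte_order = r.u16_le()?;
    if byte_order != 0xFFFE {
        return Err("Invalid byte order mark");
    }
    let sector_shift = r.u16_le()?;
    let mini_sector_shift = r.u16_le()?;
    r.take(6)?; // reserved bytes
    let num_dir_sectors = r.u32_le()?;
    let num_fat_sectors = r.u32_le()?;
    let first_dir_sector = r.u32_le()?;
    Ok(CfHeader {
        minor_version,
        major_version,
        byte_order,
        sector_shift,
        mini_sector_shift,
        num_dir_sectors,
        num_fat_sectors,
        first_dir_sector,
    })
}
```

A `nom` parser would presumably follow the same sequence of reads with the crate's own combinators in place of the hand-written `Reader`, which is the boilerplate the comment refers to.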
@@ -0,0 +1,50 @@
/*! YARA module that parses OLE Compound File Binary Format files.
Include some links to documentation explaining the format.
repeated string module_names = 2;

// Type of each module (standard, class, form)
repeated string module_types = 3;
Instead of using constant strings use enums.
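The requested change itself would be made in the `.proto` definition. As a rough illustration of why an enum helps, the sketch below shows a hypothetical Rust-side counterpart; the `ModuleType` name, its variants (taken from the "standard, class, form" comment in the diff) and the helper functions are assumptions, not code from the PR.

```rust
/// Hypothetical Rust-side counterpart of an enum for VBA module types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ModuleType {
    Standard,
    Class,
    Form,
}

impl ModuleType {
    /// Human-readable label, for places that still need to print text.
    fn label(self) -> &'static str {
        // The match is exhaustive: adding a variant without updating
        // this function is a compile-time error.
        match self {
            ModuleType::Standard => "standard",
            ModuleType::Class => "class",
            ModuleType::Form => "form",
        }
    }
}

/// Counts class modules without any string comparison.
fn count_class_modules(types: &[ModuleType]) -> usize {
    types.iter().filter(|t| **t == ModuleType::Class).count()
}
```

Compared with constant strings, a misspelled variant fails to compile instead of silently never matching.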
repeated string module_types = 3;

// The actual VBA code for each module
repeated string module_code = 4;
Is this actually repeated? If so, the name is a bit misleading considering that `module_names` and `module_types` are in plural form and this is singular.
OLE CF (Object Linking and Embedding Compound File) is a file format used for legacy Microsoft Office files, such as documents, workbooks, presentations, and others. It is also used with Visual Basic for Applications (VBA), more commonly known as Office macros.
I've implemented two modules for parsing OLE CF files and VBA. I've also expanded the `dump` command to allow dumping of file metadata such as stream metadata (OLE CF) and macros (VBA).
An example of how you could use this to identify the use of auto-execute macro method names like "Document_New":