feat: OLE CF and VBA modules implemented #274

davidmagnotti · 2024-12-23T13:09:26Z

OLE CF (Object Linking and Embedding Compound File) is the container format used by legacy Microsoft Office files, such as documents, workbooks, and presentations. It is also used to store Visual Basic for Applications (VBA) projects, more commonly known as Office macros.

I've implemented two modules for parsing OLE CF files and VBA. I've also expanded the dump command to allow dumping of file metadata such as stream metadata (OLE CF) and macros (VBA).

For example, you could use this to detect auto-execute macro method names such as "Document_New":

import "vba"

rule detect_document_new
{
    condition:
        for any module in vba.module_code : (
            module matches /document_new/i
        )
}

- Added support for parsing OLE CF and VBA (macro-enabled Office) files.

plusvic

Good work. This goes in the right direction, but I think it needs some rewriting to make it more similar to other existing modules that use the nom crate for parsing binary files. The nom crate makes parsing complex data structures easier and removes boilerplate code like the read_u32 and read_u16 functions.

plusvic · 2024-12-23T15:56:14Z

lib/src/modules/vba/mod.rs

+impl VbaExtractor {
+    fn new(data: &[u8]) -> Self {
+        Self {
+            data: data.to_vec(),


This makes a copy of data, and data in this case is the file being scanned. For a 1GB file we are making a copy of a 1GB buffer. There must be some way of parsing this format that doesn't require making a copy of the data.
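One way to avoid the copy is to have the extractor borrow the scanned buffer instead of owning it. The sketch below is a minimal illustration, not the PR's code: the struct is a simplified stand-in for `VbaExtractor`, and the `slice` helper is a hypothetical name added for the example.

```rust
/// Simplified stand-in for the PR's `VbaExtractor`. It holds a borrowed
/// view of the scanned file rather than an owned `Vec<u8>`.
struct VbaExtractor<'a> {
    data: &'a [u8],
}

impl<'a> VbaExtractor<'a> {
    /// Stores the reference only; no bytes are copied.
    fn new(data: &'a [u8]) -> Self {
        Self { data }
    }

    /// Hypothetical helper: returns a sub-slice of the original buffer,
    /// or `None` if the requested range is out of bounds.
    fn slice(&self, offset: usize, len: usize) -> Option<&'a [u8]> {
        let end = offset.checked_add(len)?;
        self.data.get(offset..end)
    }
}
```

With this shape, the lifetime `'a` ties the extractor to the scanned buffer, so every stream or module it returns can be a view into the original data.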

plusvic · 2024-12-23T16:01:08Z

lib/src/modules/olecf/parser.rs

+            return Err("Invalid byte order mark");
+        }
+
+        let num_fat_sectors = Self::read_u32(self.data, 44)?;


All these hard-coded offsets make the code harder to maintain and understand. It looks like you are parsing a struct, and the nom crate makes parsing structures very easy. Use the lnk module as an example: https://github.com/VirusTotal/yara-x/blob/main/lib/src/modules/lnk/parser.rs
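To illustrate the idea of reading fields in order rather than by absolute offset, here is a rough sketch that uses only the standard library. It is not nom and not the PR's code: `Reader`, `Header`, and `parse_header` are hypothetical names, and the field order is assumed from the compound file header layout, in which the FAT sector count lands at offset 44 as in the diff above. A nom-based parser would express the same sequence with combinators.

```rust
/// Sequential little-endian reader over a borrowed buffer.
struct Reader<'a> {
    rest: &'a [u8],
}

impl<'a> Reader<'a> {
    fn new(data: &'a [u8]) -> Self {
        Self { rest: data }
    }

    /// Consumes and returns the next `n` bytes.
    fn take(&mut self, n: usize) -> Result<&'a [u8], &'static str> {
        let data: &'a [u8] = self.rest;
        if data.len() < n {
            return Err("Unexpected end of data");
        }
        let (head, tail) = data.split_at(n);
        self.rest = tail;
        Ok(head)
    }

    fn u16(&mut self) -> Result<u16, &'static str> {
        let b = self.take(2)?;
        Ok(u16::from_le_bytes([b[0], b[1]]))
    }

    fn u32(&mut self) -> Result<u32, &'static str> {
        let b = self.take(4)?;
        Ok(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    }
}

const SIGNATURE: [u8; 8] = [0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1];

/// Subset of the header fields, enough for the example.
#[derive(Debug)]
struct Header {
    minor_version: u16,
    major_version: u16,
    sector_shift: u16,
    mini_sector_shift: u16,
    num_dir_sectors: u32,
    num_fat_sectors: u32,
}

fn parse_header(data: &[u8]) -> Result<Header, &'static str> {
    let mut r = Reader::new(data);
    if r.take(8)? != SIGNATURE.as_slice() {
        return Err("Invalid signature");
    }
    r.take(16)?; // CLSID, not needed here
    let minor_version = r.u16()?;
    let major_version = r.u16()?;
    if r.u16()? != 0xFFFE {
        return Err("Invalid byte order mark");
    }
    let sector_shift = r.u16()?;
    let mini_sector_shift = r.u16()?;
    r.take(6)?; // reserved bytes
    let num_dir_sectors = r.u32()?;
    let num_fat_sectors = r.u32()?; // the field the diff reads at offset 44
    Ok(Header {
        minor_version,
        major_version,
        sector_shift,
        mini_sector_shift,
        num_dir_sectors,
        num_fat_sectors,
    })
}
```

Each field's position follows from the fields read before it, so adding or reordering a field does not mean recomputing offsets by hand.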

plusvic · 2024-12-23T16:04:03Z

lib/src/modules/olecf/mod.rs

@@ -0,0 +1,50 @@
+/*! YARA module that parses OLE Compound File Binary Format files.


Include some links to documentation explaining the format.

plusvic · 2024-12-24T09:15:52Z

lib/src/modules/protos/vba.proto

+  repeated string module_names = 2;
+
+  // Type of each module (standard, class, form)
+  repeated string module_types = 3;


Instead of using constant strings, use enums.

plusvic · 2024-12-24T09:18:03Z

lib/src/modules/protos/vba.proto

+  repeated string module_types = 3;
+
+  // The actual VBA code for each module
+  repeated string module_code = 4;


Is this actually repeated? If so, the name is a bit misleading considering that module_names and module_types are in plural form and this is singular.

feat: OLE CF and VBA Modules Added

4545732

- Added support for parsing OLE CF and VBA (macro-enabled Office) files.

plusvic requested changes Dec 24, 2024
