Skip to content

A Cross-Platform C++ parser library for Windows user minidumps with Python 3 bindings.

License

Notifications You must be signed in to change notification settings

0vercl0k/udmp-parser

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

udmp-parser: A Cross-Platform C++ parser library for Windows user minidumps

Build status Downloads

This is a cross-platform (Windows / Linux / OSX / x86 / x64) C++ library that parses Windows user minidump dumps (.dump /m and not .dump /f in WinDbg usermode).

parser

The library supports Intel 32-bit / 64-bit dumps and provides read access to things like:

  • The thread list and their context records,
  • The virtual memory,
  • The loaded modules.

Compiled binaries are available in the releases section.

Parser

The parser application is a small utility to show-case how to use the library and demonstrate its features. You can use it to dump memory, list the loaded modules, dump thread contexts, dump a memory map various, etc.

parser-usage

Here are the options supported:

parser.exe [-a] [-mods] [-mem] [-t [<TID>|main] [-h] [-dump <addr>] <dump path>

Examples:
  Show all:
    parser.exe -a user.dmp
  Show loaded modules:
    parser.exe -mods user.dmp
  Show memory map:
    parser.exe -mem user.dmp
  Show all threads:
    parser.exe -t user.dmp
  Show thread w/ specific TID:
    parser.exe -t 1337 user.dmp
  Show foreground thread:
    parser.exe -t main user.dmp
  Show a memory page at a specific address:
    parser.exe -dump 0x7ff00 user.dmp

Building

You can build it yourself using the appropriate build script for your platform in the build directory. It builds on Linux, Windows, OSX with the Microsoft, the LLVM Clang and GNU compilers.

Here is an example on Windows:

udmp-parser>cd src\build
udmp-parser\src\build>build-release.bat
udmp-parser\src\build>cmake .. -GNinja
-- The C compiler identification is MSVC 19.29.30139.0
-- The CXX compiler identification is MSVC 19.29.30139.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done
-- Generating done
-- Build files have been written to: C:/work/codes/udmp-parser/src/build
udmp-parser\src\build>cmake --build . --config RelWithDebInfo
[1/2] Building CXX object parser\CMakeFiles\parser.dir\parser.cc.obj
cl : Command line warning D9025 : overriding '/W3' with '/W4'
[2/2] Linking CXX executable parser\parser.exe

And here is another example on Linux:

~/udmp-parser$ cd src/build
~/udmp-parser/src/build$ chmod u+x build-release.sh
~/udmp-parser/src/build$ ./build-release.sh
-- The C compiler identification is GNU 9.3.0
-- The CXX compiler identification is GNU 9.3.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done
-- Generating done
-- Build files have been written to: ~/udmp-parser/src/build
[2/2] Linking CXX executable parser/parser

Python bindings

From PyPI

The easiest way is simply to:

pip install udmp_parser

Using PIP

To install the package

cd src/python
pip install .

To create a wheel pacakge

cd src/python
pip wheel .

Usage

The Python API was built around the C++ code so the names were preserved. Everything lives within the module udmp_parser. Note: For convenience, a simple pure Python script was added to generate minidumps ready to use:

$ python -i src/python/tests/utils.py
>>> pid, dmppath = generate_minidump_from_process_name("winver.exe")
Minidump generated successfully: PID=3232 -> minidump-winver.exe-1687024880.dmp
>>> pid
3232
>>> dmppath
WindowsPath('minidump-winver.exe-1687024880.dmp'))

Parsing a minidump object is as simple as:

>>> import udmp_parser
>>> udmp_parser.version.major, udmp_parser.version.minor, udmp_parser.version.release
(0, 4, '')
>>> dmp = udmp_parser.UserDumpParser()
>>> dmp.Parse(pathlib.Path("C:/temp/rundll32.dmp"))
True

Feature-wise, here are some examples of usage:

Threads

Get a hashmap of threads (as {TID: ThreadObject}), access their information:

>>> threads = dmp.Threads()
>>> len(threads)
14
>>> threads
{5292: Thread(Id=0x14ac, SuspendCount=0x1, Teb=0x2e8000),
 5300: Thread(Id=0x14b4, SuspendCount=0x1, Teb=0x2e5000),
 5316: Thread(Id=0x14c4, SuspendCount=0x1, Teb=0x2df000),
 3136: Thread(Id=0xc40, SuspendCount=0x1, Teb=0x2ee000),
 4204: Thread(Id=0x106c, SuspendCount=0x1, Teb=0x309000),
 5328: Thread(Id=0x14d0, SuspendCount=0x1, Teb=0x2e2000),
 1952: Thread(Id=0x7a0, SuspendCount=0x1, Teb=0x2f7000),
 3888: Thread(Id=0xf30, SuspendCount=0x1, Teb=0x2eb000),
 1760: Thread(Id=0x6e0, SuspendCount=0x1, Teb=0x2f4000),
 792: Thread(Id=0x318, SuspendCount=0x1, Teb=0x300000),
 1972: Thread(Id=0x7b4, SuspendCount=0x1, Teb=0x2fa000),
 1228: Thread(Id=0x4cc, SuspendCount=0x1, Teb=0x2fd000),
 516: Thread(Id=0x204, SuspendCount=0x1, Teb=0x303000),
 2416: Thread(Id=0x970, SuspendCount=0x1, Teb=0x306000)}

And access invidual thread, including their register context:

>>> thread = threads[5292]
>>> print(f"RIP={thread.Context.Rip:#x} RBP={thread.Context.Rbp:#x} RSP={thread.Context.Rsp:#x}")
RIP=0x7ffc264b0ad4 RBP=0x404fecc RSP=0x7de628

Modules

Get a hashmap of modules (as {address: ModuleObject}), access their information:

>>> modules = dmp.Modules()
>>> modules
{1572864: Module_t(BaseOfImage=0x180000, SizeOfImage=0x3000, ModuleName=C:\Windows\SysWOW64\sfc.dll),
 10813440: Module_t(BaseOfImage=0xa50000, SizeOfImage=0x14000, ModuleName=C:\Windows\SysWOW64\rundll32.exe),
 1929052160: Module_t(BaseOfImage=0x72fb0000, SizeOfImage=0x11000, ModuleName=C:\Windows\SysWOW64\wkscli.dll),
 1929183232: Module_t(BaseOfImage=0x72fd0000, SizeOfImage=0x52000, ModuleName=C:\Windows\SysWOW64\mswsock.dll),
 1929576448: Module_t(BaseOfImage=0x73030000, SizeOfImage=0xf000, ModuleName=C:\Windows\SysWOW64\browcli.dll),
 1929641984: Module_t(BaseOfImage=0x73040000, SizeOfImage=0xa000, ModuleName=C:\Windows\SysWOW64\davhlpr.dll),
 1929707520: Module_t(BaseOfImage=0x73050000, SizeOfImage=0x19000, ModuleName=C:\Windows\SysWOW64\davclnt.dll),
 1929838592: Module_t(BaseOfImage=0x73070000, SizeOfImage=0x18000, ModuleName=C:\Windows\SysWOW64\ntlanman.dll),
 [...]
 140720922427392: Module_t(BaseOfImage=0x7ffc24980000, SizeOfImage=0x83000, ModuleName=C:\Windows\System32\wow64win.dll),
 140720923017216: Module_t(BaseOfImage=0x7ffc24a10000, SizeOfImage=0x59000, ModuleName=C:\Windows\System32\wow64.dll),
 140720950280192: Module_t(BaseOfImage=0x7ffc26410000, SizeOfImage=0x1f8000, ModuleName=C:\Windows\System32\ntdll.dll)}

Access directly module info:

>>> ntdll_modules = [mod for addr, mod in dmp.Modules().items() if mod.ModuleName.lower().endswith("ntdll.dll")]
>>> len(ntdll_modules)
2
>>> for ntdll in ntdll_modules:
  print(f"{ntdll.ModuleName=} {ntdll.BaseOfImage=:#x} {ntdll.SizeOfImage=:#x}")

ntdll.ModuleName='C:\\Windows\\SysWOW64\\ntdll.dll' ntdll.BaseOfImage=0x77430000 ntdll.SizeOfImage=0x1a4000
ntdll.ModuleName='C:\\Windows\\System32\\ntdll.dll' ntdll.BaseOfImage=0x7ffc26410000 ntdll.SizeOfImage=0x1f8000

A convenience function under udmp_parser.UserDumpParser.ReadMemory() can be used to directly read memory from the dump. The signature of the function is as follow: def ReadMemory(Address: int, Size: int) -> list[int]. So to dump for instance the wow64 module, it would go as follow:

>>> wow64 = [mod for addr, mod in dmp.Modules().items() if mod.ModuleName.lower() == r"c:\windows\system32\wow64.dll"][0]
>>> print(str(wow64))
Module_t(BaseOfImage=0x7ffc24a10000, SizeOfImage=0x59000, ModuleName=C:\Windows\System32\wow64.dll)
>>> wow64_module = bytearray(dmp.ReadMemory(wow64.BaseOfImage, wow64.SizeOfImage))
>>> assert wow64_module[:2] == b'MZ'
>>> import hexdump
>>> hexdump.hexdump(wow64_module[:128])
00000000: 4D 5A 90 00 03 00 00 00  04 00 00 00 FF FF 00 00  MZ..............
00000010: B8 00 00 00 00 00 00 00  40 00 00 00 00 00 00 00  ........@.......
00000020: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
00000030: 00 00 00 00 00 00 00 00  00 00 00 00 E8 00 00 00  ................
00000040: 0E 1F BA 0E 00 B4 09 CD  21 B8 01 4C CD 21 54 68  ........!..L.!Th
00000050: 69 73 20 70 72 6F 67 72  61 6D 20 63 61 6E 6E 6F  is program canno
00000060: 74 20 62 65 20 72 75 6E  20 69 6E 20 44 4F 53 20  t be run in DOS
00000070: 6D 6F 64 65 2E 0D 0D 0A  24 00 00 00 00 00 00 00  mode....$.......

Memory

The memory blocks can also be enumerated in a hashmap {address: MemoryBlock}.

>>> memory = dmp.Memory()
>>> len(memory)
0x260
>>> memory
[...]
 0x7ffc26410000: [MemBlock_t(BaseAddress=0x7ffc26410000, AllocationBase=0x7ffc26410000, AllocationProtect=0x80, RegionSize=0x1000)],
 0x7ffc26411000: [MemBlock_t(BaseAddress=0x7ffc26411000, AllocationBase=0x7ffc26410000, AllocationProtect=0x80, RegionSize=0x11c000)],
 0x7ffc2652d000: [MemBlock_t(BaseAddress=0x7ffc2652d000, AllocationBase=0x7ffc26410000, AllocationProtect=0x80, RegionSize=0x49000)],
 0x7ffc26576000: [MemBlock_t(BaseAddress=0x7ffc26576000, AllocationBase=0x7ffc26410000, AllocationProtect=0x80, RegionSize=0x1000)],
 0x7ffc26577000: [MemBlock_t(BaseAddress=0x7ffc26577000, AllocationBase=0x7ffc26410000, AllocationProtect=0x80, RegionSize=0x2000)],
 0x7ffc26579000: [MemBlock_t(BaseAddress=0x7ffc26579000, AllocationBase=0x7ffc26410000, AllocationProtect=0x80, RegionSize=0x9000)],
 0x7ffc26582000: [MemBlock_t(BaseAddress=0x7ffc26582000, AllocationBase=0x7ffc26410000, AllocationProtect=0x80, RegionSize=0x86000)],
 0x7ffc26608000: [MemBlock_t(BaseAddress=0x7ffc26608000, AllocationBase=0x0, AllocationProtect=0x0, RegionSize=0x3d99e8000)]}

To facilitate the parsing in a human-friendly manner, some helper functions are provided:

  • udmp_parser.utils.TypeToString: convert the region type to its meaning (from MSDN)
  • udmp_parser.utils.StateToString: convert the region state to its meaning (from MSDN)
  • udmp_parser.utils.ProtectionToString: convert the region protection to its meaning (from MSDN)

This allows to search and filter in a more comprehensible way:

# Collect only executable memory regions
>>> exec_regions = [region for _, region in dmp.Memory().items() if "PAGE_EXECUTE_READ" in udmp_parser.utils.ProtectionToString(region.Protect)]

# Pick any, disassemble code using capstone
>>> exec_region = exec_regions[-1]
>>> mem = dmp.ReadMemory(exec_region.BaseAddress, 0x100)
>>> for insn in cs.disasm(bytearray(mem), exec_region.BaseAddress):
  print(f"{insn=}")

insn=<CsInsn 0x7ffc26582000 [cc]: int3 >
insn=<CsInsn 0x7ffc26582001 [cc]: int3 >
insn=<CsInsn 0x7ffc26582002 [cc]: int3 >
insn=<CsInsn 0x7ffc26582003 [cc]: int3 >
insn=<CsInsn 0x7ffc26582004 [cc]: int3 >
insn=<CsInsn 0x7ffc26582005 [cc]: int3 >
insn=<CsInsn 0x7ffc26582006 [cc]: int3 >
insn=<CsInsn 0x7ffc26582007 [cc]: int3 >
insn=<CsInsn 0x7ffc26582008 [cc]: int3 >
insn=<CsInsn 0x7ffc26582009 [cc]: int3 >
insn=<CsInsn 0x7ffc2658200a [cc]: int3 >
insn=<CsInsn 0x7ffc2658200b [cc]: int3 >
insn=<CsInsn 0x7ffc2658200c [cc]: int3 >
insn=<CsInsn 0x7ffc2658200d [cc]: int3 >
insn=<CsInsn 0x7ffc2658200e [cc]: int3 >
insn=<CsInsn 0x7ffc2658200f [cc]: int3 >
insn=<CsInsn 0x7ffc26582010 [48895c2410]: mov qword ptr [rsp + 0x10], rbx>
insn=<CsInsn 0x7ffc26582015 [4889742418]: mov qword ptr [rsp + 0x18], rsi>
insn=<CsInsn 0x7ffc2658201a [57]: push rdi>
insn=<CsInsn 0x7ffc2658201b [4156]: push r14>
insn=<CsInsn 0x7ffc2658201d [4157]: push r15>
[...]

Authors

Contributors

contributors-img