Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid pretty output ptr* #262

Open
Ptival opened this issue Nov 1, 2023 · 3 comments
Open

Invalid pretty output ptr* #262

Ptival opened this issue Nov 1, 2023 · 3 comments

Comments

@Ptival
Copy link
Contributor

Ptival commented Nov 1, 2023

Encountered some incorrect roundtripping. Here is a fairly minimal reproducer:

define i32 @f(ptr noundef %0) #0 {
  %2 = alloca ptr, align 8
  store ptr %0, ptr %2, align 8
  ret i32 0
}

Assembling to bitcode, parsing with llvm-pretty-bc-parser and then pretty-printing with ppModule yields:

source_filename = "./mini.ll"
target triple = "unknown-unknown-unknown-unknown-"
define default i32 @f(ptr %0) {
; <label>: 1
  %2 = alloca ptr, align 8
  store ptr %0, ptr* %2, align 8
  ret i32 0
}

which contains invalid LLVM IR ptr*.

Thanks to @kquick I now know I can avoid this problem by manually calling fixupOpaquePtrs, however, I wonder whether we should avoid producing PtrTo PtrOpaque in the first place, on the fly?

@RyanGlScott
Copy link
Contributor

Yes, this is rather unfortunate. Before I discuss a possible fix for this, let me first provide some context as to how things became this way (feel free to skip to the bottom of this comment if you already know the story).


The root of the issue is that, broadly speaking, there two situations where llvm-pretty-bc-parser must fill in Types in the AST:

  1. When parsing a type directly from bitcode. This is the easy case, and it's obvious how to do this.
  2. When parsing an instruction that encodes some additional type information that isn't apparent from the instruction's arguments. For instance, the bitcode for the store instruction has a type argument that represents the type of memory to store, but the store instruction doesn't directly encode the type of pointer being written to.

(2) is the tricky case, since it's up to the LLVM parsing library to determine what type should go in the AST. Continuing the store example, let's suppose that we have:

store i8 %0, ptr %1

Or, to use the pre-opaque pointer syntax:

store i8 %0, i8* %1

The bitcode for this store instruction will directly encode the i8 type, which is easy to parse (this is a type-(1) situation). It will not encode the ptr/i8* type, however (this is a type-(2) situation). llvm-pretty-bc-parser now has a choice to make: should it fill in ptr in the AST, or should it fill in i8*?

If we aimed to only support one particular version of LLVM, then this would be an easy choice, as we would simply pick ptr for recent LLVM versions and i8* for older LLVM versions. But llvm-pretty-bc-parser has made the somewhat unusual choice of supporting multiple versions of the LLVM language simultaneously. There are good reasons for doing so (there are some projects where we still need to use non-opaque pointers), so when I upgraded llvm-pretty-bc-parser to support opaque pointers, I decided to have the library pick i8* in this example. This isn't an entirely arbitrary choice, since i8* has more type information than ptr, and i8* is supported by more LLVM versions than ptr.

For tools like Crucible, the consequences of this choice aren't that important, since it can handle both i8* and ptr without issue, and it can even handle programs that contain both. For tools like llvm-as, however, mixing the two pointer types is problematic, as it requires that a given program either use exclusively opaque pointers or exclusively non-opaque pointers. llvm-pretty-bc-parser's own test suite uses llvm-as, so I cooked up the fixupOpaquePtrs function to sanitize files with mixed pointer types and make them all use opaque ones. This is ultimately a hack, however, and it is one that we should strive to remove.


At the end of the day, producing ASTs with mixed opaque and non-opaque pointer types is asking for trouble, so I think we should exclusively use one or the other. Unfortunately, different LLVM versions only support certain pointer types, so we must be careful about which pointer type we pick.

One possible solution would be to tweak llvm-pretty-bc-parser's behavior based on what version of LLVM it is parsing. I'm not terribly fond of this idea, since the LLVM bitcode format is intended to be (largely) backwards-compatible, and we really ought to be able to parse bitcode without having to know the version number.

Another possible way forward would be to add a configuration option that allows switching between one pointer type or the other. I'm slightly more fond of this idea, although we'd have to figure out what the right default setting should be.

@peterohanley
Copy link

I think I have another case of needing more info:

@global_var = external constant [1 x i8]

define i64 @h() {
  br i1 icmp ne (i32 ptrtoint (ptr @global_var to i32), i32 1), label %pc_1, label %pc_1
pc_1:
  ret i64 0
}

llvm-as -> .bc -> llvm-disasm:

; reduced2.ll.bc
source_filename = "reduced2.ll"
target triple = "unknown-unknown-unknown-unknown-"
@global_var = external constant [1 x i8]
define i64 @h() {
; <label>: 0
  br i1 icmp ne ([1 x i8]* @global_var, i32 1), label %pc_1, label %pc_1
pc_1:
  ret i64 0
}

This is rejected by llvm-as:

llvm-as: reduced2.ll:6:9: error: compare operands must have the same type
br i1 icmp ne ([1 x i8]* @global_var, i32 1), label %pc_1, label %pc_1
      ^

llvm-as -> .bc -> llvm-dis:

; ModuleID = 'reduced2.ll.bc'
source_filename = "reduced2.ll"

@global_var = external constant [1 x i8]

define i64 @h() {
  br i1 icmp ne (i32 ptrtoint (ptr @global_var to i32), i32 1), label %pc_1, label %pc_1

pc_1:                                             ; preds = %0, %0
  ret i64 0
}

There is no ptrtoint in the AST dump. I haven't analyzed the bitcode directly yet.

@RyanGlScott
Copy link
Contributor

I think that this is actually a separate issue, as it's possible to trigger this bug without the use of opaque pointers. See #266.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants