Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tigervnc-1.14.0 crashes with SIGBUS on SPARC architecture #1859

Closed
SadClouds opened this issue Oct 28, 2024 · 12 comments
Closed

Tigervnc-1.14.0 crashes with SIGBUS on SPARC architecture #1859

SadClouds opened this issue Oct 28, 2024 · 12 comments
Labels
bug Something isn't working

Comments

@SadClouds
Copy link

SadClouds commented Oct 28, 2024

This is a regression since 1.13.1 below is a backtrace

Program received signal SIGBUS, Bus error.
rfb::EncodeManager::checkSolidTile<unsigned char*> (this=0x407d51b8, pb=0x407dc008, colourValue=0xffffffffffffc73c "", r=...) at /opt/netbsd/pkg.objects/net/tigervnc/work.ultra10/tigervnc-1.14.0/common/rfb/EncodeManager.cxx:1081

(gdb) bt
#0  rfb::EncodeManager::checkSolidTile<unsigned char*> (this=0x407d51b8, pb=0x407dc008, colourValue=0xffffffffffffc73c "", r=...) at /opt/netbsd/pkg.objects/net/tigervnc/work.ultra10/tigervnc-1.14.0/common/rfb/EncodeManager.cxx:1081
#1  rfb::EncodeManager::findSolidRect (this=this@entry=0x407d51b8, rect=..., changed=changed@entry=0xffffffffffffc908, pb=pb@entry=0x407dc008) at /opt/netbsd/pkg.objects/net/tigervnc/work.ultra10/tigervnc-1.14.0/common/rfb/EncodeManager.cxx:686
#2  0x0000000000311330 in rfb::EncodeManager::writeSolidRects (this=this@entry=0x407d51b8, changed=changed@entry=0xffffffffffffc908, pb=pb@entry=0x407dc008) at /opt/netbsd/pkg.objects/net/tigervnc/work.ultra10/tigervnc-1.14.0/common/rfb/EncodeManager.cxx:658
#3  0x0000000000313664 in rfb::EncodeManager::doUpdate (this=this@entry=0x407d51b8, allowLossy=allowLossy@entry=true, changed_=..., copied=..., copyDelta=..., pb=pb@entry=0x407dc008, renderedCursor=0x40720418)
    at /opt/netbsd/pkg.objects/net/tigervnc/work.ultra10/tigervnc-1.14.0/common/rfb/EncodeManager.cxx:360
#4  0x000000000031376c in rfb::EncodeManager::writeUpdate (this=this@entry=0x407d51b8, ui=..., pb=0x407dc008, renderedCursor=renderedCursor@entry=0x40720418) at /opt/netbsd/pkg.objects/net/tigervnc/work.ultra10/tigervnc-1.14.0/common/rfb/EncodeManager.cxx:284
#5  0x000000000030c518 in rfb::VNCSConnectionST::writeDataUpdate (this=this@entry=0x407d4e00) at /opt/netbsd/pkg.objects/net/tigervnc/work.ultra10/tigervnc-1.14.0/common/rfb/VNCSConnectionST.cxx:1033
#6  0x000000000030c980 in rfb::VNCSConnectionST::writeFramebufferUpdate (this=this@entry=0x407d4e00) at /opt/netbsd/pkg.objects/net/tigervnc/work.ultra10/tigervnc-1.14.0/common/rfb/VNCSConnectionST.cxx:914
#7  0x000000000030ca58 in rfb::VNCSConnectionST::processMessages (this=0x407d4e00) at /opt/netbsd/pkg.objects/net/tigervnc/work.ultra10/tigervnc-1.14.0/common/rfb/VNCSConnectionST.cxx:192
#8  0x00000000002f24bc in XserverDesktop::handleSocketEvent (this=0x0, this@entry=0x407dc000, fd=fd@entry=7, sockserv=0x40720300, read=read@entry=true, write=write@entry=false) at XserverDesktop.cc:363
#9  0x00000000002f2550 in XserverDesktop::handleSocketEvent (this=0x407dc000, fd=<optimized out>, read=<optimized out>, write=<optimized out>) at XserverDesktop.cc:315
#10 0x00000000002d0538 in HandleNotifyFd (fd=<optimized out>, xevents=<optimized out>, data=0x40bfc520) at connection.c:818
#11 0x00000000002d52a8 in ospoll_wait (ospoll=0x405854c0, timeout=<optimized out>) at ospoll.c:681
#12 0x00000000002cd858 in WaitForSomething (are_ready=<optimized out>) at WaitFor.c:208
#13 0x000000000027c1fc in Dispatch () at dispatch.c:421
#14 0x0000000000280ea0 in dix_main (argc=<optimized out>, argv=0xffffffffffffd2f8, envp=<optimized out>) at main.c:276
#15 0x000000000017d7fc in ___start ()
#16 0x000000004060178c in _rtld_start () from /usr/libexec/ld.elf_so
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
@CendioOssman
Copy link
Member

Thank you for your report.

Do you think you would be able to do a git bisect to narrow down the commit that caused this issue?

@SadClouds
Copy link
Author

SadClouds commented Oct 28, 2024

The issue appears to be on this line (whichever git commit that is)
tigervnc-1.14.0/common/rfb/EncodeManager.cxx:1081

SIGBUS is a misaligned load/store. I'm not a C++ programmer, but normally casting a pointer to a different type and dereferencing it, causes these types of issues.

@CendioOssman
Copy link
Member

That code is unfortunately very complex, so the crashing line is likely not where the issue originates. And we don't have any Sparc machines here, so we have a hard time tracking this down without someone helping to pinpoint where things went wrong.

@SadClouds
Copy link
Author

SadClouds commented Oct 28, 2024

Which is why I'm trying to help. The SPARC machine I have is very slow, it takes about half a day to build one VNC package, so bisecting many different versions and building them could take days and I don't have the capacity for this. The backtrace points exactly where the alignment issue occurred. I don't use C++ much, hence after looking at it quickly, it is not obvious to me why data access in this template is misaligned. It may be a better approach for developers to review the code around this line and suggest possible fixes. I'm happy to test them and report.

The problem originates on line 1075
buffer = (const T*)pb->getBuffer(r, &stride);

where getBuffer() returns a pointer, which is then cast to pointer to object T. The pointer returned is not aligned properly for object T. This looks like bad C++ programming.

@SadClouds
Copy link
Author

SadClouds commented Oct 30, 2024

I added some debug statements

template<class T>
inline bool EncodeManager::checkSolidTile(const Rect& r,
                                          const T colourValue,
                                          const PixelBuffer *pb)
{
  int w, h;
  const T* buffer;
  int stride, pad;

  w = r.width();
  h = r.height();

  buffer = (const T*)pb->getBuffer(r, &stride);
  vlog.error("buffer=%p", buffer);
  pad = stride - w;

  while (h--) {
    int w_ = w;
    while (w_--) {
      vlog.error("buffer=%p, h=%d, w=%d", buffer, h, w_);
      if (*buffer != colourValue)
        return false;
      buffer++;
      vlog.error("buffer++=%p", buffer);
    }
    buffer += pad;
  }

  return true;
}
Wed Oct 30 09:12:18 2024
 VNCSConnST:  Server default pixel format depth 24 (32bpp) big-endian rgb888
 VNCSConnST:  Client pixel format depth 24 (32bpp) little-endian rgb888
 EncodeManager: buffer=0x46e00400
 EncodeManager: buffer=0x46e00400, h=15, w=15
 EncodeManager: buffer=0x46e00440
 EncodeManager: buffer=0x46e00440, h=15, w=15
 EncodeManager: buffer=0x46e00480
...
 EncodeManager: buffer=0x47259080, h=15, w=15
 EncodeManager: buffer=0x472590c0
 EncodeManager: buffer=0x472590c0, h=15, w=8
 EncodeManager: buffer=0x47259124
 EncodeManager: buffer=0x47259124, h=15, w=15
(EE) Bus error at address 0x47259124
(EE) 
Fatal server error:
(EE) Caught signal 10 (Bus error). Server aborting
(EE) 
X connection to :1 broken (explicit kill or server shutdown).
Killing Xvnc process ID 7712
Xvnc process ID 7712 already killed

It seems the difference between successive buffer pointer addresses is 64 bytes, the last one which causes SIGBUS is 0x47259124 - 0x472590c0 = 100 bytes. Either way, looks like the buffer pointer may be pointing at a bogus address.

@CendioOssman
Copy link
Member

0x47259124 seems aligned well enough for 32-bit pixels. Any idea why it is still giving us SIGBUS?

@SadClouds
Copy link
Author

Because pointers on sparc64 must be aligned on 8 byte boundaries

gdb -p $(pgrep Xvnc)
...
Program received signal SIGBUS, Bus error.
...
(gdb) print buffer
$9 = (unsigned char * const *) 0x47259124
(gdb) print sizeof(buffer)
$10 = 8
(gdb) print 0x47259124 % 8
$11 = 4

So

if (*buffer != colourValue)

is trying to dereference a pointer which is only aligned on 4 byte boundary. Hence this code is broken in the functions which allocate and return the address for that buffer. That pointer must be aligned on 8 byte boundary, i.e. buffer % 8 must be 0.

@CendioOssman
Copy link
Member

That is something I don't think we can easily support. The graphics stack expects packed pixels. And we need to be able to access individual pixels.

And are you sure about that requirement? Every hit I get states that sparc64 has the normal alignment requirements of alignment matching the type size.

@SadClouds
Copy link
Author

SadClouds commented Nov 1, 2024

I got my previous gdb statement a bit wrong. The buffer variable seems to be a pointer to pointer, this is where I think the problem is

(gdb) p colourValue
$14 = (unsigned char * const) 0xffffffffffffc72c ""
(gdb) p buffer
$15 = (unsigned char * const *) 0x47259124
(gdb) p *buffer
$16 = (unsigned char * const) 0x45444f0045444f <error: Cannot access memory at address 0x45444f0045444f>
(gdb) p 0x45444f0045444f % 8
$17 = 7

So *buffer results in a data type unsigned char * which is a pointer and must be aligned on 8 bytes. The line *buffer != colourValue seems to compare two pointers, but *buffer results misaligned pointer load, causing SIGBUS. On sparc64 pointers are 8 bytes hence the alignment issue.

@SadClouds
Copy link
Author

You have the same issue on x86, but instead of SIGBUS, CPU will instead use misaligned load, which could be less efficient, i.e. instead of one load, it may issue two loads.

@CendioOssman
Copy link
Member

Good catch. This is not really an alignment issue, but rather a general bug. That is not supposed to be a pointer to a pointer. The templating is getting messed up for some reason, and the wrong method is being called.

@CendioOssman CendioOssman added the bug Something isn't working label Dec 6, 2024
@CendioOssman
Copy link
Member

Should be resolved in 9da4f05.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants