---+ Development: Stack shifter
The =shift= branch of the =|pl-devel|= GIT repository contains the new stack shifter. It was developed to reduce address-space and memory requirements, notably for multi-threaded applications.
---++ Background
For a long time, SWI-Prolog stack management was based on |virtual memory| management: SWI-Prolog claimed address space up to the limit of each stack and assigned physical memory as required. In the old days, with one thread and an address space much larger than the affordable physical memory, this was fine. On today's 32-bit systems, however, address space is more valuable than physical memory. On 64-bit systems the address space is generally larger than physical memory. Still, many threads, each of which may potentially need large stacks (though most do not), easily exhaust the address space well before physical memory becomes an issue.
SWI-Prolog 5.9.x (re-)introduces relocatable stacks (aka |stack-shifting|): stacks are simply allocated and re-allocated using the C-library malloc() and realloc() calls. There is a price. With the sparse virtual memory scheme, garbage collection was requested long before running out of memory and executed at the first possible safe location (typically the call port). The amount of stack needed to get from one safe point to the next is hard to predict: consider arithmetic producing very large integers, or unification of large terms that requires long wakeup lists (coroutining) and/or many trail events. With limited stacks, the previous scheme fails. The new implementation deals with this in two ways:
  - Allow GC/shift in many more places. Just about anywhere in the virtual machine, and in any C function that accesses Prolog through the formal interface, a GC/shift can be invoked.
  - In the few places where this is not feasible, a low-level routine is called that may return "out-of-stack". If so, the system backtracks, calls GC/shift and retries.
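The second approach (try, and on out-of-stack undo, make room and retry) can be illustrated with a small C sketch. This is not SWI-Prolog's code: the names =area=, =push_cells()=, =grow_area()=, =copy_with_retry()= and the =ST_*= status codes are hypothetical, and =grow_area()= simply enlarges the area with realloc() as a stand-in for a real GC/shift. The low-level routine never triggers GC/shift itself; it may stop half-way and report out-of-stack, after which the caller discards the partial result (the "backtrack"), makes room and tries again.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes; SWI-Prolog's real codes differ. */
enum { ST_OK = 0, ST_OUT_OF_STACK = 1, ST_NO_MEMORY = 2 };

typedef struct
{ char  *base;                  /* realloc()ed block; may move on growth */
  size_t top;                   /* bytes in use */
  size_t size;                  /* bytes allocated */
} area;

/* Low-level routine that must not call GC/shift: it copies cell by cell
   and may give up half-way, leaving a partial copy behind. */
static int
push_cells(area *a, const char *src, size_t n)
{ for (size_t i = 0; i < n; i++)
  { if ( a->top == a->size )
      return ST_OUT_OF_STACK;
    a->base[a->top++] = src[i];
  }
  return ST_OK;
}

/* Stand-in for GC/shift: grow until 'needed' bytes are free above top. */
static int
grow_area(area *a, size_t needed)
{ size_t newsize = a->size ? a->size : 16;

  while ( newsize - a->top < needed )
    newsize *= 2;
  char *nb = realloc(a->base, newsize);
  if ( !nb )
    return ST_NO_MEMORY;        /* old block is still valid */
  a->base = nb;
  a->size = newsize;
  return ST_OK;
}

/* Caller's pattern: remember the state, try, undo on failure, grow, retry. */
static int
copy_with_retry(area *a, const char *src, size_t n)
{ for (;;)
  { size_t mark = a->top;       /* state to restore on failure */
    int rc = push_cells(a, src, n);

    if ( rc != ST_OUT_OF_STACK )
      return rc;
    a->top = mark;              /* "backtrack": drop the partial copy */
    if ( grow_area(a, n) != ST_OK )
      return ST_NO_MEMORY;
    assert(a->size - a->top >= n); /* the retry now fits */
  }
}
```

Because each retry starts again from the saved mark, the low-level routine never has to cope with its data moving underneath it; only the caller, which holds no direct pointers across the grow step, sees the relocation.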
---++ Benefits
When stable, the new scheme will enhance portability (it relies on GC/shift rather than on virtual address-space management, which is much harder to port), reduce both physical memory and address-space requirements and, in particular, provide much better support for applications that wish to use many threads.
---++ Help debugging
One of the problems is that much of the code manipulates direct pointers into the stacks and was not designed to cope with a GC/shift, which changes the locations these pointers refer to. Most of this has already been fixed, but testing it thoroughly is hard: where these problems trigger depends on whether the system is 32- or 64-bit and on the exact sequence of events.
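The kind of bug meant here can be shown in a few lines of C. The sketch is hypothetical (=cstack=, =Cell=, =ensure_room()= and =dup_cell()= are not SWI-Prolog names, and =Cell= only stands in for the data a =Word= points to): =ensure_room()= may move the whole stack with realloc(), just as a GC/shift may. A pointer taken before that call can dangle afterwards, so the fix is to keep an offset across the call and derive the pointer again once the stack can no longer move.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef size_t Cell;            /* hypothetical stack cell */

typedef struct
{ Cell  *base;                  /* realloc()ed block; may move */
  size_t top, size;             /* cells in use / cells allocated */
} cstack;

/* May relocate the whole stack, like GC/shift. Returns 0 on failure. */
static int
ensure_room(cstack *s, size_t n)
{ if ( s->size - s->top >= n )
    return 1;
  size_t newsize = s->size ? s->size : 8;
  while ( newsize - s->top < n )
    newsize *= 2;
  Cell *nb = realloc(s->base, newsize * sizeof *nb);
  if ( !nb )
    return 0;
  s->base = nb;
  s->size = newsize;
  return 1;
}

/* Push a copy of the cell at offset 'at', followed by 'extra'.
   WRONG: Cell *p = &s->base[at]; ensure_room(s, 2); ... *p ...
          p may point into the freed old block after ensure_room().
   RIGHT: keep the offset across the call, re-derive the pointer after. */
static int
dup_cell(cstack *s, size_t at, Cell extra)
{ assert(at < s->top);
  if ( !ensure_room(s, 2) )     /* may move s->base */
    return 0;
  Cell *p = &s->base[at];       /* derived after the possible move */
  s->base[s->top++] = *p;
  s->base[s->top++] = extra;
  return 1;
}
```

In a real crash the wrong variant does not fail reliably: it only breaks when the call actually moves the stack, which is why these bugs depend on the platform and the exact sequence of events.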
To make quick progress, I'm particularly interested in problems on 64-bit platforms, and more specifically on GCC-based systems (i.e., 64-bit Linux :-)). If you are interested, please edit =|pl/src/Makefile|= and set:
==
COFLAGS=-gdwarf-2 -g3 -DSECURE_GC -fno-strict-aliasing
==
Run SWI-Prolog under GDB, using this =|.gdbinit|= file:
==
set breakpoint pending on
break trap_gdb
break sysError
break fatalError
set breakpoint pending off

handle SIGPIPE noprint nostop pass
handle SIGUSR1 noprint nostop pass
handle SIGUSR2 noprint nostop pass
set print thread-events off
==
If the system crashes, stopping in a call to =abort=, in =trap_gdb= or on a fatal signal, examine the stack (e.g., with the GDB =bt= command). You are looking for calls from PL_next_solution() or from one of the predicates implemented in C (often named pl_*) that lead to the crash. In particular, look for functions that process direct pointers (of type =Word=) and call GC/shift, often indirectly.
Sometimes the crash is caused by an earlier GC/shift that corrupted the data. You can get information on this using (1 means the latest GC/shift, 2 the one before, up to 10).
These stack traces contain addresses only, but GDB can list the source code that belongs to an address using the command =|(gdb) list *address|=. If this means anything to you, you may be able to indicate where the problem is. If not, sending the program along with instructions on how to reproduce the problem helps a lot.