Make vector orthogonalization more reliable #930

christoph-conrads · 2023-11-09T16:34:00Z

Description

The list below only mentions the real implementations of subroutines but both real and complex implementations are meant.

This PR improves xORBDB5 and xORBDB6:

Do not assume unit-norm vector input in xORBDB6.
Use a safe value for the parameter ALPHA in xORBDB6.
Check for numerically zero input vectors in xORBDB5. Without this check, the subroutine may compute output vectors that are very small in norm which can cause problems when computing Householder reflectors at the call sites.
Introduce the REALZERO parameter in xORBDB5.

fixes #917

Checklist

The documentation has been updated.
If the PR solves a specific issue, it is set to be closed on merge.

Do not assume that the vector x passed to these subroutines has Euclidean norm one (the norm is almost surely smaller than one when xORBDB5/xUNBDB5 calls this subroutine for the first time). This commit fixes the occasional inaccurate results computed by xORCSD2BY1/xUNCSD2BY1.

With a modest amount of random testing on a work station, it is possible to detect the loss of orthogonality caused by the old early termination criterion.

* introduce a parameter representing a real zero for consistency * stop comparing real values with complex zero

Do not call xORBDB6/xUNBDB6 with input vectors x that are numerically zero because this can cause problems at the call sites (xLARFGP computations might underflow) leading to, e.g., the computation of nonunitary matrices in the xORCSD2BY1/xUNCSD2BY1 output.

codecov · 2023-11-09T16:41:01Z

Codecov Report

Attention: 104 lines in your changes are missing coverage. Please review.

Comparison is base (90398ae) 0.00% compared to head (7d15cc6) 0.00%.
Report is 7 commits behind head on master.

❗ Current head 7d15cc6 differs from pull request most recent head 64897c4. Consider uploading reports for the commit 64897c4 to get more accurate results

Additional details and impacted files

@@           Coverage Diff           @@
##           master     #930   +/-   ##
=======================================
  Coverage    0.00%    0.00%           
=======================================
  Files        1918     1918           
  Lines      188653   188709   +56     
=======================================
- Misses     188653   188709   +56

Files	Coverage Δ
SRC/clarfgp.f	`0.00% <0.00%> (ø)`
SRC/dlarfgp.f	`0.00% <0.00%> (ø)`
SRC/slarfgp.f	`0.00% <0.00%> (ø)`
SRC/zlarfgp.f	`0.00% <0.00%> (ø)`
SRC/cunbdb6.f	`0.00% <0.00%> (ø)`
SRC/dorbdb6.f	`0.00% <0.00%> (ø)`
SRC/sorbdb6.f	`0.00% <0.00%> (ø)`
SRC/zunbdb6.f	`0.00% <0.00%> (ø)`
SRC/cbdsqr.f	`0.00% <0.00%> (ø)`
SRC/zbdsqr.f	`0.00% <0.00%> (ø)`
... and 4 more

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

christoph-conrads · 2023-11-09T21:29:31Z

The PR was converted to a draft since there are still infinities in the output with the SORBDB2 input matrix in the code below. The code demonstrates that the matrix has orthonormal columns. The input was generated from a pair of 2x10 and 3x10 input matrices to SGGQRCS.

#!/usr/bin/python3

import ctypes
import ctypes.util
from ctypes import byref, c_char, c_float, c_int32, c_void_p, POINTER
import sys

import numpy as np
import numpy.linalg as la


def main():
    x = np.array([
        [-0.482012868, 0.869745135, 0.105863400],
        [-0.00000000, 0.00000000, 0.00000000],
        [-0.531500220, -0.194201767, -0.824495614],
        [2.10694591E-08, -3.80177596E-08, -4.62756322E-09],
        [0.696542203, 0.453684032, -0.555877328],
    ],
                 dtype=np.float32)
    r = np.dot(x.T, x) - np.eye(3, dtype=np.float32)

    eps = np.finfo(np.float32).eps
    print(la.norm(r) / la.norm(x) / eps)


if __name__ == "__main__":
    sys.exit(main())

christoph-conrads · 2023-11-10T14:38:17Z

The code at the end of this post calls SORCSD2BY1 with a special input triggering infinities in the computed U2 matrix. Example execution on my machine with GCC 12 (the reference to the directory in /tmp avoids uses of the system's LAPACK library):

$ gcc -Wextra -Wall -std=c99 -pedantic sorcsd2by1.c -L/tmp/tmp.mZJWTkULEs/lib -llapack
$ env LD_LIBRARY_PATH=/tmp/tmp.mZJWTkULEs/lib ./a.out 
+0.00e+00 +1.00e+00 +0.00e+00
+1.00e+00 -0.00e+00      -inf
     -inf -0.00e+00      -inf

The infinities come into existence as follows:

SORBDB5 is being passed a vector x that is almost numerically zero. The norm of x is 2.38 ⋅ 10⁻⁷ which is slightly larger than the value 1.19 ⋅ 10⁻⁷ below which the vector norm is considered to be zero.
SORBDB6 orthogonalizes x against a given set of orthonormal vectors and computes an output vector with norm 1.03 ⋅ 10⁻¹³.
Next, SLARFGP is called with the input vector [1.0 ⋅ 10⁻¹³; −7.5 ⋅ 10⁻²⁷]. Within this subroutine, ALPHA is re-computed in line 195 such that ALPHA ≈ 10⁻⁴⁰. Finally in line 223 the overflow is triggered by computing 1 / ALPHA (the largest single-precision finite value is on the order of 10³⁸).

The problem can be fixed in two ways:

Scaling the SORBDB6 input to unit norm with SSCAL( 1 / NORM ) (SSCAL documentation) makes the problem go away. The use of SLASCL would be preferred because it avoids the computation of the reciprocal of norm but this is not possible because INCX1 or INCX2 might be different from one.
Modify SLARFGP to handle the case, where ALPHA is positive but much smaller than 1 and XNORM is positive but much smaller than ALPHA.

This PR will scale the SORBDB6 input. This way the norm of the computed output vector is at least 0.83 ε and this will fix the issue based on the following reasoning. SLARFGP performs operations on an input vector x, where α ≔ x(1) and XNORM ≔ ∥x(1:)∥₂. To trigger the overflow it must hold that

α > 0
XNORM > 0
α ≫ XNORM

With these assumptions, SLARFGP performs the following computations (unfortunately, identifiers are re-used; signs will be ignored):

ALPHA ≔ X(1)
XNORM ≔ ∥X(1:)∥₂
BETA ≔ ∥X∥₂
ALPHA' ≔ ALPHA + BETA
ALPHA''  ≔ XNORM * (XNORM / ALPHA')

Then 1 / ALPHA'' is at most 1 / (0.83 ε³) and this value is smaller than the largest finite floating-point value both in single- and double-precision.

#include <math.h>
#include <stddef.h>
#include <stdio.h>

typedef int lapack_int;

void sorcsd2by1_(
    char* jobu1,
    char* jobu2,
    char* jobv1t,
    lapack_int* m,
    lapack_int* p,
    lapack_int* q,
    float* x11,
    lapack_int* ldx11,
    float* x21,
    lapack_int* ldx21,
    float* theta,
    float* u1,
    lapack_int* ldu1,
    float* u2,
    lapack_int* ldu2,
    float* v1t,
    lapack_int* ldv1t,
    float* work,
    lapack_int* lwork,
    lapack_int* iwork,
    lapack_int* info,
    size_t strlen_jobu1,
    size_t strlen_jobu2,
    size_t strlen_jobv1t);

#define M 5
#define P 2
#define Q 3

int main()
{
    lapack_int m = M;
    lapack_int p = P;
    lapack_int q = Q;
    float x[M * Q] = {
        -0.482012868,
        -0.00000000,
        -0.531500220,
        2.10694591E-08,
        0.696542203,
        0.869745135,
        0.00000000,
        -0.194201767,
        -3.80177596E-08,
        0.453684032,
        0.105863400,
        0.00000000,
        -0.824495614,
        -4.62756322E-09,
        -0.555877328
    };
    lapack_int ldx11 = m;
    lapack_int ldx21 = m;
    float nan = NAN;
    float theta[Q] = { nan };
    float u1[P * P] = { nan };
    lapack_int ldu1 = P;
    float u2[(M - P) * (M - P)] = { nan };
    lapack_int ldu2 = M - P;
    float v1t[Q * Q] = { nan };
    lapack_int ldv1t = Q;
    float work[256] = { nan };
    lapack_int lwork = sizeof(work) / sizeof(work[0]);
    lapack_int iwork[M + P + Q] = { -1 };
    lapack_int info = -1;

    char yes = 'Y';
    sorcsd2by1_(
        &yes,
        &yes,
        &yes,
        &m,
        &p,
        &q,
        x,
        &ldx11,
        x + p,
        &ldx21,
        theta,
        u1,
        &ldu1,
        u2,
        &ldu2,
        v1t,
        &ldv1t,
        work,
        &lwork,
        iwork,
        &info,
        1,
        1,
        1);

    for (size_t i = 0; i < M - P; ++i) {
        for (size_t j = 0; j < Q; ++j) {
            const char* separator = (j + 1 < Q) ? " " : "\n";
            printf("%+9.2e%s", u2[j * ldu2 + i], separator);
        }
    }
}

Call sites may run into problems when this subroutine computes nonzero vectors that are very small in norm.

christoph-conrads · 2023-11-10T16:31:21Z

Modify SLARFGP to handle the case, where ALPHA is positive but much smaller than 1 and XNORM is positive but much smaller than ALPHA.

This is tracked in #934.

christoph-conrads · 2023-11-10T18:53:42Z

There is no alternative to fixing SLARFGP. This input creates NaNs in the V1 matrix computed by SORCSD2BY1. Demonstration with the LAPACK build in /tmp/tmp.mZJWTkULEs:

$ gcc -Wextra -Wall -std=c99 -pedantic sorcsd2by1.c -L/tmp/tmp.mZJWTkULEs/lib/ -llapack
$ env LD_LIBRARY_PATH=/tmp/tmp.mZJWTkULEs/lib/ ./a.out 
     +nan      +nan      +nan
     +nan      +nan      +nan
     +nan      +nan      +nan

#include <math.h>
#include <stddef.h>
#include <stdio.h>

typedef int lapack_int;

void sorcsd2by1_(
    char* jobu1,
    char* jobu2,
    char* jobv1t,
    lapack_int* m,
    lapack_int* p,
    lapack_int* q,
    float* x11,
    lapack_int* ldx11,
    float* x21,
    lapack_int* ldx21,
    float* theta,
    float* u1,
    lapack_int* ldu1,
    float* u2,
    lapack_int* ldu2,
    float* v1t,
    lapack_int* ldv1t,
    float* work,
    lapack_int* lwork,
    lapack_int* iwork,
    lapack_int* info,
    size_t strlen_jobu1,
    size_t strlen_jobu2,
    size_t strlen_jobv1t);

#define M 5
#define P 2
#define Q 3

int main()
{
    lapack_int m = M;
    lapack_int p = P;
    lapack_int q = Q;
    float x[M * Q] = {
        -2.38418579E-07,
        0.00000000,
        0.00000000,
        1.19904053E-14,
        -1.00000000,
        1.00000000,
        0.00000000,
        -2.98023188E-08,
        -4.37113954E-08,
        -2.38418579E-07,
        -2.98023188E-08,
        3.55271368E-15,
        -1.00000000,
        1.36304023E-15,
        7.10542736E-15,
    };
    lapack_int ldx11 = m;
    lapack_int ldx21 = m;
    float nan = NAN;
    float theta[Q] = { nan };
    float u1[P * P] = { nan };
    lapack_int ldu1 = P;
    float u2[(M - P) * (M - P)] = { nan };
    lapack_int ldu2 = M - P;
    float v1t[Q * Q] = { nan };
    lapack_int ldv1t = Q;
    float work[256] = { nan };
    lapack_int lwork = sizeof(work) / sizeof(work[0]);
    lapack_int iwork[M + P + Q] = { -1 };
    lapack_int info = -1;

    char yes = 'Y';
    sorcsd2by1_(
        &yes,
        &yes,
        &yes,
        &m,
        &p,
        &q,
        x,
        &ldx11,
        x + p,
        &ldx21,
        theta,
        u1,
        &ldu1,
        u2,
        &ldu2,
        v1t,
        &ldv1t,
        work,
        &lwork,
        iwork,
        &info,
        1,
        1,
        1);

    for (size_t i = 0; i < Q; ++i) {
        for (size_t j = 0; j < Q; ++j) {
            const char* separator = (j + 1 < Q) ? " " : "\n";
            printf("%+9.2e%s", v1t[j * ldv1t + i], separator);
        }
    }
}

Avoid overflows when XNORM << ABS(ALPHA) << 1. fixes Reference-LAPACK#934

langou · 2023-11-11T04:23:15Z

There is no alternative to fixing SLARFGP. This input creates NaNs in the V1 matrix computed by SORCSD2BY1.

Thanks @christoph-conrads. Do you have a recommendation on what to do? Your conclusion "There is no alternative to fixing SLARFGP." is close to where we ended up too a decade ago or so. I did not do the analysis myself, by I remember that the conclusion that no one knew how to fix SLARFGP. How critical is the "P" in SLARFGP? If it is the "P" much critical, can we do some steps of Gram-Schmidt instead? In general, I am not a big fan of having more Gram-Schmidt in LAPACK, but Gram-Schmidt is one way to have a positive diagonal. All in all, do you have a recommendation on what to do?

christoph-conrads · 2023-11-11T11:12:05Z

Do you have a recommendation on what to do?

The latest commit in this PR changes the criterion when X is considered zero:

-     IF( XNORM.EQ.ZERO ) THEN
+     IF( XNORM.LE.EPS*ABS(ALPHA) ) THEN

From the point of view of a programmer, there was simply no branch in the code dealing with an input XNORM ≪ ABS(ALPHA) ≪ 1. From the point of view of backward error analysis, an error is introduced in the input vector v that is less than ε‖v‖, where v = [ALPHA, X(1), X(2), ..., X(N-1)], and from this fact it should follow that the Householder matrix computation is still backward stable.

christoph-conrads · 2023-11-11T11:33:04Z

Another NaN but this one was harder to find (90 minutes instead of 90 seconds).

Execution:

$ gcc -Wextra -Wall -std=c99 -pedantic sorcsd2by1.c -L/tmp/tmp.E5Z0UNs1ZK/lib -llapack
$ env LD_LIBRARY_PATH=/tmp/tmp.E5Z0UNs1ZK/lib ./a.out 
-1.00e+00 +1.42e-14 -1.19e-07 +0.00e+00
+0.00e+00 +1.42e-14 +0.00e+00 -1.00e+00
-1.42e-14 -1.00e+00 -0.00e+00 -1.42e-14
     -nan      -nan      -nan      -nan

Code:

#include <math.h>
#include <stddef.h>
#include <stdio.h>

typedef int lapack_int;

void sorcsd2by1_(
    char* jobu1,
    char* jobu2,
    char* jobv1t,
    lapack_int* m,
    lapack_int* p,
    lapack_int* q,
    float* x11,
    lapack_int* ldx11,
    float* x21,
    lapack_int* ldx21,
    float* theta,
    float* u1,
    lapack_int* ldu1,
    float* u2,
    lapack_int* ldu2,
    float* v1t,
    lapack_int* ldv1t,
    float* work,
    lapack_int* lwork,
    lapack_int* iwork,
    lapack_int* info,
    size_t strlen_jobu1,
    size_t strlen_jobu2,
    size_t strlen_jobv1t);

#define M 6
#define P 3
#define Q 4

int main()
{
    lapack_int m = M;
    lapack_int p = P;
    lapack_int q = Q;
    float x[M * Q] = {
        -1.00000000,
        -0.00000000,
        -0.00000000,
        8.68563248E-08,
        1.83140187E-07,
        5.64973490E-08,
        -1.83140173E-07,
        0.00000000,
        -0.00000000,
        1.79512426E-07,
        -1.00000000,
        1.11148256E-14,
        -8.68563603E-08,
        -1.62776259E-29,
        0.00000000,
        -1.00000000,
        -1.79512398E-07,
        -2.86137259E-16,
        -5.64973490E-08,
        1.02350803E-16,
        2.73971536E-15,
        5.19329114E-15,
        -7.67890847E-16,
        -1.00000000,
    };
    lapack_int ldx11 = m;
    lapack_int ldx21 = m;
    float nan = NAN;
    float theta[Q] = { nan };
    float u1[P * P] = { nan };
    lapack_int ldu1 = P;
    float u2[(M - P) * (M - P)] = { nan };
    lapack_int ldu2 = M - P;
    float v1t[Q * Q] = { nan };
    lapack_int ldv1t = Q;
    float work[256] = { nan };
    lapack_int lwork = sizeof(work) / sizeof(work[0]);
    lapack_int iwork[M + P + Q] = { -1 };
    lapack_int info = -1;

    char yes = 'Y';
    sorcsd2by1_(
        &yes,
        &yes,
        &yes,
        &m,
        &p,
        &q,
        x,
        &ldx11,
        x + p,
        &ldx21,
        theta,
        u1,
        &ldu1,
        u2,
        &ldu2,
        v1t,
        &ldv1t,
        work,
        &lwork,
        iwork,
        &info,
        1,
        1,
        1);

    for (size_t i = 0; i < Q; ++i) {
        for (size_t j = 0; j < Q; ++j) {
            const char* separator = (j + 1 < Q) ? " " : "\n";
            printf("%+9.2e%s", v1t[j * ldv1t + i], separator);
        }
    }
}

christoph-conrads · 2023-11-11T11:36:41Z

Your conclusion "There is no alternative to fixing SLARFGP." is close to where we ended up too a decade ago or so. I did not do the analysis myself, by I remember that the conclusion that no one knew how to fix SLARFGP. How critical is the "P" in SLARFGP?

Frankly I do not know which issue you are referring to. My comment was made after suggesting two possible fixes for the issues that I encountered here.

langou · 2023-11-11T16:45:23Z

Hi @christoph-conrads,

Frankly I do not know which issue you are referring to. My comment was made after suggesting two possible fixes for the issues that I encountered #930 (comment).

Fair enough. I had a vague recollection and I do not know myself. I think I am adding confusion as opposed to help out, let us move on and discard my comments.

Thanks for suggesting some fixes for these issues.

christoph-conrads added 4 commits November 9, 2023 17:14

xORBDB6/xUNBDB6: use safe termination criterion

ad3d9b8

With a modest amount of random testing on a work station, it is possible to detect the loss of orthogonality caused by the old early termination criterion.

xORBDB5/xUNBDB5: introduce real zero parameter

80d7c69

* introduce a parameter representing a real zero for consistency * stop comparing real values with complex zero

langou previously approved these changes Nov 9, 2023

View reviewed changes

christoph-conrads marked this pull request as draft November 9, 2023 21:26

xORBDB5/xUNBDB5: ensure xORBDB6 input has unit norm

5b0bc5e

Call sites may run into problems when this subroutine computes nonzero vectors that are very small in norm.

christoph-conrads dismissed langou’s stale review via 5b0bc5e November 10, 2023 15:32

christoph-conrads marked this pull request as ready for review November 10, 2023 15:32

christoph-conrads marked this pull request as draft November 10, 2023 18:48

xLARFGP: avoid overflows

64897c4

Avoid overflows when XNORM << ABS(ALPHA) << 1. fixes Reference-LAPACK#934

christoph-conrads force-pushed the 917-sorcsd2by1-computes-inaccurate-result branch from 073b96c to 64897c4 Compare November 10, 2023 19:09

christoph-conrads marked this pull request as ready for review November 10, 2023 19:09

langou approved these changes Nov 11, 2023

View reviewed changes

langou merged commit 6261d62 into Reference-LAPACK:master Nov 11, 2023
11 checks passed

christoph-conrads mentioned this pull request Nov 13, 2023

xLARFGP unnecessarily overflows #938

Open

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make vector orthogonalization more reliable #930

Make vector orthogonalization more reliable #930

christoph-conrads commented Nov 9, 2023 •

edited

Loading

codecov bot commented Nov 9, 2023 •

edited

Loading

christoph-conrads commented Nov 9, 2023

christoph-conrads commented Nov 10, 2023 •

edited

Loading

christoph-conrads commented Nov 10, 2023

christoph-conrads commented Nov 10, 2023

langou commented Nov 11, 2023 •

edited

Loading

christoph-conrads commented Nov 11, 2023 •

edited

Loading

christoph-conrads commented Nov 11, 2023 •

edited

Loading

christoph-conrads commented Nov 11, 2023

langou commented Nov 11, 2023

Make vector orthogonalization more reliable #930

Make vector orthogonalization more reliable #930

Conversation

christoph-conrads commented Nov 9, 2023 • edited Loading

codecov bot commented Nov 9, 2023 • edited Loading

Codecov Report

christoph-conrads commented Nov 9, 2023

christoph-conrads commented Nov 10, 2023 • edited Loading

christoph-conrads commented Nov 10, 2023

christoph-conrads commented Nov 10, 2023

langou commented Nov 11, 2023 • edited Loading

christoph-conrads commented Nov 11, 2023 • edited Loading

christoph-conrads commented Nov 11, 2023 • edited Loading

christoph-conrads commented Nov 11, 2023

langou commented Nov 11, 2023

christoph-conrads commented Nov 9, 2023 •

edited

Loading

codecov bot commented Nov 9, 2023 •

edited

Loading

christoph-conrads commented Nov 10, 2023 •

edited

Loading

langou commented Nov 11, 2023 •

edited

Loading

christoph-conrads commented Nov 11, 2023 •

edited

Loading

christoph-conrads commented Nov 11, 2023 •

edited

Loading