Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make vector orthogonalization more reliable #930

Conversation

christoph-conrads
Copy link
Contributor

@christoph-conrads christoph-conrads commented Nov 9, 2023

Description

The list below only mentions the real implementations of subroutines but both real and complex implementations are meant.

This PR improves xORBDB5 and xORBDB6:

  • Do not assume unit-norm vector input in xORBDB6.
  • Use a safe value for the parameter ALPHA in xORBDB6.
  • Check for numerically zero input vectors in xORBDB5. Without this check, the subroutine may compute output vectors that are very small in norm which can cause problems when computing Householder reflectors at the call sites.
  • Introduce the REALZERO parameter in xORBDB5.

fixes #917

Checklist

  • The documentation has been updated.
  • If the PR solves a specific issue, it is set to be closed on merge.

Do not assume that the vector x passed to these subroutines has
Euclidean norm one (the norm is almost surely smaller than one when
xORBDB5/xUNBDB5 calls this subroutine for the first time).

This commit fixes the occasional inaccurate results computed by
xORCSD2BY1/xUNCSD2BY1.
With a modest amount of random testing on a work station, it is possible
to detect the loss of orthogonality caused by the old early termination
criterion.
* introduce a parameter representing a real zero for consistency
* stop comparing real values with complex zero
Do not call xORBDB6/xUNBDB6 with input vectors x that are numerically
zero because this can cause problems at the call sites (xLARFGP
computations might underflow) leading to, e.g., the computation of
nonunitary matrices in the xORCSD2BY1/xUNCSD2BY1 output.
Copy link

codecov bot commented Nov 9, 2023

Codecov Report

Attention: 104 lines in your changes are missing coverage. Please review.

Comparison is base (90398ae) 0.00% compared to head (7d15cc6) 0.00%.
Report is 7 commits behind head on master.

❗ Current head 7d15cc6 differs from pull request most recent head 64897c4. Consider uploading reports for the commit 64897c4 to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #930   +/-   ##
=======================================
  Coverage    0.00%    0.00%           
=======================================
  Files        1918     1918           
  Lines      188653   188709   +56     
=======================================
- Misses     188653   188709   +56     
Files Coverage Δ
SRC/clarfgp.f 0.00% <0.00%> (ø)
SRC/dlarfgp.f 0.00% <0.00%> (ø)
SRC/slarfgp.f 0.00% <0.00%> (ø)
SRC/zlarfgp.f 0.00% <0.00%> (ø)
SRC/cunbdb6.f 0.00% <0.00%> (ø)
SRC/dorbdb6.f 0.00% <0.00%> (ø)
SRC/sorbdb6.f 0.00% <0.00%> (ø)
SRC/zunbdb6.f 0.00% <0.00%> (ø)
SRC/cbdsqr.f 0.00% <0.00%> (ø)
SRC/zbdsqr.f 0.00% <0.00%> (ø)
... and 4 more

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

langou
langou previously approved these changes Nov 9, 2023
@christoph-conrads christoph-conrads marked this pull request as draft November 9, 2023 21:26
@christoph-conrads
Copy link
Contributor Author

The PR was converted to a draft since there are still infinities in the output with the SORBDB2 input matrix in the code below. The code demonstrates that the matrix has orthonormal columns. The input was generated from a pair of 2x10 and 3x10 input matrices to SGGQRCS.

#!/usr/bin/python3

import ctypes
import ctypes.util
from ctypes import byref, c_char, c_float, c_int32, c_void_p, POINTER
import sys

import numpy as np
import numpy.linalg as la


def main():
    x = np.array([
        [-0.482012868, 0.869745135, 0.105863400],
        [-0.00000000, 0.00000000, 0.00000000],
        [-0.531500220, -0.194201767, -0.824495614],
        [2.10694591E-08, -3.80177596E-08, -4.62756322E-09],
        [0.696542203, 0.453684032, -0.555877328],
    ],
                 dtype=np.float32)
    r = np.dot(x.T, x) - np.eye(3, dtype=np.float32)

    eps = np.finfo(np.float32).eps
    print(la.norm(r) / la.norm(x) / eps)


if __name__ == "__main__":
    sys.exit(main())

@christoph-conrads
Copy link
Contributor Author

christoph-conrads commented Nov 10, 2023

The code at the end of this post calls SORCSD2BY1 with a special input triggering infinities in the computed U2 matrix. Example execution on my machine with GCC 12 (the reference to the directory in /tmp avoids uses of the system's LAPACK library):

$ gcc -Wextra -Wall -std=c99 -pedantic sorcsd2by1.c -L/tmp/tmp.mZJWTkULEs/lib -llapack
$ env LD_LIBRARY_PATH=/tmp/tmp.mZJWTkULEs/lib ./a.out 
+0.00e+00 +1.00e+00 +0.00e+00
+1.00e+00 -0.00e+00      -inf
     -inf -0.00e+00      -inf

The infinities come into existence as follows:

  • SORBDB5 is being passed a vector x that is almost numerically zero. The norm of x is 2.38 ⋅ 10⁻⁷ which is slightly larger than the value 1.19 ⋅ 10⁻⁷ below which the vector norm is considered to be zero.
  • SORBDB6 orthogonalizes x against a given set of orthonormal vectors and computes an output vector with norm 1.03 ⋅ 10⁻¹³.
  • Next, SLARFGP is called with the input vector [1.0 ⋅ 10⁻¹³; −7.5 ⋅ 10⁻²⁷]. Within this subroutine, ALPHA is re-computed in line 195 such that ALPHA ≈ 10⁻⁴⁰. Finally in line 223 the overflow is triggered by computing 1 / ALPHA (the largest single-precision finite value is on the order of 10³⁸).

The problem can be fixed in two ways:

  • Scaling the SORBDB6 input to unit norm with SSCAL( 1 / NORM ) (SSCAL documentation) makes the problem go away. The use of SLASCL would be preferred because it avoids the computation of the reciprocal of norm but this is not possible because INCX1 or INCX2 might be different from one.
  • Modify SLARFGP to handle the case, where ALPHA is positive but much smaller than 1 and XNORM is positive but much smaller than ALPHA.

This PR will scale the SORBDB6 input. This way the norm of the computed output vector is at least 0.83 ε and this will fix the issue based on the following reasoning. SLARFGP performs operations on an input vector x, where α ≔ x(1) and XNORM ≔ ∥x(1:)∥₂. To trigger the overflow it must hold that

  • α > 0
  • XNORM > 0
  • α ≫ XNORM

With these assumptions, SLARFGP performs the following computations (unfortunately, identifiers are re-used; signs will be ignored):

ALPHA ≔ X(1)
XNORM ≔ ∥X(1:)∥₂
BETA ≔ ∥X∥₂
ALPHA' ≔ ALPHA + BETA
ALPHA''  ≔ XNORM * (XNORM / ALPHA')

Then 1 / ALPHA'' is at most 1 / (0.83 ε³) and this value is smaller than the largest finite floating-point value both in single- and double-precision.

#include <math.h>
#include <stddef.h>
#include <stdio.h>

typedef int lapack_int;

void sorcsd2by1_(
    char* jobu1,
    char* jobu2,
    char* jobv1t,
    lapack_int* m,
    lapack_int* p,
    lapack_int* q,
    float* x11,
    lapack_int* ldx11,
    float* x21,
    lapack_int* ldx21,
    float* theta,
    float* u1,
    lapack_int* ldu1,
    float* u2,
    lapack_int* ldu2,
    float* v1t,
    lapack_int* ldv1t,
    float* work,
    lapack_int* lwork,
    lapack_int* iwork,
    lapack_int* info,
    size_t strlen_jobu1,
    size_t strlen_jobu2,
    size_t strlen_jobv1t);

#define M 5
#define P 2
#define Q 3

int main()
{
    lapack_int m = M;
    lapack_int p = P;
    lapack_int q = Q;
    float x[M * Q] = {
        -0.482012868,
        -0.00000000,
        -0.531500220,
        2.10694591E-08,
        0.696542203,
        0.869745135,
        0.00000000,
        -0.194201767,
        -3.80177596E-08,
        0.453684032,
        0.105863400,
        0.00000000,
        -0.824495614,
        -4.62756322E-09,
        -0.555877328
    };
    lapack_int ldx11 = m;
    lapack_int ldx21 = m;
    float nan = NAN;
    float theta[Q] = { nan };
    float u1[P * P] = { nan };
    lapack_int ldu1 = P;
    float u2[(M - P) * (M - P)] = { nan };
    lapack_int ldu2 = M - P;
    float v1t[Q * Q] = { nan };
    lapack_int ldv1t = Q;
    float work[256] = { nan };
    lapack_int lwork = sizeof(work) / sizeof(work[0]);
    lapack_int iwork[M + P + Q] = { -1 };
    lapack_int info = -1;

    char yes = 'Y';
    sorcsd2by1_(
        &yes,
        &yes,
        &yes,
        &m,
        &p,
        &q,
        x,
        &ldx11,
        x + p,
        &ldx21,
        theta,
        u1,
        &ldu1,
        u2,
        &ldu2,
        v1t,
        &ldv1t,
        work,
        &lwork,
        iwork,
        &info,
        1,
        1,
        1);

    for (size_t i = 0; i < M - P; ++i) {
        for (size_t j = 0; j < Q; ++j) {
            const char* separator = (j + 1 < Q) ? " " : "\n";
            printf("%+9.2e%s", u2[j * ldu2 + i], separator);
        }
    }
}

Call sites may run into problems when this subroutine computes nonzero
vectors that are very small in norm.
@christoph-conrads christoph-conrads marked this pull request as ready for review November 10, 2023 15:32
@christoph-conrads
Copy link
Contributor Author

  • Modify SLARFGP to handle the case, where ALPHA is positive but much smaller than 1 and XNORM is positive but much smaller than ALPHA.

This is tracked in #934.

@christoph-conrads christoph-conrads marked this pull request as draft November 10, 2023 18:48
@christoph-conrads
Copy link
Contributor Author

There is no alternative to fixing SLARFGP. This input creates NaNs in the V1 matrix computed by SORCSD2BY1. Demonstration with the LAPACK build in /tmp/tmp.mZJWTkULEs:

$ gcc -Wextra -Wall -std=c99 -pedantic sorcsd2by1.c -L/tmp/tmp.mZJWTkULEs/lib/ -llapack
$ env LD_LIBRARY_PATH=/tmp/tmp.mZJWTkULEs/lib/ ./a.out 
     +nan      +nan      +nan
     +nan      +nan      +nan
     +nan      +nan      +nan
#include <math.h>
#include <stddef.h>
#include <stdio.h>

typedef int lapack_int;

void sorcsd2by1_(
    char* jobu1,
    char* jobu2,
    char* jobv1t,
    lapack_int* m,
    lapack_int* p,
    lapack_int* q,
    float* x11,
    lapack_int* ldx11,
    float* x21,
    lapack_int* ldx21,
    float* theta,
    float* u1,
    lapack_int* ldu1,
    float* u2,
    lapack_int* ldu2,
    float* v1t,
    lapack_int* ldv1t,
    float* work,
    lapack_int* lwork,
    lapack_int* iwork,
    lapack_int* info,
    size_t strlen_jobu1,
    size_t strlen_jobu2,
    size_t strlen_jobv1t);

#define M 5
#define P 2
#define Q 3

int main()
{
    lapack_int m = M;
    lapack_int p = P;
    lapack_int q = Q;
    float x[M * Q] = {
        -2.38418579E-07,
        0.00000000,
        0.00000000,
        1.19904053E-14,
        -1.00000000,
        1.00000000,
        0.00000000,
        -2.98023188E-08,
        -4.37113954E-08,
        -2.38418579E-07,
        -2.98023188E-08,
        3.55271368E-15,
        -1.00000000,
        1.36304023E-15,
        7.10542736E-15,
    };
    lapack_int ldx11 = m;
    lapack_int ldx21 = m;
    float nan = NAN;
    float theta[Q] = { nan };
    float u1[P * P] = { nan };
    lapack_int ldu1 = P;
    float u2[(M - P) * (M - P)] = { nan };
    lapack_int ldu2 = M - P;
    float v1t[Q * Q] = { nan };
    lapack_int ldv1t = Q;
    float work[256] = { nan };
    lapack_int lwork = sizeof(work) / sizeof(work[0]);
    lapack_int iwork[M + P + Q] = { -1 };
    lapack_int info = -1;

    char yes = 'Y';
    sorcsd2by1_(
        &yes,
        &yes,
        &yes,
        &m,
        &p,
        &q,
        x,
        &ldx11,
        x + p,
        &ldx21,
        theta,
        u1,
        &ldu1,
        u2,
        &ldu2,
        v1t,
        &ldv1t,
        work,
        &lwork,
        iwork,
        &info,
        1,
        1,
        1);

    for (size_t i = 0; i < Q; ++i) {
        for (size_t j = 0; j < Q; ++j) {
            const char* separator = (j + 1 < Q) ? " " : "\n";
            printf("%+9.2e%s", v1t[j * ldv1t + i], separator);
        }
    }
}

Avoid overflows when XNORM << ABS(ALPHA) << 1.

fixes Reference-LAPACK#934
@christoph-conrads christoph-conrads force-pushed the 917-sorcsd2by1-computes-inaccurate-result branch from 073b96c to 64897c4 Compare November 10, 2023 19:09
@christoph-conrads christoph-conrads marked this pull request as ready for review November 10, 2023 19:09
@langou
Copy link
Contributor

langou commented Nov 11, 2023

There is no alternative to fixing SLARFGP. This input creates NaNs in the V1 matrix computed by SORCSD2BY1.

Thanks @christoph-conrads. Do you have a recommendation on what to do? Your conclusion "There is no alternative to fixing SLARFGP." is close to where we ended up too a decade ago or so. I did not do the analysis myself, by I remember that the conclusion that no one knew how to fix SLARFGP. How critical is the "P" in SLARFGP? If it is the "P" much critical, can we do some steps of Gram-Schmidt instead? In general, I am not a big fan of having more Gram-Schmidt in LAPACK, but Gram-Schmidt is one way to have a positive diagonal. All in all, do you have a recommendation on what to do?

@christoph-conrads
Copy link
Contributor Author

christoph-conrads commented Nov 11, 2023

Do you have a recommendation on what to do?

The latest commit in this PR changes the criterion when X is considered zero:

-     IF( XNORM.EQ.ZERO ) THEN
+     IF( XNORM.LE.EPS*ABS(ALPHA) ) THEN

From the point of view of a programmer, there was simply no branch in the code dealing with an input XNORM ≪ ABS(ALPHA) ≪ 1. From the point of view of backward error analysis, an error is introduced in the input vector v that is less than ε‖v‖, where v = [ALPHA, X(1), X(2), ..., X(N-1)], and from this fact it should follow that the Householder matrix computation is still backward stable.

@christoph-conrads
Copy link
Contributor Author

christoph-conrads commented Nov 11, 2023

Another NaN but this one was harder to find (90 minutes instead of 90 seconds).

Execution:

$ gcc -Wextra -Wall -std=c99 -pedantic sorcsd2by1.c -L/tmp/tmp.E5Z0UNs1ZK/lib -llapack
$ env LD_LIBRARY_PATH=/tmp/tmp.E5Z0UNs1ZK/lib ./a.out 
-1.00e+00 +1.42e-14 -1.19e-07 +0.00e+00
+0.00e+00 +1.42e-14 +0.00e+00 -1.00e+00
-1.42e-14 -1.00e+00 -0.00e+00 -1.42e-14
     -nan      -nan      -nan      -nan

Code:

#include <math.h>
#include <stddef.h>
#include <stdio.h>

typedef int lapack_int;

void sorcsd2by1_(
    char* jobu1,
    char* jobu2,
    char* jobv1t,
    lapack_int* m,
    lapack_int* p,
    lapack_int* q,
    float* x11,
    lapack_int* ldx11,
    float* x21,
    lapack_int* ldx21,
    float* theta,
    float* u1,
    lapack_int* ldu1,
    float* u2,
    lapack_int* ldu2,
    float* v1t,
    lapack_int* ldv1t,
    float* work,
    lapack_int* lwork,
    lapack_int* iwork,
    lapack_int* info,
    size_t strlen_jobu1,
    size_t strlen_jobu2,
    size_t strlen_jobv1t);

#define M 6
#define P 3
#define Q 4

int main()
{
    lapack_int m = M;
    lapack_int p = P;
    lapack_int q = Q;
    float x[M * Q] = {
        -1.00000000,
        -0.00000000,
        -0.00000000,
        8.68563248E-08,
        1.83140187E-07,
        5.64973490E-08,
        -1.83140173E-07,
        0.00000000,
        -0.00000000,
        1.79512426E-07,
        -1.00000000,
        1.11148256E-14,
        -8.68563603E-08,
        -1.62776259E-29,
        0.00000000,
        -1.00000000,
        -1.79512398E-07,
        -2.86137259E-16,
        -5.64973490E-08,
        1.02350803E-16,
        2.73971536E-15,
        5.19329114E-15,
        -7.67890847E-16,
        -1.00000000,
    };
    lapack_int ldx11 = m;
    lapack_int ldx21 = m;
    float nan = NAN;
    float theta[Q] = { nan };
    float u1[P * P] = { nan };
    lapack_int ldu1 = P;
    float u2[(M - P) * (M - P)] = { nan };
    lapack_int ldu2 = M - P;
    float v1t[Q * Q] = { nan };
    lapack_int ldv1t = Q;
    float work[256] = { nan };
    lapack_int lwork = sizeof(work) / sizeof(work[0]);
    lapack_int iwork[M + P + Q] = { -1 };
    lapack_int info = -1;

    char yes = 'Y';
    sorcsd2by1_(
        &yes,
        &yes,
        &yes,
        &m,
        &p,
        &q,
        x,
        &ldx11,
        x + p,
        &ldx21,
        theta,
        u1,
        &ldu1,
        u2,
        &ldu2,
        v1t,
        &ldv1t,
        work,
        &lwork,
        iwork,
        &info,
        1,
        1,
        1);

    for (size_t i = 0; i < Q; ++i) {
        for (size_t j = 0; j < Q; ++j) {
            const char* separator = (j + 1 < Q) ? " " : "\n";
            printf("%+9.2e%s", v1t[j * ldv1t + i], separator);
        }
    }
}

@christoph-conrads
Copy link
Contributor Author

Your conclusion "There is no alternative to fixing SLARFGP." is close to where we ended up too a decade ago or so. I did not do the analysis myself, by I remember that the conclusion that no one knew how to fix SLARFGP. How critical is the "P" in SLARFGP?

Frankly I do not know which issue you are referring to. My comment was made after suggesting two possible fixes for the issues that I encountered here.

@langou
Copy link
Contributor

langou commented Nov 11, 2023

Hi @christoph-conrads,

Frankly I do not know which issue you are referring to. My comment was made after suggesting two possible fixes for the issues that I encountered #930 (comment).

Fair enough. I had a vague recollection and I do not know myself. I think I am adding confusion as opposed to help out, let us move on and discard my comments.

Thanks for suggesting some fixes for these issues.

@langou langou merged commit 6261d62 into Reference-LAPACK:master Nov 11, 2023
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SORCSD2BY1 computes inaccurate result
2 participants