-
Notifications
You must be signed in to change notification settings - Fork 443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make vector orthogonalization more reliable #930
Make vector orthogonalization more reliable #930
Conversation
Do not assume that the vector x passed to these subroutines has Euclidean norm one (the norm is almost surely smaller than one when xORBDB5/xUNBDB5 calls this subroutine for the first time). This commit fixes the occasional inaccurate results computed by xORCSD2BY1/xUNCSD2BY1.
With a modest amount of random testing on a work station, it is possible to detect the loss of orthogonality caused by the old early termination criterion.
* introduce a parameter representing a real zero for consistency * stop comparing real values with complex zero
Do not call xORBDB6/xUNBDB6 with input vectors x that are numerically zero because this can cause problems at the call sites (xLARFGP computations might underflow) leading to, e.g., the computation of nonunitary matrices in the xORCSD2BY1/xUNCSD2BY1 output.
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #930 +/- ##
=======================================
Coverage 0.00% 0.00%
=======================================
Files 1918 1918
Lines 188653 188709 +56
=======================================
- Misses 188653 188709 +56
☔ View full report in Codecov by Sentry. |
The PR was converted to a draft since there are still infinities in the output with the SORBDB2 input matrix in the code below. The code demonstrates that the matrix has orthonormal columns. The input was generated from a pair of 2x10 and 3x10 input matrices to SGGQRCS. #!/usr/bin/python3
import ctypes
import ctypes.util
from ctypes import byref, c_char, c_float, c_int32, c_void_p, POINTER
import sys
import numpy as np
import numpy.linalg as la
def main():
x = np.array([
[-0.482012868, 0.869745135, 0.105863400],
[-0.00000000, 0.00000000, 0.00000000],
[-0.531500220, -0.194201767, -0.824495614],
[2.10694591E-08, -3.80177596E-08, -4.62756322E-09],
[0.696542203, 0.453684032, -0.555877328],
],
dtype=np.float32)
r = np.dot(x.T, x) - np.eye(3, dtype=np.float32)
eps = np.finfo(np.float32).eps
print(la.norm(r) / la.norm(x) / eps)
if __name__ == "__main__":
sys.exit(main()) |
The code at the end of this post calls SORCSD2BY1 with a special input triggering infinities in the computed $ gcc -Wextra -Wall -std=c99 -pedantic sorcsd2by1.c -L/tmp/tmp.mZJWTkULEs/lib -llapack
$ env LD_LIBRARY_PATH=/tmp/tmp.mZJWTkULEs/lib ./a.out
+0.00e+00 +1.00e+00 +0.00e+00
+1.00e+00 -0.00e+00 -inf
-inf -0.00e+00 -inf The infinities come into existence as follows:
The problem can be fixed in two ways:
This PR will scale the SORBDB6 input. This way the norm of the computed output vector is at least 0.83 ε and this will fix the issue based on the following reasoning. SLARFGP performs operations on an input vector
With these assumptions, SLARFGP performs the following computations (unfortunately, identifiers are re-used; signs will be ignored):
Then #include <math.h>
#include <stddef.h>
#include <stdio.h>
typedef int lapack_int;
void sorcsd2by1_(
char* jobu1,
char* jobu2,
char* jobv1t,
lapack_int* m,
lapack_int* p,
lapack_int* q,
float* x11,
lapack_int* ldx11,
float* x21,
lapack_int* ldx21,
float* theta,
float* u1,
lapack_int* ldu1,
float* u2,
lapack_int* ldu2,
float* v1t,
lapack_int* ldv1t,
float* work,
lapack_int* lwork,
lapack_int* iwork,
lapack_int* info,
size_t strlen_jobu1,
size_t strlen_jobu2,
size_t strlen_jobv1t);
#define M 5
#define P 2
#define Q 3
int main()
{
lapack_int m = M;
lapack_int p = P;
lapack_int q = Q;
float x[M * Q] = {
-0.482012868,
-0.00000000,
-0.531500220,
2.10694591E-08,
0.696542203,
0.869745135,
0.00000000,
-0.194201767,
-3.80177596E-08,
0.453684032,
0.105863400,
0.00000000,
-0.824495614,
-4.62756322E-09,
-0.555877328
};
lapack_int ldx11 = m;
lapack_int ldx21 = m;
float nan = NAN;
float theta[Q] = { nan };
float u1[P * P] = { nan };
lapack_int ldu1 = P;
float u2[(M - P) * (M - P)] = { nan };
lapack_int ldu2 = M - P;
float v1t[Q * Q] = { nan };
lapack_int ldv1t = Q;
float work[256] = { nan };
lapack_int lwork = sizeof(work) / sizeof(work[0]);
lapack_int iwork[M + P + Q] = { -1 };
lapack_int info = -1;
char yes = 'Y';
sorcsd2by1_(
&yes,
&yes,
&yes,
&m,
&p,
&q,
x,
&ldx11,
x + p,
&ldx21,
theta,
u1,
&ldu1,
u2,
&ldu2,
v1t,
&ldv1t,
work,
&lwork,
iwork,
&info,
1,
1,
1);
for (size_t i = 0; i < M - P; ++i) {
for (size_t j = 0; j < Q; ++j) {
const char* separator = (j + 1 < Q) ? " " : "\n";
printf("%+9.2e%s", u2[j * ldu2 + i], separator);
}
}
} |
Call sites may run into problems when this subroutine computes nonzero vectors that are very small in norm.
This is tracked in #934. |
There is no alternative to fixing SLARFGP. This input creates NaNs in the V1 matrix computed by SORCSD2BY1. Demonstration with the LAPACK build in $ gcc -Wextra -Wall -std=c99 -pedantic sorcsd2by1.c -L/tmp/tmp.mZJWTkULEs/lib/ -llapack
$ env LD_LIBRARY_PATH=/tmp/tmp.mZJWTkULEs/lib/ ./a.out
+nan +nan +nan
+nan +nan +nan
+nan +nan +nan #include <math.h>
#include <stddef.h>
#include <stdio.h>
typedef int lapack_int;
void sorcsd2by1_(
char* jobu1,
char* jobu2,
char* jobv1t,
lapack_int* m,
lapack_int* p,
lapack_int* q,
float* x11,
lapack_int* ldx11,
float* x21,
lapack_int* ldx21,
float* theta,
float* u1,
lapack_int* ldu1,
float* u2,
lapack_int* ldu2,
float* v1t,
lapack_int* ldv1t,
float* work,
lapack_int* lwork,
lapack_int* iwork,
lapack_int* info,
size_t strlen_jobu1,
size_t strlen_jobu2,
size_t strlen_jobv1t);
#define M 5
#define P 2
#define Q 3
int main()
{
lapack_int m = M;
lapack_int p = P;
lapack_int q = Q;
float x[M * Q] = {
-2.38418579E-07,
0.00000000,
0.00000000,
1.19904053E-14,
-1.00000000,
1.00000000,
0.00000000,
-2.98023188E-08,
-4.37113954E-08,
-2.38418579E-07,
-2.98023188E-08,
3.55271368E-15,
-1.00000000,
1.36304023E-15,
7.10542736E-15,
};
lapack_int ldx11 = m;
lapack_int ldx21 = m;
float nan = NAN;
float theta[Q] = { nan };
float u1[P * P] = { nan };
lapack_int ldu1 = P;
float u2[(M - P) * (M - P)] = { nan };
lapack_int ldu2 = M - P;
float v1t[Q * Q] = { nan };
lapack_int ldv1t = Q;
float work[256] = { nan };
lapack_int lwork = sizeof(work) / sizeof(work[0]);
lapack_int iwork[M + P + Q] = { -1 };
lapack_int info = -1;
char yes = 'Y';
sorcsd2by1_(
&yes,
&yes,
&yes,
&m,
&p,
&q,
x,
&ldx11,
x + p,
&ldx21,
theta,
u1,
&ldu1,
u2,
&ldu2,
v1t,
&ldv1t,
work,
&lwork,
iwork,
&info,
1,
1,
1);
for (size_t i = 0; i < Q; ++i) {
for (size_t j = 0; j < Q; ++j) {
const char* separator = (j + 1 < Q) ? " " : "\n";
printf("%+9.2e%s", v1t[j * ldv1t + i], separator);
}
}
} |
Avoid overflows when XNORM << ABS(ALPHA) << 1. fixes Reference-LAPACK#934
073b96c
to
64897c4
Compare
Thanks @christoph-conrads. Do you have a recommendation on what to do? Your conclusion "There is no alternative to fixing SLARFGP." is close to where we ended up too a decade ago or so. I did not do the analysis myself, by I remember that the conclusion that no one knew how to fix SLARFGP. How critical is the "P" in SLARFGP? If it is the "P" much critical, can we do some steps of Gram-Schmidt instead? In general, I am not a big fan of having more Gram-Schmidt in LAPACK, but Gram-Schmidt is one way to have a positive diagonal. All in all, do you have a recommendation on what to do? |
The latest commit in this PR changes the criterion when - IF( XNORM.EQ.ZERO ) THEN
+ IF( XNORM.LE.EPS*ABS(ALPHA) ) THEN From the point of view of a programmer, there was simply no branch in the code dealing with an input |
Another NaN but this one was harder to find (90 minutes instead of 90 seconds). Execution: $ gcc -Wextra -Wall -std=c99 -pedantic sorcsd2by1.c -L/tmp/tmp.E5Z0UNs1ZK/lib -llapack
$ env LD_LIBRARY_PATH=/tmp/tmp.E5Z0UNs1ZK/lib ./a.out
-1.00e+00 +1.42e-14 -1.19e-07 +0.00e+00
+0.00e+00 +1.42e-14 +0.00e+00 -1.00e+00
-1.42e-14 -1.00e+00 -0.00e+00 -1.42e-14
-nan -nan -nan -nan Code: #include <math.h>
#include <stddef.h>
#include <stdio.h>
typedef int lapack_int;
void sorcsd2by1_(
char* jobu1,
char* jobu2,
char* jobv1t,
lapack_int* m,
lapack_int* p,
lapack_int* q,
float* x11,
lapack_int* ldx11,
float* x21,
lapack_int* ldx21,
float* theta,
float* u1,
lapack_int* ldu1,
float* u2,
lapack_int* ldu2,
float* v1t,
lapack_int* ldv1t,
float* work,
lapack_int* lwork,
lapack_int* iwork,
lapack_int* info,
size_t strlen_jobu1,
size_t strlen_jobu2,
size_t strlen_jobv1t);
#define M 6
#define P 3
#define Q 4
int main()
{
lapack_int m = M;
lapack_int p = P;
lapack_int q = Q;
float x[M * Q] = {
-1.00000000,
-0.00000000,
-0.00000000,
8.68563248E-08,
1.83140187E-07,
5.64973490E-08,
-1.83140173E-07,
0.00000000,
-0.00000000,
1.79512426E-07,
-1.00000000,
1.11148256E-14,
-8.68563603E-08,
-1.62776259E-29,
0.00000000,
-1.00000000,
-1.79512398E-07,
-2.86137259E-16,
-5.64973490E-08,
1.02350803E-16,
2.73971536E-15,
5.19329114E-15,
-7.67890847E-16,
-1.00000000,
};
lapack_int ldx11 = m;
lapack_int ldx21 = m;
float nan = NAN;
float theta[Q] = { nan };
float u1[P * P] = { nan };
lapack_int ldu1 = P;
float u2[(M - P) * (M - P)] = { nan };
lapack_int ldu2 = M - P;
float v1t[Q * Q] = { nan };
lapack_int ldv1t = Q;
float work[256] = { nan };
lapack_int lwork = sizeof(work) / sizeof(work[0]);
lapack_int iwork[M + P + Q] = { -1 };
lapack_int info = -1;
char yes = 'Y';
sorcsd2by1_(
&yes,
&yes,
&yes,
&m,
&p,
&q,
x,
&ldx11,
x + p,
&ldx21,
theta,
u1,
&ldu1,
u2,
&ldu2,
v1t,
&ldv1t,
work,
&lwork,
iwork,
&info,
1,
1,
1);
for (size_t i = 0; i < Q; ++i) {
for (size_t j = 0; j < Q; ++j) {
const char* separator = (j + 1 < Q) ? " " : "\n";
printf("%+9.2e%s", v1t[j * ldv1t + i], separator);
}
}
} |
Frankly I do not know which issue you are referring to. My comment was made after suggesting two possible fixes for the issues that I encountered here. |
Fair enough. I had a vague recollection and I do not know myself. I think I am adding confusion as opposed to help out, let us move on and discard my comments. Thanks for suggesting some fixes for these issues. |
Description
The list below only mentions the real implementations of subroutines but both real and complex implementations are meant.
This PR improves xORBDB5 and xORBDB6:
ALPHA
in xORBDB6.REALZERO
parameter in xORBDB5.fixes #917
Checklist