forked from elemental/Elemental
-
Notifications
You must be signed in to change notification settings - Fork 0
/
TODO
112 lines (101 loc) · 4.71 KB
/
TODO
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
Each of the following categories lists goals in an order which roughly
corresponds to the order in which they are hoped to be added.
Items are marked using the following code:
[x] ~ planned to be finished before the next release
[o] ~ hopefully started in the near future
[-] ~ marked for eventual development
Functionality priorities
========================
Fundamental functionality additions
-----------------------------------
[o] Second-order and semidefinite cone support
[o] Non-naive Lanczos
[o] 2D sparse matrix distributions
[o] Matrix type tags for, for example, merging {Gemm,Hemm,Trmm,etc.} into "*"
[o] Estimate for spectral radius
[o] Low-rank modifications of QR
[-] Successive Band Reduction
[-] Windowed QR with column pivoting
[-] Power-method-like p-norm estimation
[-] More work on (generalized) Spectral Divide and Conquer Schur decompositions
[-] QL factorization and ql::SolveAfter
[-] Strong RRQR and RRLQ
[-] CUR decompositions (already have (pseudo-)skeleton)
[-] Complete Orthogonal Decompositions (especially URV)
[-] LU and LDL with rook pivoting
[-] (Blocked) Aasen's
[-] TSQR for non-powers-of-two
[-] TSLU (via tournament pivoting)
[-] Native nonsymmetric (generalized) eigensolver via QR (QZ) algorithm
[-] Generalized Sylvester equations
Incremental functionality improvements
--------------------------------------
[o] General redistribution routine between any two matrices with any
process grids (with equivalent viewing communicators)
[o] Add Bunch-Kaufman C now that explicit permutations are used
[o] Extend Grid class to support mappings from, e.g., (MDRank,root) -> VCRank
and use these mappings to build an (owner,root) -> VCRank mapping for
[Block]DistMatrix
[o] Rescaled multi-shift Hessenberg solves
[o] Blocked algorithms for low-rank Cholesky updates
[o] Relative interval subset computation for HermitianEig (i.e., in [-1,1])
[o] Sequential blocked reduction to tridiagonal form
[o] Quadratic-time Haar generation via random Householder reflectors
[o] Banded Cholesky factorization
[o] QR with full pivoting (Businger-Golub plus row-sorting or row-pivoting)
[-] 'Control' equivalents to 'Attach' for DistMatrix, and ability to forfeit
buffers in (Dist)Matrix
[-] Axpy interface implementation using one-sided communication
[-] Square process grid specializations of LDL and Bunch-Kaufman
[-] Businger-esque element-growth monitoring in GEPP and Bunch-Kaufman
[-] More Sign algorithms (switch to Newton-Schulz near convergence)
[-] Way for DistMatrix with single process to view Matrix, and operator=
[-] Various approaches (e.g., HJS) for parallel tridiagonalization
[-] Wrappers for more LAPACK eigensolvers
[-] Sequential versions of Trr2k
[-] More explicit expansions of packed Householder reflectors
[-] More Trtrmm/Trtrsm routines
[-] Compressed pseudoinverse solves which avoid unnecessary backtransformations
[-] Additional CIRC distributions, e.g., (MC,CIRC)
Performance priorities
======================
[o] Accelerator support for local Gemm calls
[o] Support for BLIS and fused Trmv's to accelerate HermitianEig
[-] Optimized version of ApplySymmetricPivots
[-] Exploit structure in matrix sign based control solvers
Maintenance priorities
======================
Bug avoidance
-------------
Instrumentation/visualization/testing
-------------------------------------
[-] Global command-line options which are automatic for every driver, e.g.,
"--colMajor <true/false>" for column-major process grids and
"--nb <blocksize>" for the algorithmic blocksize
[-] Means of easily tracking/plotting heap memory usage over time
[-] Provide way to zoom in/out and add colorbar to DisplayWidget
[-] Better organization of test matrices into relevant classes, e.g., Hermitian,
normal, triangular, Hessenberg, etc., so that each test driver can easily
test each member from that class.
Consistency/modularity
----------------------
[-] Modify Grid to return communicators based upon the distribution, e.g.,
Comm(VC)?
[-] Extract BLAS/LAPACK/MPI wrappers into a separate project
[-] Make transpose-options of LocalTrr(2)k more consistent with Trr(2)k
[-] Consistent implementation of unblocked routines
[-] Safe down-casting of integers in BLAS/LAPACK calls
External Interfaces
-------------------
[o] Expose BlockDistMatrix to C and Python (as well as ScaLAPACK interface)
Build system
------------
[o] Attempt to shorten build via extern templates
MPI and Threading
-----------------
[-] Implement message-splitting in collectives for count > 2^31
[-] Use MPI contiguous datatype for all messages with count > 2^31
(may not work with older MPIs)
[-] Detect oversubsription using sysconf/sysctl and {OMP,MKL,*}_NUM_THREADS
[-] Add MPI wrappers for all nonblocking collectives
[-] Add MPI wrappers for RMA