romtools.linalg.linalg#
See https://stackoverflow.com/questions/47599162/pybind11-how-to-package-c-and-python-code-into-a-single-package?rq=1 for why this file exists and is structured this way.
Functions

- argmax(a[, comm]) — Return the index of an array's maximum value.
- max(a[, axis, comm]) — Return the maximum of a possibly distributed array or maximum along an axis.
- mean(a[, dtype, axis, comm]) — Return the mean of a possibly distributed array over a given axis.
- min(a[, axis, comm]) — Return the minimum of a possibly distributed array or minimum along an axis.
- move_distributed_linear_system_to_rank_zero(A_in, b_in, comm) — Gather a distributed linear system (A, b) from multiple MPI ranks to rank 0.
- pinv(A[, comm]) — Compute the pseudoinverse of A and return its transpose.
- product(flagA, flagB, alpha, A, B, beta, C[, comm]) — Compute C = beta*C + alpha*op(A)*op(B), where A and B are row-distributed matrices.
- std(a[, dtype, axis, comm]) — Return the standard deviation of a possibly distributed array over a given axis.
- thin_svd(M[, comm, method]) — Compute the thin SVD of a possibly distributed matrix.
- romtools.linalg.linalg.argmax(a, comm=None)#
Return the index of an array’s maximum value. If the array is distributed, also returns the value itself and the MPI rank on which it occurs.
- Parameters:
a (np.ndarray) – input data
comm (MPI_Comm) – MPI communicator (default: None)
- Returns:
if comm is None, returns the index of the maximum value (identical to np.argmax); if comm is not None, returns a tuple (value, index, rank):
value: the global maximum
index: the local index of the global maximum
rank: the rank on which the global maximum resides
- Preconditions:
a is at most a rank-3 tensor
if a is a distributed 2-D array, it must be distributed along axis=0, and every rank must have the same a.shape[1]
if a is a distributed 3-D tensor, it must be distributed along axis=1, and every rank must have the same a.shape[0] and a.shape[2]
- Postconditions:
a and comm are not modified
Example 1:#
rank 0: [  2.2,   3.3]
rank 1: [ 40. ,  51. , -24. ,  45. ]
rank 2: [ -4. ]
Suppose that we do:
res = la.argmax(a, comm=comm)
then ALL ranks will contain res = (1, 1). (The global maximum (51.) occurs at index 1 of the local array on Rank 1.)
Example 2:#
rank 0: [[  2.2,   1.3,   4. ],
         [  3.3,   5.0,  33. ]]
rank 1: [[ 40. ,  -2. ,  -4. ],
         [ 51. ,   4. ,   6. ],
         [-24. ,   8. ,   9. ],
         [ 45. ,  -3. ,  -4. ]]
rank 2: [[ -4. ,   8. ,   9. ]]
Suppose that we do:
res = la.argmax(a, comm=comm)
then ALL ranks will contain res = (3, 1) (The global maximum (51.) occurs at index 3 of the flattened local array on Rank 1.)
Example 3:#
Consider a rank-3 tensor T distributed along axis=1, so each rank owns a block of columns of both depth slices:

slice T(:,:,1):
   rank 0  |      rank 1      | rank 2
   3.  4.  |  2.  8.  2.  1.  |  2.
   6. -1.  | -2. -1.  0. -6.  |  0.
  -7.  5.  |  5.  0.  3.  1.  |  3.

slice T(:,:,0):
   2.  3.  |  4.  5. -2.  4.  | -4.
   1.  5.  | -2.  4.  8. -3.  |  8.
   4.  3.  | -4.  6.  9. -4.  |  9.
Suppose that we do:
res = la.argmax(a, comm=comm)
then ALL ranks will contain res = (20, 1) (The global maximum (9.) occurs on both Rank 1 and Rank 2, but we automatically return the index on the lowest rank. In this case, that is index 20 of the flattened local array on Rank 1.)
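A minimal usage sketch reproducing Example 1, assuming mpi4py and that this module is importable as shown (the per-rank data dictionary is purely illustrative); run with, e.g., mpiexec -n 3 python example.py:

import numpy as np
from mpi4py import MPI
import romtools.linalg.linalg as la

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# local piece of the distributed 1-D array from Example 1
pieces = {0: [2.2, 3.3], 1: [40., 51., -24., 45.], 2: [-4.]}
a = np.array(pieces[rank])

res = la.argmax(a, comm=comm)
# per Example 1, every rank sees res = (1, 1): the global maximum
# (51.) sits at local index 1 of the array owned by rank 1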
- romtools.linalg.linalg.max(a, axis=None, comm=None)#
Return the maximum of a possibly distributed array or maximum along an axis.
- Parameters:
a (np.ndarray) – input data
axis (None or int) – the axis along which to compute the maximum. If None, computes the max of the flattened array. (default: None)
comm (MPI_Comm) – MPI communicator (default: None)
- Returns:
if axis is None, returns a scalar; if axis is not None, returns an array of dimension a.ndim - 1
- Preconditions:
a is at most a rank-3 tensor
if a is a distributed 2-D array, it must be distributed along axis=0, and every rank must have the same a.shape[1]
if a is a distributed 3-D tensor, it must be distributed along axis=1, and every rank must have the same a.shape[0] and a.shape[2]
if axis != None, then it must be an int
- Postconditions:
a and comm are not modified
Example 1:#
rank 0: [  2.2,   3.3]
rank 1: [ 40. ,  51. , -24. ,  45. ]
rank 2: [ -4. ]
Suppose that we do:

res = la.max(a, comm=comm)

then ALL ranks will contain res = 51. (Note that comm must be passed by keyword here; passed positionally it would bind to axis.)
Example 2:#
rank 0: [[  2.2,   1.3,   4. ],
         [  3.3,   5.0,  33. ]]
rank 1: [[ 40. ,  -2. ,  -4. ],
         [ 51. ,   4. ,   6. ],
         [-24. ,   8. ,   9. ],
         [ 45. ,  -3. ,  -4. ]]
rank 2: [[ -4. ,   8. ,   9. ]]
Suppose that we do:
res = la.max(a, axis=0, comm=comm)
then every rank will contain the same res = [51., 8., 33.]. This is because the max is queried along axis 0, the axis along which the data array is distributed, so this operation must be collective.
Suppose that we do:
res = la.max(a, axis=1, comm=comm)
then res is now a rank-1 array distributed as follows:

rank 0: [ 4. , 33. ]
rank 1: [40. , 51. ,  9. , 45. ]
rank 2: [ 9. ]
because the axis queried for the max is NOT a distributed axis so this operation is purely local and the result has the same distribution as the original array.
Example 3:#
Consider a rank-3 tensor T distributed along axis=1 (each rank owns a block of columns of both depth slices):

slice T(:,:,1):
   rank 0  |      rank 1      | rank 2
   3.  4.  |  2.  8.  2.  1.  |  2.
   6. -1.  | -2. -1.  0. -6.  |  0.
  -7.  5.  |  5.  0.  3.  1.  |  3.

slice T(:,:,0):
   2.  3.  |  4.  5. -2.  4.  | -4.
   1.  5.  | -2.  4.  8. -3.  |  8.
   4.  3.  | -4.  6.  9. -4.  |  9.
Suppose that we do:
res = la.max(a, axis=0, comm=comm)
then res is now a rank-2 array, with the same column distribution as T:

   rank 0  |      rank 1      | rank 2
   6.  5.  |  5.  8.  3.  1.  |  3.     (max over axis 0 of slice T(:,:,1))
   4.  5.  |  4.  6.  9.  4.  |  9.     (max over axis 0 of slice T(:,:,0))
because the axis queried for the max is NOT a distributed axis and this is effectively a reduction over the 0-th axis so this operation is purely local and the result has the same distribution as the original array.
Suppose that we do:
res = la.max(a, axis=1, comm=comm)
then this is effectively a reduction over axis=1, and every rank will contain the same res, a rank-2 array:

[[5., 8.],
 [8., 6.],
 [9., 5.]]

This is because the max is queried along axis=1, the axis along which the data array is distributed, so this operation must be collective; memory-wise it is feasible to hold because the result is no larger than the local allocation on each rank.
Suppose that we do:
res = la.max(a, axis=2, comm=comm)
then res is now a rank-2 array with the same distribution as T:

 rank 0   ||      rank 1       || rank 2
 3.  4.   ||  4.  8.  2.  4.   ||  2.
 6.  5.   || -2.  4.  8. -3.   ||  8.
 4.  5.   ||  5.  6.  9.  1.   ||  9.

because the axis queried for the max is NOT a distributed axis and this is effectively a reduction over axis=2, so this operation is purely local and the result has the same distribution as the original array.
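The sketch below exercises the three axis modes on the 2-D data of Example 2; it assumes mpi4py and three ranks, and the per-rank blocks are copied from the example:

import numpy as np
from mpi4py import MPI
import romtools.linalg.linalg as la

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

blocks = {0: [[2.2, 1.3, 4.], [3.3, 5.0, 33.]],
          1: [[40., -2., -4.], [51., 4., 6.],
              [-24., 8., 9.], [45., -3., -4.]],
          2: [[-4., 8., 9.]]}
a = np.array(blocks[rank])

flat_max = la.max(a, comm=comm)          # scalar: 51. on every rank
col_max  = la.max(a, axis=0, comm=comm)  # collective: [51., 8., 33.] everywhere
row_max  = la.max(a, axis=1, comm=comm)  # local: one entry per local row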
- romtools.linalg.linalg.mean(a, dtype=None, axis=None, comm=None)#
Return the mean of a possibly distributed array over a given axis.
- Parameters:
a (np.ndarray) – input data
dtype (data-type) – Type to use in computing the mean
axis (None or int) – the axis along which to compute the mean. If None, computes the mean of the flattened array.
comm (MPI_Comm) – MPI communicator (default: None)
- Returns:
if axis is None, returns a scalar; if axis is not None, returns an array of dimension a.ndim - 1
- Preconditions:
a is at most a rank-3 tensor
if a is a distributed 2-D array, it must be distributed along axis=0, and every rank must have the same a.shape[1]
if a is a distributed 3-D tensor, it must be distributed along axis=1, and every rank must have the same a.shape[0] and a.shape[2]
if axis != None, then it must be an int
- Postconditions:
a and comm are not modified
Example 1:#
rank 0: [  2.2,   3.3]
rank 1: [ 40. ,  51. , -24. ,  45. ]
rank 2: [ -4. ]
Suppose that we do:

res = la.mean(a, comm=comm)

then ALL ranks will contain res = 16.21.
Example 2:#
rank 0: [[  2.2,   1.3,   4. ],
         [  3.3,   5.0,  33. ]]
rank 1: [[ 40. ,  -2. ,  -4. ],
         [ 51. ,   4. ,   6. ],
         [-24. ,   8. ,   9. ],
         [ 45. ,  -3. ,  -4. ]]
rank 2: [[ -4. ,   8. ,   9. ]]
Suppose that we do:
res = la.mean(a, axis=0, comm=comm)
then every rank will contain the same res which is:
res = ([16.21, 3.04, 7.57])
this is because the mean is queried for the 0-th axis which is the axis along which the data array is distributed. So this operation must be a collective operation.
Suppose that we do:
res = la.mean(a, axis=1, comm=comm)
then res is now a rank-1 array distributed as follows:

rank 0: [ 2.5 , 13.77]
rank 1: [11.33, 20.33, -2.33, 12.67]
rank 2: [ 4.33]
because the axis queried for the mean is NOT a distributed axis so this operation is purely local and the result has the same distribution as the original array.
Example 3:#
Consider a rank-3 tensor T distributed along axis=1 (each rank owns a block of columns of both depth slices):

slice T(:,:,1):
   rank 0  |      rank 1      | rank 2
   3.  4.  |  2.  8.  2.  1.  |  2.
   6. -1.  | -2. -1.  0. -6.  |  0.
  -7.  5.  |  5.  0.  3.  1.  |  3.

slice T(:,:,0):
   2.  3.  |  4.  5. -2.  4.  | -4.
   1.  5.  | -2.  4.  8. -3.  |  8.
   4.  3.  | -4.  6.  9. -4.  |  9.
Suppose that we do:
res = la.mean(a, axis=0, comm=comm)
then res is now a rank-2 array, with the same column distribution as T:

  rank 0          |              rank 1              | rank 2
  0.6667  2.6667  |  1.6667  2.3333  1.6667 -1.3333  | 1.6667   (mean over axis 0 of slice T(:,:,1))
  2.3333  3.6667  | -0.6667  5.      5.     -1.      | 4.3333   (mean over axis 0 of slice T(:,:,0))
because the axis queried for the mean is NOT a distributed axis and this is effectively a reduction over the 0-th axis so this operation is purely local and the result has the same distribution as the original array.
Suppose that we do:
res = la.mean(a, axis=1, comm=comm)
then this is effectively a reduction over axis=1, and every rank will contain the same res, a rank-2 array:

[[ 1.71428571,  3.14285714],
 [ 3.        , -0.57142857],
 [ 3.28571429,  1.42857143]]

This is because the mean is queried along axis=1, the axis along which the data array is distributed, so this operation must be collective; memory-wise it is feasible to hold because the result is no larger than the local allocation on each rank.
Suppose that we do:
res = la.mean(a, axis=2, comm=comm)
then res is now a rank-2 array with the same distribution as T:

 rank 0      ||        rank 1        || rank 2
 2.5   3.5   ||  3.   6.5  0.   2.5  || -1.
 3.5   2.    || -2.   1.5  4.  -4.5  ||  4.
-1.5   4.    ||  0.5  3.   6.  -1.5  ||  6.

because the axis queried for the mean is NOT a distributed axis and this is effectively a reduction over axis=2, so this operation is purely local and the result has the same distribution as the original array.
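As a brief sketch (mpi4py assumed, data from Example 1), note that comm must be passed by keyword because dtype and axis precede it in the signature:

import numpy as np
from mpi4py import MPI
import romtools.linalg.linalg as la

comm = MPI.COMM_WORLD
pieces = {0: [2.2, 3.3], 1: [40., 51., -24., 45.], 2: [-4.]}
a = np.array(pieces[comm.Get_rank()])

m = la.mean(a, dtype=np.float64, comm=comm)   # ~16.21 on every rank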
- romtools.linalg.linalg.min(a, axis=None, comm=None)#
Return the minimum of a possibly distributed array or minimum along an axis.
- Parameters:
a (np.ndarray) – input data
axis (None or int) – the axis along which to compute the minimum. If None, computes the min of the flattened array.
comm (MPI_Comm) – MPI communicator (default: None)
- Returns:
if axis is None, returns a scalar; if axis is not None, returns an array of dimension a.ndim - 1
- Preconditions:
a is at most a rank-3 tensor
if a is a distributed 2-D array, it must be distributed along axis=0, and every rank must have the same a.shape[1]
if a is a distributed 3-D tensor, it must be distributed along axis=1, and every rank must have the same a.shape[0] and a.shape[2]
if axis != None, then it must be an int
- Postconditions:
a and comm are not modified
Example 1:#
rank 0: [  2.2,   3.3]
rank 1: [ 40. ,  51. , -24. ,  45. ]
rank 2: [ -4. ]
Suppose that we do:

res = la.min(a, comm=comm)

then ALL ranks will contain res = -4.
Example 2:#
rank 0: [[  2.2,   1.3,   4. ],
         [  3.3,   5.0,  33. ]]
rank 1: [[ 40. ,  -2. ,  -4. ],
         [ 51. ,   4. ,   6. ],
         [-24. ,   8. ,   9. ],
         [ 45. ,  -3. ,  -4. ]]
rank 2: [[ -4. ,   8. ,   9. ]]
Suppose that we do:
res = la.min(a, axis=0, comm=comm)
then every rank will contain the same res = [-24., -3., -4.]. This is because the min is queried along axis 0, the axis along which the data array is distributed, so this operation must be collective.
Suppose that we do:
res = la.min(a, axis=1, comm=comm)
then res is now a rank-1 array distributed as follows:

rank 0: [ 1.3,  3.3]
rank 1: [-4. ,  4. , -24. , -4. ]
rank 2: [-4. ]
because the axis queried for the min is NOT a distributed axis so this operation is purely local and the result has the same distribution as the original array.
Example 3:#
Consider a rank-3 tensor T distributed along axis=1 (each rank owns a block of columns of both depth slices):

slice T(:,:,1):
   rank 0  |      rank 1      | rank 2
   3.  4.  |  2.  8.  2.  1.  |  2.
   6. -1.  | -2. -1.  0. -6.  |  0.
  -7.  5.  |  5.  0.  3.  1.  |  3.

slice T(:,:,0):
   2.  3.  |  4.  5. -2.  4.  | -4.
   1.  5.  | -2.  4.  8. -3.  |  8.
   4.  3.  | -4.  6.  9. -4.  |  9.
Suppose that we do:
res = la.min(a, axis=0, comm=comm)
then res is now a rank-2 array, with the same column distribution as T:

   rank 0  |      rank 1      | rank 2
  -7. -1.  | -2. -1.  0. -6.  |  0.    (min over axis 0 of slice T(:,:,1))
   1.  3.  | -4.  4. -2. -4.  | -4.    (min over axis 0 of slice T(:,:,0))

because the axis queried for the min is NOT a distributed axis and this is effectively a reduction over the 0-th axis, so this operation is purely local and the result has the same distribution as the original array.
Suppose that we do:
res = la.min(a, axis=1, comm=comm)
then this is effectively a reduction over axis=1, and every rank will contain the same res, a rank-2 array:

[[-4.,  1.],
 [-3., -6.],
 [-4., -7.]]

This is because the min is queried along axis=1, the axis along which the data array is distributed, so this operation must be collective; memory-wise it is feasible to hold because the result is no larger than the local allocation on each rank.
Suppose that we do:
res = la.min(a, axis=2, comm=comm)
then res is now a rank-2 array with the same distribution as T:

 rank 0   ||      rank 1       || rank 2
 2.  3.   ||  2.  5. -2.  1.   || -4.
 1. -1.   || -2. -1.  0. -6.   ||  0.
-7.  3.   || -4.  0.  3. -4.   ||  3.

because the axis queried for the min is NOT a distributed axis and this is effectively a reduction over axis=2, so this operation is purely local and the result has the same distribution as the original array.
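A companion sketch for min, mirroring the max example above (mpi4py assumed, data from Example 1):

import numpy as np
from mpi4py import MPI
import romtools.linalg.linalg as la

comm = MPI.COMM_WORLD
pieces = {0: [2.2, 3.3], 1: [40., 51., -24., 45.], 2: [-4.]}
a = np.array(pieces[comm.Get_rank()])

res = la.min(a, comm=comm)   # -4. on every rank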
- romtools.linalg.linalg.move_distributed_linear_system_to_rank_zero(A_in, b_in, comm)[source]#
Gathers a distributed linear system (A, b) from multiple MPI ranks to rank 0.
- Preconditions:
A_in is a rank-2 tensor (2D array) representing a portion of the global matrix A.
b_in is a rank-1 or rank-2 tensor (1D or 2D array) representing a portion of the global vector b.
A_in and b_in are distributed row-wise across MPI ranks.
- Parameters:
A_in (np.ndarray) – local row block of the global matrix A
b_in (np.ndarray) – local row block of the global vector b
comm (MPI_Comm) – MPI communicator
- Returns:
A_g (np.ndarray) – the global matrix A assembled on rank 0
b_g (np.ndarray) – the global vector b assembled on rank 0
- Postconditions:
On rank 0, A_g and b_g contain the fully assembled matrix and vector, respectively.
On other ranks, A_g and b_g are dummy arrays with no meaningful content.
The input arrays A_in and b_in are not modified.
Notes
The function ensures that all data is gathered without additional copies or unnecessary data movement.
Only rank 0 ends up with the complete system; other ranks have placeholder arrays.
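A hypothetical usage sketch, assuming mpi4py; each rank holds a row block of (A, b) with illustrative shapes, and after the call only rank 0 holds the assembled system:

import numpy as np
from mpi4py import MPI
import romtools.linalg.linalg as la

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# illustrative local blocks: 2 rows of A and b per rank
A_in = np.random.rand(2, 3)
b_in = np.random.rand(2)

A_g, b_g = la.move_distributed_linear_system_to_rank_zero(A_in, b_in, comm)
if rank == 0:
    # e.g. solve the gathered least-squares system on rank 0 only
    x, *_ = np.linalg.lstsq(A_g, b_g, rcond=None)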
- romtools.linalg.linalg.pinv(A, comm=None)#
Computes the pseudoinverse of A and returns its transpose. Returning (A^+)^T rather than A^+ is a matter of convenience: when A is row-distributed and comm is not None, the result has the same distribution as A. If A is too large for any single rank to store, this is the only feasible way to store the pseudoinverse.
- Parameters:
A (np.ndarray) – input matrix
comm (MPI_Comm) – MPI communicator (default: None)
- Returns:
The transpose of A^+, computed as (A^+)^T = A ((A^T A)^(-1))^T
- Return type:
np.ndarray, with the same shape (and, if distributed, the same row distribution) as A
- Preconditions:
A must be a real, rank-2 matrix
A must have more rows than columns
A must have linearly independent columns
If A is distributed, it must be so along its rows
- Postconditions:
A and comm are not modified
A^+ A = I
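A serial sketch (comm=None) that checks the documented postcondition A^+ A = I; the matrix here is random and illustrative:

import numpy as np
import romtools.linalg.linalg as la

A = np.random.rand(10, 3)      # tall, almost surely full column rank
ApT = la.pinv(A)               # transpose of the pseudoinverse of A
assert ApT.shape == A.shape    # same shape (and distribution) as A
np.testing.assert_allclose(ApT.T @ A, np.eye(3), atol=1e-8)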
- romtools.linalg.linalg.product(flagA, flagB, alpha, A, B, beta, C, comm=None)#
Computes C = beta*C + alpha*op(A)*op(B), where A and B are row-distributed matrices.
- Parameters:
flagA (str) – Determines the orientation of A, “T” for transpose or “N” for non-transpose.
flagB (str) – Determines the orientation of B, “T” for transpose or “N” for non-transpose.
alpha (float) – Coefficient of op(A)*op(B).
A (np.array) – 2-D matrix
B (np.array) – 2-D matrix
beta (float) – Coefficient of C.
C (np.array) – 2-D matrix to be filled with the product
comm (MPI_Comm) – MPI communicator (default: None)
- Returns:
C (np.array) – the specified product
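A hedged sketch of a Gram-type product with row-distributed inputs, assuming mpi4py and that product returns the filled C as documented; since op(A)=A^T and op(B)=B contract over the distributed row axis, every rank should receive the same k x k result:

import numpy as np
from mpi4py import MPI
import romtools.linalg.linalg as la

comm = MPI.COMM_WORLD
m_local, k = 4, 3                  # illustrative local sizes
A = np.random.rand(m_local, k)     # local row block of A
B = np.random.rand(m_local, k)     # local row block of B
C = np.zeros((k, k))               # receives C = 0.0*C + 1.0 * A^T B

C = la.product("T", "N", 1.0, A, B, 0.0, C, comm=comm)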
- romtools.linalg.linalg.std(a, dtype=None, axis=None, comm=None)#
Return the standard deviation of a possibly distributed array over a given axis.
- Parameters:
a (np.ndarray) – input data
dtype (data-type) – Type to use in computing the standard deviation
axis (None or int) – the axis along which to compute the standard deviation. If None, computes the standard deviation of the flattened array.
comm (MPI_Comm) – MPI communicator (default: None)
- Returns:
if axis is None, returns a scalar; if axis is not None, returns an array of dimension a.ndim - 1
- Preconditions:
a is at most a rank-3 tensor
if a is a distributed 2-D array, it must be distributed along axis=0, and every rank must have the same a.shape[1]
if a is a distributed 3-D tensor, it must be distributed along axis=1, and every rank must have the same a.shape[0] and a.shape[2]
if axis != None, then it must be an int
- Postconditions:
a and comm are not modified
Example 1:#
rank 0: [  2.2,   3.3]
rank 1: [ 40. ,  51. , -24. ,  45. ]
rank 2: [ -4. ]
Suppose that we do:

res = la.std(a, comm=comm)

then ALL ranks will contain res = 26.71.
Example 2:#
rank 0: [[  2.2,   1.3,   4. ],
         [  3.3,   5.0,  33. ]]
rank 1: [[ 40. ,  -2. ,  -4. ],
         [ 51. ,   4. ,   6. ],
         [-24. ,   8. ,   9. ],
         [ 45. ,  -3. ,  -4. ]]
rank 2: [[ -4. ,   8. ,   9. ]]
Suppose that we do:
res = la.std(a, axis=0, comm=comm)
then every rank will contain the same res which is:
res = ([26.71, 4.12 , 11.55])
this is because the standard deviation is queried for the 0-th axis which is the axis along which the data array is distributed. So this operation must be a collective operation.
Suppose that we do:
res = la.std(a, axis=1, comm=comm)
then res is now a rank-1 array distributed as follows:

rank 0: [ 1.12, 13.62]
rank 1: [20.29, 21.70, 15.33, 22.87]
rank 2: [ 5.91]
because the axis queried for the standard deviation is NOT a distributed axis so this operation is purely local and the result has the same distribution as the original array.
Example 3:#
Consider a rank-3 tensor T distributed along axis=1 (each rank owns a block of columns of both depth slices):

slice T(:,:,1):
   rank 0  |      rank 1      | rank 2
   3.  4.  |  2.  8.  2.  1.  |  2.
   6. -1.  | -2. -1.  0. -6.  |  0.
  -7.  5.  |  5.  0.  3.  1.  |  3.

slice T(:,:,0):
   2.  3.  |  4.  5. -2.  4.  | -4.
   1.  5.  | -2.  4.  8. -3.  |  8.
   4.  3.  | -4.  6.  9. -4.  |  9.
Suppose that we do:
res = la.std(a, axis=0, comm=comm)
then res is now a rank-2 array, with the same column distribution as T:

  rank 0          |              rank 1              | rank 2
  5.5578  2.6247  |  2.8674  4.0277  1.2472  3.2998  | 1.2472   (std over axis 0 of slice T(:,:,1))
  1.2472  0.9428  |  3.3993  0.8165  4.9666  3.5590  | 5.9067   (std over axis 0 of slice T(:,:,0))
because the axis queried for the standard deviation is NOT a distributed axis and this is effectively a reduction over the 0-th axis so this operation is purely local and the result has the same distribution as the original array.
Suppose that we do:
res = la.std(a, axis=1, comm=comm)
then this is effectively a reduction over axis=1, and every rank will contain the same res, a rank-2 array:

[[3.14934396, 2.16653584],
 [4.14039336, 3.28881841],
 [5.06287004, 3.84919817]]

This is because the standard deviation is queried along axis=1, the axis along which the data array is distributed, so this operation must be collective; memory-wise it is feasible to hold because the result is no larger than the local allocation on each rank.
Suppose that we do:
res = la.std(a, axis=2, comm=comm)
then res is now a rank-2 array with the same distribution as T:

 rank 0     ||       rank 1        || rank 2
 0.5  0.5   ||  1.   1.5  2.  1.5  ||  3.
 2.5  3.    ||  0.   2.5  4.  1.5  ||  4.
 5.5  1.    ||  4.5  3.   3.  2.5  ||  3.

because the axis queried for the standard deviation is NOT a distributed axis and this is effectively a reduction over axis=2, so this operation is purely local and the result has the same distribution as the original array.
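A brief sketch on the data of Example 1 (mpi4py assumed); as with mean, comm must be passed by keyword because dtype and axis precede it:

import numpy as np
from mpi4py import MPI
import romtools.linalg.linalg as la

comm = MPI.COMM_WORLD
pieces = {0: [2.2, 3.3], 1: [40., 51., -24., 45.], 2: [-4.]}
a = np.array(pieces[comm.Get_rank()])

s = la.std(a, comm=comm)   # ~26.71 on every rank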
- romtools.linalg.linalg.thin_svd(M, comm=None, method='auto')#
Compute the thin SVD of a possibly distributed matrix, returning the left singular vectors and the singular values.
- Preconditions:
M is a rank-2 tensor
if M is distributed, M is distributed over its 0-th axis (row distribution)
allowed choices for method are “auto”, “method_of_snapshots”
- Returns:
left singular vectors and singular values
- Postconditions:
M is not modified
if M is distributed, the left singular vectors have the same distribution as M
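A minimal sketch, assuming mpi4py and that thin_svd returns the pair (left singular vectors, singular values) as documented above; the local block is random and illustrative:

import numpy as np
from mpi4py import MPI
import romtools.linalg.linalg as la

comm = MPI.COMM_WORLD
M = np.random.rand(100, 5)   # local row block of a tall, row-distributed matrix

U, s = la.thin_svd(M, comm=comm, method="method_of_snapshots")
# U keeps the same row distribution as M; s is replicated on every rank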