romtools.linalg.linalg#

See this Stack Overflow post for why this file exists and is structured this way: https://stackoverflow.com/questions/47599162/pybind11-how-to-package-c-and-python-code-into-a-single-package?rq=1

Functions

argmax(a[, comm])

Return the index of an array's maximum value.

max(a[, axis, comm])

Return the maximum of a possibly distributed array or maximum along an axis.

mean(a[, dtype, axis, comm])

Return the mean of a possibly distributed array over a given axis.

min(a[, axis, comm])

Return the minimum of a possibly distributed array or minimum along an axis.

move_distributed_linear_system_to_rank_zero(...)

Gathers a distributed linear system (A, b) from multiple MPI ranks to rank 0.

pinv(A[, comm])

Computes the pseudoinverse of A and returns its transpose.

product(flagA, flagB, alpha, A, B, beta, C)

Computes C = beta*C + alpha*op(A)*op(B), where A and B are row-distributed matrices.

std(a[, dtype, axis, comm])

Return the standard deviation of a possibly distributed array over a given axis.

thin_svd(M[, comm, method])

Compute the thin SVD of a possibly distributed matrix M, returning the left singular vectors and singular values.

romtools.linalg.linalg.argmax(a, comm=None)#

Return the index of an array’s maximum value. If the array is distributed, also returns the value itself and the MPI rank on which it occurs.

Parameters:
  • a (np.ndarray) – input data

  • comm (MPI_Comm) – MPI communicator (default: None)

Returns:

If comm is None, returns the index of the maximum value (identical to np.argmax). If comm is not None, returns a tuple (value, index, rank), where:

  • value: the global maximum

  • index: the local index of the global maximum

  • rank: the rank on which the global maximum resides

Preconditions:
  • a is at most a rank-3 tensor

  • if a is a distributed 2-D array, it must be distributed along axis=0, and every rank must have the same a.shape[1]

  • if a is a distributed 3-D tensor, it must be distributed along axis=1, and every rank must have the same a.shape[0] and a.shape[2]

Postconditions:
  • a and comm are not modified

Example 1:#

    rank 0:  [2.2, 3.3]

    rank 1:  [40., 51., -24., 45.]

    rank 2:  [-4.]

Suppose that we do:

res = la.argmax(a, comm)

then ALL ranks will contain res = (1, 1). (The global maximum (51.) occurs at index 1 of the local array on Rank 1.)
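For concreteness, a minimal runnable sketch of Example 1, assuming mpi4py is available, the module is imported as la, and the script is launched on exactly 3 ranks:

    # run with: mpiexec -n 3 python example.py
    import numpy as np
    from mpi4py import MPI
    import romtools.linalg.linalg as la

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    # this rank's piece of the distributed 1-D array from Example 1
    a = {0: np.array([2.2, 3.3]),
         1: np.array([40., 51., -24., 45.]),
         2: np.array([-4.])}[rank]

    res = la.argmax(a, comm)
    print(f"rank {rank}: res = {res}")  # every rank reports the same result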

Example 2:#

    rank 0:  [[ 2.2,  1.3,  4. ],
              [ 3.3,  5.0, 33. ]]

    rank 1:  [[ 40., -2., -4. ],
              [ 51.,  4.,  6. ],
              [-24.,  8.,  9. ],
              [ 45., -3., -4. ]]

    rank 2:  [[-4., 8., 9.]]

Suppose that we do:

res = la.argmax(a, comm)

then ALL ranks will contain res = (3, 1). (The global maximum (51.) occurs at index 3 of the flattened local array on Rank 1.)

Example 3:#

a is a 3 x 7 x 2 tensor distributed along axis=1: rank 0 owns columns 0-1, rank 1 owns columns 2-5, and rank 2 owns column 6.

    slice T(:,:,0):

          r0               r1            r2
    |  2.   3. |  4.   5.  -2.   4. | -4. |
    |  1.   5. | -2.   4.   8.  -3. |  8. |
    |  4.   3. | -4.   6.   9.  -4. |  9. |

    slice T(:,:,1):

          r0               r1            r2
    |  3.   4. |  2.   8.   2.   1. |  2. |
    |  6.  -1. | -2.  -1.   0.  -6. |  0. |
    | -7.   5. |  5.   0.   3.   1. |  3. |

Suppose that we do:

res = la.argmax(a, comm)

then ALL ranks will contain res = (20, 1). (The global maximum (9.) occurs on both Rank 1 and Rank 2, but we automatically return the index on the lowest rank. In this case, that is index 20 of the flattened local array on Rank 1.)
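Since the multi-dimensional examples report the index into the flattened local array, np.unravel_index can map it back to a local multi-index; a small sketch for Example 3, assuming the flattening follows NumPy's default C order (consistent with the serial behavior matching np.argmax):

    import numpy as np

    # Example 3 above returned res = (20, 1): flattened index 20 on rank 1,
    # whose local block has shape (3, 4, 2)
    i, j, k = np.unravel_index(20, (3, 4, 2))
    print(i, j, k)  # -> 2 2 0: row 2, local column 2, slice 0, i.e. the 9.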

romtools.linalg.linalg.max(a, axis=None, comm=None)#

Return the maximum of a possibly distributed array or maximum along an axis.

Parameters:
  • a (np.ndarray) – input data

  • axis (None or int) – the axis along which to compute the maximum. If None, computes the max of the flattened array. (default: None)

  • comm (MPI_Comm) – MPI communicator (default: None)

Returns:

If axis is None, returns a scalar. If axis is not None, returns an array of dimension a.ndim - 1.

Preconditions:
  • a is at most a rank-3 tensor

  • if a is a distributed 2-D array, it must be distributed along axis=0, and every rank must have the same a.shape[1]

  • if a is a distributed 3-D tensor, it must be distributed along axis=1, and every rank must have the same a.shape[0] and a.shape[2]

  • if axis != None, then it must be an int

Postconditions:
  • a and comm are not modified

Example 1:#

    rank 0:  [2.2, 3.3]

    rank 1:  [40., 51., -24., 45.]

    rank 2:  [-4.]

Suppose that we do:

res = la.max(a, comm=comm)

then ALL ranks will contain res = 51.

Example 2:#

    rank 0:  [[ 2.2,  1.3,  4. ],
              [ 3.3,  5.0, 33. ]]

    rank 1:  [[ 40., -2., -4. ],
              [ 51.,  4.,  6. ],
              [-24.,  8.,  9. ],
              [ 45., -3., -4. ]]

    rank 2:  [[-4., 8., 9.]]

Suppose that we do:

res = la.max(a, axis=0, comm=comm)

then every rank will contain the same res, which is the array [51., 8., 33.]. This is because the max is queried along axis 0, which is the axis along which the data array is distributed, so this operation must be a collective operation.

Suppose that we do:

res = la.max(a, axis=1, comm=comm)

then res is now a rank-1 array as follows

    rank 0:  [ 4., 33.]

    rank 1:  [40., 51.,  9., 45.]

    rank 2:  [ 9.]

because the axis queried for the max is NOT a distributed axis, this operation is purely local and the result has the same distribution as the original array.
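A hedged sketch of Example 2 end to end, assuming mpi4py and exactly 3 ranks:

    import numpy as np
    from mpi4py import MPI
    import romtools.linalg.linalg as la

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    # this rank's rows of the distributed 2-D array from Example 2
    a = {0: np.array([[2.2, 1.3, 4.], [3.3, 5.0, 33.]]),
         1: np.array([[40., -2., -4.], [51., 4., 6.],
                      [-24., 8., 9.], [45., -3., -4.]]),
         2: np.array([[-4., 8., 9.]])}[rank]

    col_max = la.max(a, axis=0, comm=comm)  # collective: [51., 8., 33.] on every rank
    row_max = la.max(a, axis=1, comm=comm)  # local: one entry per locally owned row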

Example 3:#

a is a 3 x 7 x 2 tensor distributed along axis=1: rank 0 owns columns 0-1, rank 1 owns columns 2-5, and rank 2 owns column 6.

    slice T(:,:,0):

          r0               r1            r2
    |  2.   3. |  4.   5.  -2.   4. | -4. |
    |  1.   5. | -2.   4.   8.  -3. |  8. |
    |  4.   3. | -4.   6.   9.  -4. |  9. |

    slice T(:,:,1):

          r0               r1            r2
    |  3.   4. |  2.   8.   2.   1. |  2. |
    |  6.  -1. | -2.  -1.   0.  -6. |  0. |
    | -7.   5. |  5.   0.   3.   1. |  3. |

Suppose that we do:

res = la.max(a, axis=0, comm=comm)

then res is a rank-2 array of shape (7, 2), with the same distribution as the tensor's axis=1 (shown transposed, one display row per slice):

          r0               r1            r2
    |  6.   5. |  5.   8.   3.   1. |  3. |   -> res(:, 1)
    |  4.   5. |  4.   6.   9.   4. |  9. |   -> res(:, 0)

because the axis queried for the max is NOT a distributed axis, this is effectively a local reduction over the 0-th axis, so the operation is purely local and the result has the same distribution as the original array.

Suppose that we do:

res = la.max(a, axis=1, comm=comm)

then this is effectively a reduction over axis=1, and every rank will contain the same res, which is a rank-2 array:

    [[5., 8.],
     [8., 6.],
     [9., 5.]]

This is because the max is queried along axis 1, which is the axis along which the data array is distributed, so this operation must be a collective operation. Memory-wise the result is feasible to hold on every rank, because it is no larger than the local allocation on each rank.

Suppose that we do:

res = la.max(a, axis=2, comm=comm)

then res is a rank-2 array of shape (3, 7), distributed along axis=1 like the original tensor:

          r0               r1            r2
    |  3.   4. |  4.   8.   2.   4. |  2. |
    |  6.   5. | -2.   4.   8.  -3. |  8. |
    |  4.   5. |  5.   6.   9.   1. |  9. |

because the axis queried for the max is NOT a distributed axis, this is effectively a local reduction over axis=2, so the operation is purely local and the result has the same distribution as the original array.
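For intuition, the result shapes above mirror what plain NumPy produces on the assembled (non-distributed) global tensor; a serial sketch with a stand-in tensor of the same shape:

    import numpy as np

    T = np.arange(3 * 7 * 2, dtype=float).reshape(3, 7, 2)  # stand-in for the global tensor

    print(np.max(T, axis=0).shape)  # (7, 2): stays distributed along its 0-th axis
    print(np.max(T, axis=1).shape)  # (3, 2): reduces the distributed axis, hence collective
    print(np.max(T, axis=2).shape)  # (3, 7): purely local reduction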

romtools.linalg.linalg.mean(a, dtype=None, axis=None, comm=None)#

Return the mean of a possibly distributed array over a given axis.

Parameters:
  • a (np.ndarray) – input data

  • dtype (data-type) – Type to use in computing the mean

  • axis (None or int) – the axis along which to compute the mean. If None, computes the mean of the flattened array.

  • comm (MPI_Comm) – MPI communicator (default: None)

Returns:

If axis is None, returns a scalar. If axis is not None, returns an array of dimension a.ndim - 1.

Preconditions:
  • a is at most a rank-3 tensor

  • if a is a distributed 2-D array, it must be distributed along axis=0, and every rank must have the same a.shape[1]

  • if a is a distributed 3-D tensor, it must be distributed along axis=1, and every rank must have the same a.shape[0] and a.shape[2]

  • if axis != None, then it must be an int

Postconditions:
  • a and comm are not modified

Example 1:#

    rank 0:  [2.2, 3.3]

    rank 1:  [40., 51., -24., 45.]

    rank 2:  [-4.]

Suppose that we do:

res = la.mean(a, comm=comm)

then ALL ranks will contain res = 16.21

Example 2:#

    rank 0:  [[ 2.2,  1.3,  4. ],
              [ 3.3,  5.0, 33. ]]

    rank 1:  [[ 40., -2., -4. ],
              [ 51.,  4.,  6. ],
              [-24.,  8.,  9. ],
              [ 45., -3., -4. ]]

    rank 2:  [[-4., 8., 9.]]

Suppose that we do:

res = la.mean(a, axis=0, comm=comm)

then every rank will contain the same res which is:

res = array([16.21, 3.04, 7.57])

This is because the mean is queried along axis 0, which is the axis along which the data array is distributed, so this operation must be a collective operation.

Suppose that we do:

res = la.mean(a, axis=1, comm=comm)

then res is now a rank-1 array as follows

    rank 0:  [2.5, 13.77]

    rank 1:  [11.33, 20.33, -2.33, 12.67]

    rank 2:  [4.33]

because the axis queried for the mean is NOT a distributed axis, this operation is purely local and the result has the same distribution as the original array.
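As a serial sanity check, with comm=None the results should reduce to plain NumPy, mirroring the comm == None behavior documented for argmax; a sketch under that assumption:

    import numpy as np
    import romtools.linalg.linalg as la

    a = np.array([[2.2, 1.3, 4.], [3.3, 5.0, 33.]])

    # with comm=None there is nothing to reduce across ranks
    assert np.isclose(la.mean(a), np.mean(a))
    assert np.allclose(la.mean(a, axis=0), np.mean(a, axis=0))
    assert np.allclose(la.mean(a, axis=1), np.mean(a, axis=1))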

Example 3:#

a is a 3 x 7 x 2 tensor distributed along axis=1: rank 0 owns columns 0-1, rank 1 owns columns 2-5, and rank 2 owns column 6.

    slice T(:,:,0):

          r0               r1            r2
    |  2.   3. |  4.   5.  -2.   4. | -4. |
    |  1.   5. | -2.   4.   8.  -3. |  8. |
    |  4.   3. | -4.   6.   9.  -4. |  9. |

    slice T(:,:,1):

          r0               r1            r2
    |  3.   4. |  2.   8.   2.   1. |  2. |
    |  6.  -1. | -2.  -1.   0.  -6. |  0. |
    | -7.   5. |  5.   0.   3.   1. |  3. |

Suppose that we do:

res = la.mean(a, axis=0, comm=comm)

then res is a rank-2 array of shape (7, 2), with the same distribution as the tensor's axis=1 (shown transposed, one display row per slice):

             r0                          r1                   r2
    | 0.6667  2.6667 |  1.6667  2.3333  1.6667  -1.3333 | 1.6667 |   -> res(:, 1)
    | 2.3333  3.6667 | -0.6667  5.      5.      -1.     | 4.3333 |   -> res(:, 0)

because the axis queried for the mean is NOT a distributed axis, this is effectively a local reduction over the 0-th axis, so the operation is purely local and the result has the same distribution as the original array.

Suppose that we do:

res = la.mean(a, axis=1, comm=comm)

then this is effectively a reduction over axis=1, and every rank will contain the same res, which is a rank-2 array:

    [[ 1.71428571,  3.1428571 ],
     [ 3.        , -0.5714285 ],
     [ 3.28571429,  1.4285714 ]]

This is because the mean is queried along axis 1, which is the axis along which the data array is distributed, so this operation must be a collective operation. Memory-wise the result is feasible to hold on every rank, because it is no larger than the local allocation on each rank.

Suppose that we do:

res = la.mean(a, axis=2, comm=comm)

then res is a rank-2 array of shape (3, 7), distributed along axis=1 like the original tensor:

           r0                   r1             r2
    |  2.5   3.5 |  3.    6.5   0.    2.5 | -1. |
    |  3.5   2.  | -2.    1.5   4.   -4.5 |  4. |
    | -1.5   4.  |  0.5   3.    6.   -1.5 |  6. |

because the axis queried for the mean is NOT a distributed axis, this is effectively a local reduction over axis=2, so the operation is purely local and the result has the same distribution as the original array.

romtools.linalg.linalg.min(a, axis=None, comm=None)#

Return the minimum of a possibly distributed array or minimum along an axis.

Parameters:
  • a (np.ndarray) – input data

  • axis (None or int) – the axis along which to compute the minimum. If None, computes the min of the flattened array.

  • comm (MPI_Comm) – MPI communicator (default: None)

Returns:

If axis is None, returns a scalar. If axis is not None, returns an array of dimension a.ndim - 1.

Preconditions:
  • a is at most a rank-3 tensor

  • if a is a distributed 2-D array, it must be distributed along axis=0, and every rank must have the same a.shape[1]

  • if a is a distributed 3-D tensor, it must be distributed along axis=1, and every rank must have the same a.shape[0] and a.shape[2]

  • if axis != None, then it must be an int

Postconditions:
  • a and comm are not modified

Example 1:#

    rank 0:  [2.2, 3.3]

    rank 1:  [40., 51., -24., 45.]

    rank 2:  [-4.]

Suppose that we do:

res = la.min(a, comm=comm)

then ALL ranks will contain res = -24.

Example 2:#

    rank 0:  [[ 2.2,  1.3,  4. ],
              [ 3.3,  5.0, 33. ]]

    rank 1:  [[ 40., -2., -4. ],
              [ 51.,  4.,  6. ],
              [-24.,  8.,  9. ],
              [ 45., -3., -4. ]]

    rank 2:  [[-4., 8., 9.]]

Suppose that we do:

res = la.min(a, axis=0, comm=comm)

then every rank will contain the same res, which is the array [-24., -3., -4.]. This is because the min is queried along axis 0, which is the axis along which the data array is distributed, so this operation must be a collective operation.

Suppose that we do:

res = la.min(a, axis=1, comm=comm)

then res is now a rank-1 array as follows

    rank 0:  [1.3, 3.3]

    rank 1:  [-4., 4., -24., -4.]

    rank 2:  [-4.]

because the axis queried for the min is NOT a distributed axis, this operation is purely local and the result has the same distribution as the original array.
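A minimal sketch of Example 1 for min, under the same assumptions as the argmax sketch (mpi4py, exactly 3 ranks):

    import numpy as np
    from mpi4py import MPI
    import romtools.linalg.linalg as la

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    # this rank's piece of the distributed 1-D array from Example 1
    a = {0: np.array([2.2, 3.3]),
         1: np.array([40., 51., -24., 45.]),
         2: np.array([-4.])}[rank]

    res = la.min(a, comm=comm)
    print(f"rank {rank}: res = {res}")  # -24. on every rank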

Example 3:#

a is a 3 x 7 x 2 tensor distributed along axis=1: rank 0 owns columns 0-1, rank 1 owns columns 2-5, and rank 2 owns column 6.

    slice T(:,:,0):

          r0               r1            r2
    |  2.   3. |  4.   5.  -2.   4. | -4. |
    |  1.   5. | -2.   4.   8.  -3. |  8. |
    |  4.   3. | -4.   6.   9.  -4. |  9. |

    slice T(:,:,1):

          r0               r1            r2
    |  3.   4. |  2.   8.   2.   1. |  2. |
    |  6.  -1. | -2.  -1.   0.  -6. |  0. |
    | -7.   5. |  5.   0.   3.   1. |  3. |

Suppose that we do:

res = la.min(a, axis=0, comm=comm)

then res is a rank-2 array of shape (7, 2), with the same distribution as the tensor's axis=1 (shown transposed, one display row per slice):

          r0               r1            r2
    | -7.  -1. | -2.  -1.   0.  -6. |  0. |   -> res(:, 1)
    |  1.   3. | -4.   4.  -2.  -4. | -4. |   -> res(:, 0)

because the axis queried for the min is NOT a distributed axis, this is effectively a local reduction over the 0-th axis, so the operation is purely local and the result has the same distribution as the original array.

Suppose that we do:

res = la.min(a, axis=1, comm=comm)

then this is effectively a reduction over axis=1, and every rank will contain the same res, which is a rank-2 array:

    [[-4.,  1.],
     [-3., -6.],
     [-4., -7.]]

This is because the min is queried along axis 1, which is the axis along which the data array is distributed, so this operation must be a collective operation. Memory-wise the result is feasible to hold on every rank, because it is no larger than the local allocation on each rank.

Suppose that we do:

res = la.min(a, axis=2, comm=comm)

then res is a rank-2 array of shape (3, 7), distributed along axis=1 like the original tensor:

          r0               r1            r2
    |  2.   3. |  2.   5.  -2.   1. | -4. |
    |  1.  -1. | -2.  -1.   0.  -6. |  0. |
    | -7.   3. | -4.   0.   3.  -4. |  3. |

because the axis queried for the min is NOT a distributed axis, this is effectively a local reduction over axis=2, so the operation is purely local and the result has the same distribution as the original array.

romtools.linalg.linalg.move_distributed_linear_system_to_rank_zero(A_in, b_in, comm)[source]#

Gathers a distributed linear system (A, b) from multiple MPI ranks to rank 0.

Preconditions:
  • A_in is a rank-2 tensor (2D array) representing a portion of the global matrix A.

  • b_in is a rank-1 or rank-2 tensor (1D or 2D array) representing a portion of the global vector b.

  • A_in and b_in are distributed row-wise across MPI ranks.

Returns:

  • A_g (numpy.ndarray) – The global matrix A assembled on rank 0.

  • b_g (numpy.ndarray) – The global vector b assembled on rank 0.

Parameters:
  • A_in (ndarray)

  • b_in (ndarray)

Postconditions:
  • On rank 0, A_g and b_g contain the fully assembled matrix and vector, respectively.

  • On other ranks, A_g and b_g are dummy arrays with no meaningful content.

  • The input arrays A_in and b_in are not modified.

Notes

  • The function gathers all data without unnecessary copies or data movement.

  • Only rank 0 ends up with the complete system; other ranks have placeholder arrays.
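A hedged usage sketch: gather a row-distributed least-squares system and solve it on the rank that holds the full system. The rank-local sizes and the lstsq solve are illustrative choices, not part of this API:

    import numpy as np
    from mpi4py import MPI
    import romtools.linalg.linalg as la

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    rng = np.random.default_rng(rank)

    A_in = rng.standard_normal((4, 3))  # this rank's rows of the global A
    b_in = rng.standard_normal(4)       # the matching entries of the global b

    A_g, b_g = la.move_distributed_linear_system_to_rank_zero(A_in, b_in, comm)
    if rank == 0:
        # only rank 0 holds the fully assembled system
        x, *_ = np.linalg.lstsq(A_g, b_g, rcond=None)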

romtools.linalg.linalg.pinv(A, comm=None)#

Computes the pseudoinverse of A and returns its transpose. Returning the transpose of A^+ is a matter of convenience: when A is row-distributed and comm is not None, the result then has the same distribution as A. If A is very large, this is the only feasible way to store the pseudoinverse, since no single rank can fully store it.

Parameters:
  • A (np.ndarray) – input matrix

  • comm (MPI_Comm) – MPI communicator (default: None)

Returns:

The transpose of A^+, computed as (A^+)^T = A ((A^T A)^(-1))^T.

Preconditions:
  • A must be a real, rank-2 matrix

  • A must have more rows than columns

  • A must have linearly independent columns

  • If A is distributed, it must be so along its rows

Postconditions:
  • A and comm are not modified

  • A^+ A = I
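A serial sketch (comm=None) of the transpose convention and the A^+ A = I postcondition, assuming the serial path returns a plain ndarray shaped like A:

    import numpy as np
    import romtools.linalg.linalg as la

    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 4))  # tall, with linearly independent columns

    pinv_T = la.pinv(A)   # transpose of A^+; same shape as A
    A_pinv = pinv_T.T     # recover A^+ itself

    # postcondition from above: A^+ A = I
    assert np.allclose(A_pinv @ A, np.eye(4), atol=1e-8)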

romtools.linalg.linalg.product(flagA, flagB, alpha, A, B, beta, C, comm=None)#

Computes C = beta*C + alpha*op(A)*op(B), where A and B are row-distributed matrices.

Parameters:
  • flagA (str) – Determines the orientation of A, “T” for transpose or “N” for non-transpose.

  • flagB (str) – Determines the orientation of B, “T” for transpose or “N” for non-transpose.

  • alpha (float) – Coefficient of op(A)op(B).

  • A (np.array) – 2-D matrix

  • B (np.array) – 2-D matrix

  • beta (float) – Coefficient of C.

  • C (np.array) – 2-D matrix to be filled with the product

  • comm (MPI_Comm) – MPI communicator (default: None)

Returns:

The specified product

Return type:

C (np.array)
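A serial sketch (comm=None) forming the Gram matrix A^T A. Whether C is also filled in place (the docstring says it is "filled") or only returned is not pinned down here, so the check uses the return value:

    import numpy as np
    import romtools.linalg.linalg as la

    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 3))
    C = np.zeros((3, 3))

    # C = 0.0*C + 1.0*(A^T A)
    res = la.product("T", "N", 1.0, A, A, 0.0, C)
    assert np.allclose(res, A.T @ A)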

romtools.linalg.linalg.std(a, dtype=None, axis=None, comm=None)#

Return the standard deviation of a possibly distributed array over a given axis.

Parameters:
  • a (np.ndarray) – input data

  • dtype (data-type) – Type to use in computing the standard deviation

  • axis (None or int) – the axis along which to compute the standard deviation. If None, computes the standard deviation of the flattened array.

  • comm (MPI_Comm) – MPI communicator (default: None)

Returns:

If axis is None, returns a scalar. If axis is not None, returns an array of dimension a.ndim - 1.

Preconditions:
  • a is at most a rank-3 tensor

  • if a is a distributed 2-D array, it must be distributed along axis=0, and every rank must have the same a.shape[1]

  • if a is a distributed 3-D tensor, it must be distributed along axis=1, and every rank must have the same a.shape[0] and a.shape[2]

  • if axis != None, then it must be an int

Postconditions:
  • a and comm are not modified

Example 1:#

    rank 0:  [2.2, 3.3]

    rank 1:  [40., 51., -24., 45.]

    rank 2:  [-4.]

Suppose that we do:

res = la.std(a, comm=comm)

then ALL ranks will contain res = 26.71

Example 2:#

    rank 0:  [[ 2.2,  1.3,  4. ],
              [ 3.3,  5.0, 33. ]]

    rank 1:  [[ 40., -2., -4. ],
              [ 51.,  4.,  6. ],
              [-24.,  8.,  9. ],
              [ 45., -3., -4. ]]

    rank 2:  [[-4., 8., 9.]]

Suppose that we do:

res = la.std(a, axis=0, comm=comm)

then every rank will contain the same res which is:

res = array([26.71, 4.12, 11.55])

This is because the standard deviation is queried along axis 0, which is the axis along which the data array is distributed, so this operation must be a collective operation.

Suppose that we do:

res = la.std(a, axis=1, comm=comm)

then res is now a rank-1 array as follows

    rank 0:  [1.12, 13.62]

    rank 1:  [20.29, 21.70, 15.33, 22.87]

    rank 2:  [5.91]

because the axis queried for the standard deviation is NOT a distributed axis, this operation is purely local and the result has the same distribution as the original array.
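As a serial sanity check: the value 26.71 above matches NumPy's default population standard deviation (ddof=0), which this sketch assumes is the convention used:

    import numpy as np
    import romtools.linalg.linalg as la

    a = np.array([2.2, 3.3, 40., 51., -24., 45., -4.])

    print(la.std(a))  # ~26.71
    print(np.std(a))  # ~26.71 with NumPy's default ddof=0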

Example 3:#

a is a 3 x 7 x 2 tensor distributed along axis=1: rank 0 owns columns 0-1, rank 1 owns columns 2-5, and rank 2 owns column 6.

    slice T(:,:,0):

          r0               r1            r2
    |  2.   3. |  4.   5.  -2.   4. | -4. |
    |  1.   5. | -2.   4.   8.  -3. |  8. |
    |  4.   3. | -4.   6.   9.  -4. |  9. |

    slice T(:,:,1):

          r0               r1            r2
    |  3.   4. |  2.   8.   2.   1. |  2. |
    |  6.  -1. | -2.  -1.   0.  -6. |  0. |
    | -7.   5. |  5.   0.   3.   1. |  3. |

Suppose that we do:

res = la.std(a, axis=0, comm=comm)

then res is a rank-2 array of shape (7, 2), with the same distribution as the tensor's axis=1 (shown transposed, one display row per slice):

             r0                        r1                  r2
    | 5.5578  2.6247 | 2.8674  4.0277  1.2472  3.2998 | 1.2472 |   -> res(:, 1)
    | 1.2472  0.9428 | 3.3993  0.8165  4.9666  3.5590 | 5.9067 |   -> res(:, 0)

because the axis queried for the standard deviation is NOT a distributed axis, this is effectively a local reduction over the 0-th axis, so the operation is purely local and the result has the same distribution as the original array.

Suppose that we do:

res = la.std(a, axis=1, comm=comm)

then this is effectively a reduction over axis=1, and every rank will contain the same res, which is a rank-2 array:

    [[3.14934396, 2.16653584],
     [4.14039336, 3.28881841],
     [5.06287004, 3.84919817]]

This is because the standard deviation is queried along axis 1, which is the axis along which the data array is distributed, so this operation must be a collective operation. Memory-wise the result is feasible to hold on every rank, because it is no larger than the local allocation on each rank.

Suppose that we do:

res = la.std(a, axis=2, comm=comm)

then res is a rank-2 array of shape (3, 7), distributed along axis=1 like the original tensor:

          r0                r1             r2
    |  0.5  0.5 |  1.    1.5   2.    1.5 |  3. |
    |  2.5  3.  |  0.    2.5   4.    1.5 |  4. |
    |  5.5  1.  |  4.5   3.    3.    2.5 |  3. |

because the axis queried for the standard deviation is NOT a distributed axis, this is effectively a local reduction over axis=2, so the operation is purely local and the result has the same distribution as the original array.

romtools.linalg.linalg.thin_svd(M, comm=None, method='auto')#

Compute the thin SVD of a possibly distributed matrix M.

Preconditions:
  • M is a rank-2 tensor

  • if M is distributed, M is distributed over its 0-th axis (row distribution)

  • allowed choices for method are “auto”, “method_of_snapshots”

Returns:

  • left singular vectors and singular values

Postconditions:
  • M is not modified

  • if M is distributed, the left singular vectors have the same distribution as M
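A serial sketch (comm=None), assuming the return is the pair (left singular vectors, singular values) as stated above:

    import numpy as np
    import romtools.linalg.linalg as la

    rng = np.random.default_rng(0)
    M = rng.standard_normal((100, 5))

    U, s = la.thin_svd(M)  # method defaults to "auto"

    assert U.shape == (100, 5)
    assert np.allclose(U.T @ U, np.eye(5), atol=1e-10)  # orthonormal left singular vectors
    assert np.allclose(np.sort(s)[::-1],
                       np.linalg.svd(M, compute_uv=False))  # matches NumPy's singular values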