# Singular Value Decomposition

The singular value decomposition (SVD) is of increasing importance in signal processing. It is an advanced linear algebra operation that produces a basis for the row and column space of the matrix and an indication of the rank of the matrix. In adaptive signal processing, the matrix rank and the basis are useful for reducing the effects of interference. Given an m × n complex matrix A, the singular value decomposition of A is
A = UΣVH
(1)

where U is a unitary matrix of size m × m, ∑ is an m × n matrix in which the upper n × n portion is a diagonal matrix with all entries real and sorted in descending order, and V is an n × n unitary matrix. If m > n, then define where Ua is size m × n, Ub is size m × (m - n), and Σa is of size n × n. Then A = UaΣaVH is called the reduced SVD of the matrix A. In this context the SVD defined in Equation (1) is sometimes referred to as the full SVD for contrast. Notice that Ua is not unitary, but it does have orthogonal columns. When m < n, the reduced SVD can be similarly defined by partitioning V instead of U.

For signal processing applications, we are typically most interested in the reduced decomposition, in the matrix U, and in the singular values (the values on the diagonal of Σ). We provide operation counts for the reduced decomposition, assuming that all three matrices are produced. The data matrix sizes of interest and associated operation counts are given in Table 1.

Table 1: SVD input parameters.
 Parameter Names Description Values Set 1 Set 2 Set 3 m Matrix rows 500 180 150 n Matrix columns 100 60 150 Wr1 Fixed workload (Mflop) 101 15 72 Wr2 Workload per iteration (Mflop) 0.24 0.88 0.54

There are three major steps to the full SVD algorithm, which are described in more detail in Golub and Van Loan [2, Algorithm 5.2.5]. First, if the m × n matrix A has many more rows than columns, a QR factorization is performed. This step is done if m > 5n/3 [2, p. 252], which is typically the case in signal processing. Define

where Qa is size m × n, Qb is size m × (m - n), and Ra is size n × n. The decomposition A = Qa Ra is referred to as the reduced QR decomposition of A. Matrix Qa is not unitary, but it has orthogonal columns. The reduced QR factorization can be obtained by the modified Gram-Schmidt algorithm described in Golub and Van Loan [2, Algorithm 5.2.5]. If a full SVD is being performed, the full QR is computed: if a reduced SVD is being performed, a reduced QR is computed. In the remainder of this exposition, we describe the reduced SVD algorithm for a matrix with m ≥ n. We assume that in the first step, we perform a reduced QR decomposition via the MGS algorithm to produce
A = U1 R,

where R is an n × n upper-triangular matrix and U1 is an m × n matrix with orthogonal columns. (Notice that the QR factorization described in the QR Factorization page is a full QR; hence, a different algorithm is used here.)

In the next step, R is reduced to bi-diagonal form, to consist of the main diagonal and a single diagonal of entries above that, with the remainder of the entries in the matrix being zero [2, p. 253]. This is accomplished with Householder transforms, producing

R = U2 B V2H,
where U2 and V2 are unitary and of size n × n, and the n × n matrix B is bi-diagonal. The matrix B produced at this step is no longer complex, but real, though matrices U2 and V2 are complex.

The final step is an iterative reduction of B to diagonal form and the ordering of the singular values. This is accomplished with Givens rotations [2, p. 454]. At the end of this step we have produced matrices n × n orthogonal matrices U3 and V3 such that
B = U3 ΣV3H
so that the singular value decomposition of the original matrix A can be expressed as
A = U1U2U3 Σ(V3 V2)H
= UΣVH
with U = U1 U2 U3 and V = V3 V2.

As the exact number of iterations required to produce the SVD is based on the data, an efficiency measurement must take into account the actual number of iteration steps performed. We account for this by defining the workload W as a linear function of two numbers W1 and W2 given in Table 1,

W = W1 + d*W2,

where d is the number of iteration steps performed in the reduction of B to diagonal form, W2 is an estimate of the number of floating-point operations per iteration step, and W1 is an estimate of the number of floating-point operations in the remainder of the algorithm. The estimates W1 and W2 for complex matrices are given separately for the full and reduced SVD algorithms in Table 1. The number d for a given data set must be "discovered" in the course of the execution of the benchmark.

There are many implementation details associated with achieving high performance in the singular value decomposition. Examples of such details include the use of block Householder transforms [2, p. 213] and the storage of the Householder transforms and Givens rotations that produce U and V rather than the matrices themselves [1].

## References

1. J. J. M. Cuppen. The singular value decomposition in product form. SIAM J. Sci. Stat. Comput., 4(2):216–222, June 1983.
2. Gene H. Golub and Charles F. Van Loan. Matrix Computations. Johns Hopkins University Press, 3rd edition, 1996.