# Singular Value Decomposition

The singular value decomposition (SVD) is of increasing importance in signal processing. It is an advanced linear algebra operation that produces a basis for the row and column space of the matrix and an indication of the rank of the matrix. In adaptive signal processing, the matrix rank and the basis are useful for reducing the effects of interference. Given an m × n complex matrix A, the singular value decomposition of A is

A = UΣV^{H},     (1)

where U is a unitary matrix of size m × m, ∑ is an m × n matrix in which the upper n × n portion is a diagonal matrix with all entries real and sorted in descending order, and V is an n × n unitary matrix. If m > n, then define

U = [ U_{a} U_{b} ]  and  Σ = [ Σ_{a} ; 0 ],

where U_{a} is size m × n, U_{b} is size m × (m − n), Σ_{a} is of size n × n, and the zero block is of size (m − n) × n. Then

A = U_{a}Σ_{a}V^{H}

is called the *reduced SVD* of the matrix A. In this context the SVD defined in Equation (1) is sometimes referred to as the *full SVD* for contrast. Notice that U_{a} is not unitary, but it does have orthogonal columns. When m < n, the reduced SVD can be similarly defined by partitioning V instead of U.
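These definitions can be checked numerically. The sketch below (an illustration using NumPy, not part of the benchmark code) computes both the full and reduced SVD of a complex matrix via the `full_matrices` flag of `np.linalg.svd`:

```python
import numpy as np

# A complex data matrix with more rows than columns (m > n).
rng = np.random.default_rng(0)
m, n = 8, 3
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

# Full SVD: U is m x m, s holds the n singular values, Vh is n x n.
U, s, Vh = np.linalg.svd(A, full_matrices=True)

# Reduced SVD: Ua is m x n, its columns are the first n columns of U.
Ua, sa, Vha = np.linalg.svd(A, full_matrices=False)

# Both factorizations reconstruct A.
assert np.allclose(U[:, :n] @ np.diag(s) @ Vh, A)
assert np.allclose(Ua @ np.diag(sa) @ Vha, A)

# Singular values are real, non-negative, and sorted in descending order.
assert np.all(np.diff(s) <= 0)

# Ua is not unitary (it is not square), but Ua^H Ua = I.
assert np.allclose(Ua.conj().T @ Ua, np.eye(n))
```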

For signal processing applications, we are typically most interested in the reduced decomposition, in the matrix U, and in the singular values (the values on the diagonal of Σ). We provide operation counts for the reduced decomposition, assuming that all three matrices are produced. The data matrix sizes of interest and associated operation counts are given in Table 1.

**Table 1.** Data matrix sizes and workload estimates.

| Parameter Name | Description | Set 1 | Set 2 | Set 3 |
| --- | --- | --- | --- | --- |
| m | Matrix rows | 500 | 180 | 150 |
| n | Matrix columns | 100 | 60 | 150 |
| W_{r1} | Fixed workload (Mflop) | 101 | 15 | 72 |
| W_{r2} | Workload per iteration (Mflop) | 0.24 | 0.88 | 0.54 |

There are three major steps to the full SVD algorithm, which are described in more detail in Golub and Van Loan [2, Algorithm 5.2.5]. First, if the m × n matrix A has many more rows than columns, a QR factorization is performed. This step is done if m > 5n/3 [2, p. 252], which is typically the case in signal processing. Define

Q = [ Q_{a} Q_{b} ]  and  R = [ R_{a} ; 0 ],

where Q_{a} is size m × n, Q_{b} is size m × (m − n), and R_{a} is size n × n. The decomposition A = Q_{a}R_{a} is referred to as the *reduced QR* decomposition of A. Matrix Q_{a} is not unitary, but it has orthogonal columns. The reduced QR factorization can be obtained by the modified Gram-Schmidt (MGS) algorithm described in Golub and Van Loan [2, Algorithm 5.2.5]. If a full SVD is being performed, the full QR is computed; if a reduced SVD is being performed, a reduced QR is computed. In the remainder of this exposition, we describe the reduced SVD algorithm for a matrix with m ≥ n. We assume that in the first step, we perform a reduced QR decomposition via the MGS algorithm to produce

A = U_{1}R,

where R is an n × n upper-triangular matrix and U_{1} is an m × n matrix with orthogonal columns. (Notice that the QR factorization
described in the QR Factorization page is a full QR; hence, a
different algorithm is used here.)
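A minimal sketch of the MGS reduced QR factorization follows, assuming a complex input matrix with full column rank (an illustration of the algorithm, not the benchmark implementation):

```python
import numpy as np

def mgs_qr(A):
    """Reduced QR factorization A = Q R via modified Gram-Schmidt.

    A is m x n with m >= n; Q is m x n with orthonormal columns,
    R is n x n upper triangular.
    """
    A = np.array(A, dtype=complex)
    m, n = A.shape
    Q = np.zeros((m, n), dtype=complex)
    R = np.zeros((n, n), dtype=complex)
    V = A.copy()
    for k in range(n):
        R[k, k] = np.linalg.norm(V[:, k])
        Q[:, k] = V[:, k] / R[k, k]
        # Orthogonalize the remaining columns against q_k immediately;
        # this is what distinguishes MGS from classical Gram-Schmidt
        # and gives it better numerical behavior.
        for j in range(k + 1, n):
            R[k, j] = Q[:, k].conj() @ V[:, j]
            V[:, j] -= R[k, j] * Q[:, k]
    return Q, R

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4)) + 1j * rng.standard_normal((6, 4))
Q, R = mgs_qr(A)
assert np.allclose(Q @ R, A)                     # factorization holds
assert np.allclose(Q.conj().T @ Q, np.eye(4))    # orthonormal columns
assert np.allclose(R, np.triu(R))                # R is upper triangular
```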

In the next step, R is reduced to *bi-diagonal* form, to
consist of the main diagonal and a single diagonal of entries above
that, with the remainder of the entries in the matrix being
zero [2, p. 253]. This is accomplished with Householder
transforms, producing

R = U_{2}BV_{2}^{H},

where U_{2} and V_{2} are unitary and of size n × n, and the n × n matrix B is bi-diagonal. The matrix B produced at this step is no longer complex, but real, though matrices U_{2} and V_{2} are complex. The final step is an iterative reduction of B to diagonal form and the ordering of the singular values. This is accomplished with Givens rotations [2, p. 454]. At the end of this step we have produced n × n orthogonal matrices U_{3} and V_{3} such that

B = U_{3}ΣV_{3}^{H},

so that

A = U_{1}U_{2}U_{3}Σ(V_{2}V_{3})^{H} = UΣV^{H},

where U = U_{1}U_{2}U_{3} and V = V_{2}V_{3}.
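The composition of the three steps can be verified numerically. In the sketch below, `np.linalg.qr` stands in for the MGS step, and a single `np.linalg.svd` call on R stands in for the bidiagonalization and Givens iterations (steps two and three), returning their composed factors directly:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 10, 4
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

# Step 1: reduced QR, A = U1 R with U1 of size m x n and R of size n x n.
U1, R = np.linalg.qr(A)  # 'reduced' mode is the default

# Steps 2-3, combined here for illustration: SVD of the small n x n
# matrix R. The text performs these as a Householder bidiagonalization
# R = U2 B V2^H followed by Givens-rotation iterations B = U3 Sigma V3^H;
# np.linalg.svd returns the composed factors U2 U3 and (V2 V3)^H.
U23, s, V23h = np.linalg.svd(R)

# Compose the factors: U = U1 (U2 U3) and V^H = (V2 V3)^H.
U = U1 @ U23
assert np.allclose(U @ np.diag(s) @ V23h, A)     # A = U Sigma V^H
assert np.allclose(U.conj().T @ U, np.eye(n))    # orthonormal columns
```

Working on the small n × n matrix R rather than on A directly is what makes the initial QR step pay off when m is much larger than n.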

As the exact number of iterations required to produce the SVD is based
on the data, an efficiency measurement must take into account the
actual number of iteration steps performed. We account for this by
defining the workload W as a linear function of two numbers W_{1}
and W_{2} given in Table 1,

W = W_{1} + d·W_{2},

where d is the number of iteration steps performed in the reduction
of B to diagonal form, W_{2} is an estimate of the number of
floating-point operations per iteration step, and W_{1} is an
estimate of the number of floating-point operations in the remainder
of the algorithm. The estimates W_{1} and W_{2} for complex matrices
are given separately for the full and reduced SVD algorithms in
Table 1. The number d for a given data set must be
"discovered" in the course of the execution of the benchmark.

There are many implementation details associated with achieving high performance in the singular value decomposition. Examples of such details include the use of block Householder transforms [2, p. 213] and the storage of the Householder transforms and Givens rotations that produce U and V rather than the matrices themselves [1].

## References

1. J. J. M. Cuppen. The singular value decomposition in product form. *SIAM J. Sci. Stat. Comput.*, 4(2):216–222, June 1983.
2. Gene H. Golub and Charles F. Van Loan. *Matrix Computations*. Johns Hopkins University Press, 3rd edition, 1996.