Custom memory paging algorithm for Singular Value Decomposition of large datasets.
Singular Value Decomposition (SVD) is an important Factorisation within the sphere of linear algebra, SVD on large datasets is a problem that is very relevant to many scientific applications such as numerical weather prediction, astronomical data modeling, financial modeling and many industrial image processing applications.
This blog post demonstrates a technique which yields low physical memory usage by storing ‘intermediate matrices’ produced during the SVD process on a persistent store (i.e a hard disk). I first wrote this custom paging algorithm so that I could perform SVD on very large data sets, and not use more physical memory than my 32bit machine can support.
Virtual Memory and Memory Addressing.
This article assumes that the matrix you want to perform SVD on is greater than 4 gigabytes and less than 17,179,869,184 gigabytes.
Most CPUs are currently designed so that the contents of a single integer register can store the address of any datum in the computer’s virtual memory. Therefore, the total number of addresses in the virtual memory, is determined by the width of these registers. 32-bit memory addressing means that 232 addresses, or 4 gigabytes of RAM, can be referenced.
The emergence of the 64-bit architecture effectively increases the memory ceiling to 264 addresses, equivalent to 17,179,869,184 gigabytes of RAM however, even on a 64bit machine with disk paging enabled, the technique described here are still effective due to the fact that traditional paging algorithms are not being optimized for handing matrices.
Overview of Singular Value Decomposition
By definition Singular Value Decomposition is the decomposition of a matrix M into orthogonal matrices, U and V and a diagonal matrix S:
Standard algorithms (such as LAPACKs svd subroutine) used to calculate the SVD of a matrix require the whole matrix to be in memory at any one time. My custom method still adheres to this constraint due to the incremental methodology used.
In 2006 Matthew Brand created a new method which calculates the SVD of a matrix in an incremental fashion, thus allowing the addition of new data to a dataset without the need to recalculate the entire SVD of the whole dataset:
Incremental methodology allows us to calculate the SVD of a dataset using far less processing and memory requirements than that of batch SVD(i.e calculating the SVD of a matrix all at once). In practice we have to feed parts of the dataset into the incremental algorithm bit by bit. In an image processing system one might consider feeding each image vector in a sequence of images in at a time.
Brands’ method still produces large matrices however I have engineered it in such a way as to store these large matrices on disk in ‘swapfiles’ whilst keeping smaller ‘intermediate matrices’ in memory; this will all become much clearer once the methodology is explained.
Step 1: Calculation of intermediate matrices
The first stage of the algorithm is the calculation of intermediate matrices. Matrix L is the
projection of C onto the orthogonal basis U, or C in terms of U. Matrix H is the component of
C which is orthogonal to the subspace spanned by U. This is the component of C which is
not represented by U. Basically what brand suggests is that we create the intermediate matrix L by doing the transpose of U(created from the SVD of M) multiplied by our new data C:
And we create matrix H by doing C minus U multiplied by L:
Step 2: QR decomposition
Matrix H is decomposed into an orthogonal basis, J, and upper triangular matrix K. For this
purpose Brand suggests QR Decomposition. He suggests the use of the Modified Gram-
Schmidt algorithm in order to keep numerical stability
Step 3: Matrix Concatenation
Now that we have produced all of these intermediate matrices Brand concatenates them into the following framework:
Note that matrix I is an identity matrix.
Step 4: SVD on matrix Q.
Brand suggests that we treat the matrix in the center (the one which is the concatenation of C, L and K as the matrix Q:
Now for the beauty of this piece of mathematics; we only compute the SVD of this small matrix Q:
Step 5: Substitution of Q back into the concatenated framework
We now substitute the results of Q back into the mix:
Step 6: Calculation of Final Matrices by breaking up the concatenated framework into three parts.
Too calculate the U’’, S’’ and V’’ we split the above equation into three:
Getting the Super Massive requirement:
To enable this to work on super massive datasets I wrote an implementation that stores matrices M and U on the hard disk as opposed to in memory, this allows me to use Fortran’s SVD routine as matrix Q is small and is kept on the hard disk.
Obviously storing a matrix in a swap file on disk is far slower than storing it in memory, and because of this the program does run slower. Also because my implementation is not (yet) optimised for speed it doesn’t run as fast compared to an implementation using other matrix libraries (such as Newmat or Blitz++). In terms of the actual algorithm Brand states that for the incremental method, for a dense p x q matrix with a low rank r, the incremental method has time complexity O(pqr) and space complexity O((p + q)r).
One of the main problems that I encountered whilst implementing this is that Matrix U grows as it is
concatenated with other matrices and I am forced to redeclare a new size of U in the hard disk each time the SVD is updated, this operation is an I/O bound process and hence it does take some time.
There will be an implementation of the above algorithm to follow in another blog post, probably in f#.