Contribution and Disscussion
Propose an iterative online algorithm which could slove the dictionary learning problem by efficienty minimizing at eacch step a quadratic surrogate function of the empirical cost over the set of constraints.
This algorithm is faster than previous approaches to dictionary learning on both small and large datasets of natural images.
Background Knowledge -- Sparse Coding
1. Change of basis
In linear algebra, we may change the coordinates relative to a basis to another, in order to make the problem we are going to solve more easily.
Using different coordinates is like we look at the problem in a different view. This may helps us to find some hidden unformation and structure lies behind.
2. Over-complete set of basis vectors
After removing one basis from the set, the basis set is still complete (every element in input set X can be approximated arbitrarily well in norm by finite linear combinations of basis in basis set).
3. Sparse coding (sparse dictionary learning)
Sparse dictionary learning is a representation learning method which aims at finding a sparse representation of the input data in the form of a linear combination of basic elements as well as those basic elements themselves. These elements are called atoms and they compose a dictionary. Atoms in the dictionary are not required to be orthogonal, and they may be an over-complete spanning set.
This problem setup also allows the dimensionality of the signals being represented to be higher than the one of the signals being observed. The above two properties lead to having seemingly redundant atoms that allow multiple representations of the same signal but also provide an improvement in sparsity and flexibility of the representation.
(Copy from Wiki)
a. [Adv] Could capture structures and patterns inherent in the input data
b. [Con] With an over-complete basis, the coefficients ai are no longer uniquely determined by the input vector
c. The sparse coding cost function on a set of m input vectors was defined as:
where is a sparsity cost function which penalizes for being far from zero.
Classical dictionary learning problem:
could bw defined as the optimal value of the problem:
Under the constraint ( C is the convex set of matrices), to prevent D from being arbitrarily large (which would lead to arbitrarily small values of α):
Algorithm Outline of Online Dictionary Learning