Iterative Quantization: A Procrustean Approach to Learning Binary Codes
Yunchao Gong and S. Lazebnik. CVPR 2011
learning similarity-preserving binary codes for efficient retival in large-scale image collections
1. An effective scheme for learning binary codes should have properties as following,
- The codes should be short. (For storing in memory effectively)
- The codes should map the similarity of images.
- The algorithms should be efficient for learning the parameters of the binary code and ecoding a new test image.
- To learn good binary codes without any supervisory information in the form of class labels.
3. Problem formulation
- To directly minimize the quantization error of mapping this data to vertices of binary hypercube.
Contribution and Discussion
- To show that rotate the projected data could improve the performance of PCA-based binary coding schemes.
- To demonstrate an iterative quantization method for refining this rotation.
- ITQ (Iterative Quantization) uses one bit per projected data dimension.
Step 1. Dimensionality Reduction
To yields PCA projections, this step follows the maximum variance formulation from the two papers, Spectral hashing (Y. Weiss et al. In NIPS 2008) and Semi-supervised hashing for large-scale image retrieval (J. Wang et al. In CVPR 2010).
Here, they want to produce an efficient code in which*the variance of each bit is maximized* and the bits are pairwise uncorrelated. The variance is maximized by encoding functions that produce exactly balanced bits. This could be get through the following continuous objective function:
Step 2. Binary Quantization
To preaerve the locality structure of the projected data by rotating it so as to minimize the discretization error. The ideal is illustrated in the Figure 1.
First, the quantization loss is defined in Equation 2:
This step is beginning with the random initialization of R, then adopt a k-means-like iterative quantization (ITQ) procedure to find a local minimum of the quantization loss.
In each iteration:
- Fix R and update B: Each dat point is assigned to the nearest vertex of the binary hypercube
- Fix B and update R: R is updated to minimize the quantization loss
- Alternate between steps 1 and 2 to find a locally optimal solution.
Here is one sample to use ITQ to encode the images and than retrieve images using 32 bits: