Random projection is a dimension-reduction technique in which high-dimensional vectors in R^D are projected onto a lower-dimensional subspace R^k. Certain distances and distance kernels, such as Euclidean distances, inner products [10], and l_p distances [12], between high-dimensional vectors are approximately preserved in this lower-dimensional subspace. Word vectors represented in a bag-of-words model can thus be projected into a smaller subspace via random projections, and their relative similarity computed via distance metrics. We propose using marginal information and Bayesian probability to improve estimates of the inner product between pairs of vectors, and demonstrate our results on real datasets.
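As a minimal sketch of the basic setup (not the Bayesian estimator proposed here), the following assumes a Gaussian random projection with entries scaled so that inner products are preserved in expectation; all names and dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
D, k = 10_000, 512  # illustrative ambient and projected dimensions

# Two high-dimensional vectors in R^D (stand-ins for bag-of-words vectors).
x = rng.normal(size=D)
y = rng.normal(size=D)

# Gaussian projection matrix with entries N(0, 1/k), so that
# E[<Rx, Ry>] = <x, y> and E[||Rx||^2] = ||x||^2.
R = rng.normal(scale=1.0 / np.sqrt(k), size=(k, D))
x_low, y_low = R @ x, R @ y

exact = x @ y            # inner product in R^D
approx = x_low @ y_low   # estimate from the projected vectors in R^k
```

The estimation error typically shrinks on the order of 1/sqrt(k), which is what motivates refining the plain estimate with marginal information.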