The ‘Mystery’ of Singular Value Decomposition
Apperceptual comments on an interesting problem in one of his blog posts [not online anymore]. He discusses the importance of high-order co-occurrences for word similarity measures in LSA. The part that interested me was the discussion of Singular Value Decomposition (SVD). My interpretation has always been that SVD’s most useful characteristic is that it amplifies the information content and reduces noise. An interesting question that comes to mind is how to measure such an improvement. A dimensional reduction (or, for that matter, any noise reduction) is only useful when applied appropriately; otherwise it falls short of its potential or, worse, reduces the (useful) information content. To test this, build a semantic vector space and apply increasingly harsh dimensional reduction to it. The vectors start focusing, then clumping, until the reduction is too high and they collapse onto a handful of dimensions.
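The experiment described above can be sketched in a few lines of NumPy. This is only a toy under stated assumptions: the term-document matrix is random Poisson counts rather than a real corpus, and the helper names (`truncated_term_vectors`, `mean_abs_cosine`, `reconstruction_error`) are made up for this illustration. It tracks two numbers as the rank k shrinks: how much of the original matrix is lost, and how similar the term vectors become to one another.

```python
import numpy as np

def truncated_term_vectors(matrix, k):
    """Term vectors (rows) expressed in the top-k singular dimensions."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]

def mean_abs_cosine(vectors):
    """Average absolute cosine similarity over all distinct pairs of rows."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / norms
    sims = np.abs(unit @ unit.T)
    off_diagonal = sims[~np.eye(len(vectors), dtype=bool)]
    return float(off_diagonal.mean())

def reconstruction_error(matrix, k):
    """Frobenius norm of the difference between the matrix and its rank-k approximation."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    approx = (u[:, :k] * s[:k]) @ vt[:k, :]
    return float(np.linalg.norm(matrix - approx))

# Assumption: a synthetic 30-term x 20-document count matrix stands in for a corpus.
# The small offset keeps every entry strictly positive, so no term vector is zero.
rng = np.random.default_rng(0)
counts = rng.poisson(1.0, size=(30, 20)).astype(float) + 0.1

for k in (20, 10, 5, 2, 1):
    vectors = truncated_term_vectors(counts, k)
    print(k, reconstruction_error(counts, k), mean_abs_cosine(vectors))
```

At k = 1 every term vector lies on a single axis, so all pairwise cosines are 1 in absolute value: the fully collapsed case. On real data, the interesting part is the range of k before that point, where similarity scores would need to be compared against some external judgment of word similarity.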
The two other points he makes are that latent meaning is embedded in the columns of the matrix, and high-order co-occurrences. The latter appears to be disputed by Landauer and, as explained by Apperceptual, appears to have little influence. I am not certain how much difference there is between latent meaning and high-order co-occurrences. It might very well be that the two are closely linked, if not the same. Hinrich Schütze’s Automatic word sense discrimination seems to make a similar point with context and second-order co-occurrences. Assuming the context of a word is similar to the columns mentioned above and the latent meanings they contain, one could argue that these are nothing more than high-order co-occurrences. To be honest, it is not a completely accurate comparison, as Schütze bases his work on a HAL matrix and not LSA.
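To make the term "second-order co-occurrence" concrete, here is a minimal sketch with a hypothetical three-word, two-document corpus. It is not Schütze's method or LSA itself, just an illustration of the idea: two words that never appear together can still have identical co-occurrence profiles because they share a neighbour. The vocabulary, the documents, and the `cosine` helper are all assumptions made up for the example.

```python
import numpy as np

# Hypothetical vocabulary and documents (assumptions for illustration only):
#   doc 0: "doctor hospital"
#   doc 1: "physician hospital"
words = ["doctor", "physician", "hospital"]
term_doc = np.array([
    [1, 0],  # doctor
    [0, 1],  # physician
    [1, 1],  # hospital
], dtype=float)

# First-order co-occurrence: how often two words share a document.
first_order = term_doc @ term_doc.T
np.fill_diagonal(first_order, 0.0)  # ignore a word co-occurring with itself

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Second-order view: compare the words' co-occurrence profiles (rows).
doctor, physician = first_order[0], first_order[1]
print("first-order doctor/physician:", first_order[0, 1])
print("second-order doctor/physician:", cosine(doctor, physician))
```

Here "doctor" and "physician" have a first-order count of zero, yet their rows in the co-occurrence matrix are identical, because both co-occur only with "hospital". Whether this kind of indirect link is what the latent dimensions of an SVD actually capture is exactly the open question in the paragraph above.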