P. Tsalmantza, David W. Hogg
We present a data-driven method, heteroscedastic matrix factorization (a
kind of probabilistic factor analysis), for modeling or performing
dimensionality reduction on observed spectra or other high-dimensional data
with known but non-uniform observational uncertainties. The method uses an
iterative inverse-variance-weighted least-squares minimization procedure to
generate a best set of basis functions. The method is similar to principal
components analysis, but with the substantial advantage that it uses
measurement uncertainties in a responsible way and accounts naturally for
poorly measured and missing data; it models the variance in the
noise-deconvolved data space. A regularization can be applied, in the form of a
smoothness prior (inspired by Gaussian processes) or a non-negative constraint,
without making the method prohibitively slow. Because the method optimizes a
justified scalar (related to the likelihood), the basis provides a better fit
to the data in a probabilistic sense than any PCA basis. We test the method on
SDSS spectra, concentrating on spectra known to contain two redshift
components: gravitational lens candidates and massive
black-hole binaries. We apply a hypothesis test to compare one-redshift and
two-redshift models for these spectra, using the data-driven model trained
on a random subset of all SDSS spectra. This test confirms 129 of the 131 lens
candidates in our sample and all of the known binary candidates, and turns up
very few false positives.
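
The iterative inverse-variance-weighted least-squares procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes the model is a plain linear factorization (data ≈ coefficients × basis), it alternates between solving for per-spectrum coefficients and per-pixel basis values, and it omits the smoothness prior and the non-negative constraint. The names `hmf_fit` and `chi_squared` and the small `ridge` stabilizer are assumptions introduced here for illustration. Missing pixels are represented by an inverse variance of zero (with any finite placeholder in the data array).

```python
import numpy as np


def hmf_fit(data, ivar, n_components, n_iter=50, seed=0, ridge=1e-9):
    """Alternating inverse-variance-weighted least squares (illustrative).

    data : (N, M) array of N spectra with M pixels each (finite values).
    ivar : (N, M) array of inverse variances; 0 marks missing pixels.
    Returns coeffs (N, K) and basis (K, M) with data ~ coeffs @ basis.
    """
    rng = np.random.default_rng(seed)
    n_obj, n_pix = data.shape
    basis = rng.normal(size=(n_components, n_pix))
    coeffs = np.zeros((n_obj, n_components))
    # Tiny ridge term (an assumption of this sketch) keeps the normal
    # equations solvable when a row or column has very little data.
    eye = ridge * np.eye(n_components)

    for _ in range(n_iter):
        # Coefficient step: with the basis fixed, each spectrum is an
        # independent weighted least-squares problem.
        for i in range(n_obj):
            gw = basis * ivar[i]                      # (K, M)
            coeffs[i] = np.linalg.solve(gw @ basis.T + eye, gw @ data[i])
        # Basis step: with the coefficients fixed, each pixel is an
        # independent weighted least-squares problem.
        for j in range(n_pix):
            aw = coeffs * ivar[:, j][:, None]         # (N, K)
            basis[:, j] = np.linalg.solve(aw.T @ coeffs + eye,
                                          aw.T @ data[:, j])
    return coeffs, basis


def chi_squared(data, ivar, coeffs, basis):
    """Inverse-variance-weighted sum of squared residuals."""
    resid = data - coeffs @ basis
    return float(np.sum(ivar * resid ** 2))
```

Because each step solves its sub-problem exactly, the weighted chi-squared objective should not increase from one iteration to the next (up to the tiny ridge term), which is what lets the fitted basis beat an unweighted PCA basis on this scalar. Unlike PCA, the resulting basis vectors are not orthogonal or ordered by variance unless they are post-processed.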
View original:
http://arxiv.org/abs/1201.3370