Modeling the upper tail of the distribution of facial recognition non-match scores
Date
2016
Authors
Hunter, Brett D., author
Cooley, Dan, advisor
Givens, Geof, advisor
Kokoszka, Piotr, committee member
Fosdick, Bailey, committee member
Adams, Henry, committee member
Abstract
In facial recognition applications, the upper tail of the distribution of non-match scores is of interest because existing algorithms classify a pair of images as a match if their score exceeds some high quantile of the non-match distribution. I construct a general model for the distribution above the (1-τ)th quantile, borrowing ideas from extreme value theory. The resulting distribution can be viewed as a reparameterized generalized Pareto distribution (GPD), but it differs from the traditional GPD in that τ is treated as fixed. Inference for both the (1-τ)th quantile u_τ and the GPD scale and shape parameters is performed via M-estimation, where my objective function is a combination of the quantile regression loss function and reparameterized GPD densities. Parameterizing u_τ and the GPD parameters in terms of available covariates lets me assess these covariates' influence on the tail of the distribution of non-match scores. A simulation study shows that my method can estimate both the set of parameters describing the covariates' influence and high quantiles of the non-match distribution. The simulation study also shows that my model is competitive with quantile regression in estimating high quantiles and that it outperforms quantile regression for extremely high quantiles. I apply my method to a data set of non-match scores and find that covariates such as gender, use of glasses, and age difference have a strong influence on the tail of the non-match distribution.
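To make the ingredients named in the abstract concrete, the following is a minimal Python sketch of the two standard building blocks: the quantile regression (check) loss at level 1-τ, and a GPD tail above a threshold u_τ whose exceedance probability is fixed at τ. It is an illustration under stated assumptions, not the thesis's implementation. The function names (`pinball_loss`, `tail_survival`, `tail_quantile`) and the numeric values are hypothetical, no covariates are included, and the abstract does not specify how the loss and the GPD densities are combined in the objective, so that combination is not reproduced here.

```python
import math

def pinball_loss(y, u, tau):
    """Standard quantile regression (check) loss for the (1 - tau)th quantile u."""
    p = 1.0 - tau
    r = y - u
    return r * (p - (1.0 if r < 0 else 0.0))

def tail_survival(y, u, sigma, xi, tau):
    """P(Y > y) for y >= u, assuming a GPD tail above u with P(Y > u) = tau fixed."""
    z = (y - u) / sigma
    if abs(xi) < 1e-12:
        # Limiting exponential tail as the shape parameter goes to zero
        return tau * math.exp(-z)
    # max(..., 0) handles the finite upper endpoint when xi < 0
    return tau * max(1.0 + xi * z, 0.0) ** (-1.0 / xi)

def tail_quantile(p, u, sigma, xi, tau):
    """p-th quantile for 1 - tau <= p < 1, obtained by inverting tail_survival."""
    if not (1.0 - tau - 1e-12 <= p < 1.0):
        raise ValueError("p must lie in [1 - tau, 1)")
    ratio = tau / (1.0 - p)
    if abs(xi) < 1e-12:
        return u + sigma * math.log(ratio)
    return u + (sigma / xi) * (ratio ** xi - 1.0)

# Hypothetical parameter values, for illustration only
u_tau, sigma, xi, tau = 2.0, 0.5, 0.1, 0.05
q999 = tail_quantile(0.999, u_tau, sigma, xi, tau)
```

In this sketch, quantiles far beyond the (1-τ)th level come from the fitted tail shape rather than from a separate fit at each level, which is one plausible reading of why such a model could do well at extremely high quantiles.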