Fall 2012 - Present
For many classes of problems, the goal of computer vision is to solve visual challenges for which human observers have effortless expertise - face and object recognition, image segmentation, and medical image analysis, to name just a few. However, there exists a large class of problems where human performance dramatically outshines current efforts. This occurs even in areas where computer vision has been considered highly successful, such as face detection. For example, digital cameras identify faces quickly and accurately, yet no extant algorithm comes close to matching the human ability to detect faces in challenging views and environments.
There is an obvious gap between current state-of-the-art computer vision applications and human performance. While current methods are improving year by year, there is a concern that they will asymptote well below the level of human performance. In this work, we provide a new approach that relies on a heretofore untapped source of information, one that significantly improves performance at a rate beyond current methods. In addition, we argue that this method can be of considerable assistance even for emerging solutions that are not yet well studied, as it supplies fundamental information likely to be useful for all algorithms.
We find that references to human performance are often non-existent or impoverished. Where a reference exists, it is usually a simple comparison of overall performance, for instance measuring human accuracy against that of the machine on an extended task with many items. Much more information about human capacities is of direct value. For example, some images are learnable and some are not. This learnability also varies with experience: something that is initially not learnable can become learnable at a later training session. Learnability itself can be further fractionated: some things are learned easily and quickly, while others take more time. Such detailed information reflecting human capacity, which we call a perceptual annotation, can be used effectively in conjunction with current algorithms. The key to accomplishing this is to use results obtained from the discipline of human psychophysics.
This work was supported by NIH Grant R01 EY01363, NSF IIS Award #0963668, and a gift from the Intel Corporation.
- "Perceptual Annotation: Measuring Human Vision to Improve Computer Vision," IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), August 2014.
- The Perceptual Annotation code is available on GitHub.