Due: 10/16/15 at 11:59PM Eastern Time
Objective: This assignment will guide you through the most important step of the face processing pipeline we have been studying in class: feature matching. Using face pair matching as our recognition scenario, we’ll explore two different approaches to feature extraction and matching. The first approach is representative of hand-tuned feature representations, with an LBP-like descriptor and distance measure between feature vectors. The second approach is a more recent deep learning method, which uses a three-layer convolutional neural network to generate features, and a fast linear SVM solver to train models for matching. Through the exercises below, you will gain experience with the OpenBR and LIBLINEAR libraries, and a neural network that is being studied by computational neuroscientists for its invariance and selectivity properties. You will also learn about the popular Labeled Faces in the Wild data set.
Grading: You will be graded based on the code you develop, plus your answers to the following questions. Similar to the last assignment, you will not be graded on face pair matching performance, but rather on the analysis of the data produced. This assignment is worth 175 points.
Task 2: Download the Labeled Faces in the Wild Set (and the other sets provided by OpenBR) via the openbr/scripts/downloadDatasets.sh script. LFW is a pair matching challenge, where given a pair of faces, the task is to determine whether or not they match. For this assignment, we'll work with the pairsDevTrain.txt and pairsDevTest.txt partitions. These files indicate the name of the person and the corresponding image ID, which will allow you to build valid file paths to the actual images. More information on the LFW data set can be found on the LFW website.
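As a starting point for building file paths, here is a minimal Python sketch. It assumes the usual layout of the LFW pair files (a leading count line, three fields for a matching pair, four fields for a non-matching pair) and of the extracted images (<root>/<name>/<name>_<four-digit ID>.jpg). The function names and the default root directory "lfw" are hypothetical; check both assumptions against your own download.

```python
import os

def image_path(root, name, image_id):
    """Build <root>/<name>/<name>_<4-digit id>.jpg (assumed LFW layout)."""
    return os.path.join(root, name, "%s_%04d.jpg" % (name, int(image_id)))

def parse_pairs(lines, root="lfw"):
    """Return a list of (path_a, path_b, label) tuples.

    Lines with three fields are treated as matching pairs (label +1);
    lines with four fields are treated as non-matching pairs (label -1).
    The leading count line and blank lines are skipped.
    """
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) == 3:
            name, id_a, id_b = fields
            pairs.append((image_path(root, name, id_a),
                          image_path(root, name, id_b), 1))
        elif len(fields) == 4:
            name_a, id_a, name_b, id_b = fields
            pairs.append((image_path(root, name_a, id_a),
                          image_path(root, name_b, id_b), -1))
    return pairs
```

To use it, pass the lines of a partition file, for example parse_pairs(open("pairsDevTrain.txt")), and adjust root to wherever the images were extracted.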
Task 3: Get a Python environment running for the convolutional neural network code. The code will run "out-of-the-box" and quickly under the Anaconda Python environment with the Intel MKL add-on (this add-on is free for 30 days). Install the Python 2.7 version of Anaconda and then run "conda update conda" from the command line. Next, install the opencv module by running "conda install opencv" from the command line. Intel MKL provides enhanced vector processing capabilities for general-purpose CPUs; install it by running "conda install mkl" from the command line. Note: you do not need MKL to complete the assignment; if you run into dependency problems (reported for certain recent builds), you can move forward without it. Finally, test slmsimple.py, which generates the biologically-inspired features described in the FG 2011 paper by D. Cox and N. Pinto. By default, this program takes as input a list of filenames pointing to images and a class label, and outputs a feature vector corresponding to each image in the list. You will modify this code in Task 6. Note that the dimensionality of the feature vectors is very high; be prepared to generate a lot of data (a couple of gigabytes) for the LFW partitions.
Task 4: Download and compile LIBLINEAR, a fast solver for linear support vector machines. LIBLINEAR will allow you to quickly train models from the high-dimensional vectors output by slmsimple.py. To generate performance curves, you will need a score for each matching instance, but the standalone binaries included in the LIBLINEAR package only output class labels. You can gain access to the scores by using LIBLINEAR through its Python interface. Build the Python interface, and take a look at the example usage in the README file found in the python build directory.
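The following is a hedged sketch of getting scores out of the Python interface, not a definitive recipe. It relies on the train and predict functions of liblinearutil, where predict returns predicted labels, accuracy information, and decision values. The helper names, the solver options ("-s 2 -c 1 -q"), and the two import paths are assumptions; confirm them against the README for your LIBLINEAR version.

```python
def to_sparse(vector):
    """Convert a dense vector to a {1-based index: value} dict, dropping zeros."""
    return {i + 1: v for i, v in enumerate(vector) if v != 0}

def train_and_score(train_labels, train_vectors, test_labels, test_vectors,
                    options="-s 2 -c 1 -q"):
    """Train a linear SVM and return one decision value per test vector.

    The options string is an assumption, not a recommended setting.
    """
    # Imported lazily; the module path depends on how LIBLINEAR was installed.
    try:
        from liblinear.liblinearutil import train, predict
    except ImportError:
        from liblinearutil import train, predict

    model = train(train_labels, [to_sparse(v) for v in train_vectors], options)
    _, _, values = predict(test_labels,
                           [to_sparse(v) for v in test_vectors], model, "-q")
    # For two classes, a positive decision value favors the first label the
    # model stores; flip the sign if that label is -1 so that a higher score
    # always means "more likely a match".
    sign = 1.0 if model.get_labels()[0] == 1 else -1.0
    return [sign * v[0] for v in values]
```

The sparse dictionaries are optional, since the interface also accepts plain lists, but they can reduce memory use when many feature values are zero.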
Task 5: OpenBR has pre-trained models, and performs alignment on the fly for images from LFW. Use the "br" program in its pair matching mode, which will invoke the 4SF face recognition algorithm, a hybrid LBP + subspace method. Read through the description of this algorithm in its published paper. The OpenBR implementation will produce a match score, where a higher score indicates a stronger match, for each pair of faces you provide. Perform face matching over all of the face pairs found in pairsDevTest.txt. Save the scores, and generate a DET curve for all of the face pair instances using a plotting tool of your choice. To generate the curve, you will need to sweep a series of thresholds over the scores and calculate the error statistics (refer back to the notes from Unit 1).
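The threshold sweep can be sketched in plain Python as follows. This is one possible formulation, assuming a pair is accepted when its score is at or above the threshold and using false match rate (FMR) and false non-match rate (FNMR) as the two error statistics; your Unit 1 notes may use different names. The function names are hypothetical, and the equal error rate is only approximated at the threshold where the two rates are closest.

```python
def det_points(scores, labels):
    """Sweep a threshold over the scores; return (threshold, FMR, FNMR) tuples.

    scores: match scores, where higher means a stronger match.
    labels: +1 for matching pairs, -1 for non-matching pairs.
    A pair is accepted when its score is >= the threshold.
    """
    genuine = [s for s, y in zip(scores, labels) if y == 1]
    impostor = [s for s, y in zip(scores, labels) if y == -1]
    points = []
    for t in sorted(set(scores)):
        fmr = sum(1 for s in impostor if s >= t) / float(len(impostor))
        fnmr = sum(1 for s in genuine if s < t) / float(len(genuine))
        points.append((t, fmr, fnmr))
    return points

def equal_error_rate(points):
    """Approximate the EER as the mean of FMR and FNMR where they are closest."""
    _, fmr, fnmr = min(points, key=lambda p: abs(p[1] - p[2]))
    return (fmr + fnmr) / 2.0
```

Plotting FNMR against FMR for the returned points (typically on logarithmic or normal-deviate axes) gives the DET curve.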
Question 1: What is the equal error rate of the 4SF algorithm on this test?
Question 2: What sort of non-matching instances is the 4SF algorithm getting wrong? Provide some examples. Based on what you've read about this algorithm (and what you’ve learned about hand-tuned feature methods in class), why do you think this is the case?
Question 3: What sort of matching instances is the 4SF algorithm getting wrong? Provide some examples. Why do you think this is the case?
Task 6: slmsimple.py is just a feature generator, and does not do any alignment. We'll skip that step by using the LFW-A set, which is pre-aligned, for the convolutional neural network experiment. Download and extract this data set.
Modify the code of slmsimple.py to process pairs of faces. Using your modified implementation, generate feature vectors for all of the pairs in pairsDevTrain.txt. Train a model from the generated features. Test this model on feature vectors you generate from the pairs in pairsDevTest.txt. Because the standalone LIBLINEAR binaries only output class labels (see Task 4), do all of this in Python. Save the scores, and generate a DET curve for all of the face pair instances.
Tip #2: You need to compute a distance between the two feature vectors in each pair (make sure you generate a separate vector for each face in the pair); the result is the single vector that the SVM uses for training or testing. Several strategies for doing this are listed in the paper by Cox and Pinto. Use one of them. Give positive vectors a label of +1, and negative vectors a label of -1.
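To make the idea concrete, here is a small sketch of element-wise comparison functions that turn two per-face feature vectors into one labeled pair vector. These are common choices for pair matching and are offered as assumptions; check the paper by Cox and Pinto for the strategies it actually lists. All function names are hypothetical, and plain lists stand in for the feature vectors.

```python
import math

def abs_diff(a, b):
    """Element-wise absolute difference |a - b|."""
    return [abs(x - y) for x, y in zip(a, b)]

def squared_diff(a, b):
    """Element-wise squared difference (a - b)^2."""
    return [(x - y) ** 2 for x, y in zip(a, b)]

def sqrt_abs_diff(a, b):
    """Element-wise square root of the absolute difference."""
    return [math.sqrt(abs(x - y)) for x, y in zip(a, b)]

def product(a, b):
    """Element-wise product a * b."""
    return [x * y for x, y in zip(a, b)]

def pair_example(feat_a, feat_b, is_match, compare=abs_diff):
    """Return (label, vector): +1 for a matching pair, -1 otherwise."""
    label = 1 if is_match else -1
    return label, compare(feat_a, feat_b)
```

Each function returns a vector with the same dimensionality as the inputs, so the SVM sees one vector per pair regardless of which comparison is chosen.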
Question 4: What is the equal error rate of the model trained over features from the three-layer convolutional neural network on this task?
Question 5: What sort of non-matching instances is your model getting wrong? Provide some examples. Based on what you've read about this algorithm (and what you've learned about deep learning methods in class), why do you think this is the case?
Question 6: What sort of matching instances is your model getting wrong? Provide some examples. Why do you think this is the case?
Question 7: On this data set, which algorithm turned out to be better?
Question 8: If you could design a better face recognition algorithm, what problems would you focus on?
Deliverables. You must turn in the following deliverables to receive full credit for this assignment: (1) All source code you developed (scripts to run experiments and your modified version of slmsimple.py); and (2) a report including the DET curves from Tasks 5 and 6, and answers to Questions 1-8.
Have questions about this assignment? Ask them! If globally applicable, your question and its answer will be posted to the course website for others to see.
Tip #3: Start early! If you run into trouble with your development early on, having ample time for debugging will help.
Q: When building openbr, the file models.tar.gz doesn't download at the beginning of the process.
A: This problem has been noted on OS X. The URL specified in the build routine is incorrect. The tarball can be downloaded from the v1.1.0 section of the openbr releases page.
Q: OpenBR's downloadDatasets.sh script wants to download a lot of data. Is there a way to only download LFW?
A: Yes. You can either comment out or simply delete all of the code in the script for datasets other than LFW.
Q: What should the class label argument to slmsimple.py be?
A: For a face pair matching problem, you can assign matching face pairs the label +1 and non-matching face pairs the label -1.
Q: Does slmsimple.py parse pairsDevTest.txt by default?
A: No, you must make modifications to that file and/or the slmsimple.py program to parse it correctly.
Q: OpenBR has a lot of face recognition features; how can I do basic matching?
A: The only syntax you need for this assignment is: br -algorithm FaceRecognition -compare me.jpg you.jpg
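If you want to drive this command from a script, a minimal sketch using Python's subprocess module might look like the following. The function names are hypothetical, and the sketch assumes "br" is on your PATH. How the score appears in the program's output is not specified here, so the raw standard output is returned for you to inspect and parse.

```python
import subprocess

def br_compare_command(image_a, image_b, algorithm="FaceRecognition"):
    """Build the argument list for the br pair comparison shown above."""
    return ["br", "-algorithm", algorithm, "-compare", image_a, image_b]

def run_br_compare(image_a, image_b):
    """Run br on one pair and return its raw standard output as text."""
    output = subprocess.check_output(br_compare_command(image_a, image_b))
    return output.decode("utf-8", "replace").strip()
```

Passing the arguments as a list avoids shell quoting problems with unusual file names.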
Q: Are we using slmsimple.py to generate features for OpenBR?
A: No. You are comparing OpenBR against the features generated by slmsimple.py and subsequently classified by LIBLINEAR. Thus, you will be generating two DET curves - one for each algorithm.
Q: Can I submit my assignment after the deadline?
A: For this first assignment, we will consider late submissions. However, there will be a penalty of 10 points per day you are late, which will be automatically subtracted from your total score.