Due: 12/7/15 at 11:59PM Eastern Time

Objective: In the literature, fingerprint spoof detection is frequently described as an active process at the sensor level. This is true in many cases, but we can accomplish the same goal without specialized hardware, and with relatively high accuracy (depending on how much we know about the materials used), by using the images from the sensor. In this homework, you will learn how to use supervised machine learning to detect spoofed fingerprint images. After completing this assignment, you will be ready to train new models for spoof detection as novel materials are discovered and deployed by attackers.

Grading: You will be graded based on the code you develop, plus your answers to the following questions. Similar to previous assignments, you will not be graded on the absolute performance of your spoof detection models, but rather on the analysis of their capabilities. This assignment is worth 175 points.


Task 1: Download and study the data set

  1. Download and unarchive the pre-processed data set for this assignment. It is a subset of a larger data set from this paper by a team of researchers at Notre Dame and Michigan State University, which examines the potential of five different image-based features for spoof detection. One of these features is Local Binary Patterns (LBP), and we will consider it here. The data set you have just downloaded contains files for both training (prefixed with Train_*) and testing (prefixed with Test_*). Each file is a collection of LBP feature vectors in LIBSVM format, scaled so that each dimension has a value in [-1,1]. The original images from which these features were derived were acquired with a Biometrika optical sensor and included in the LivDet 2011 challenge data.

  2. Feature vectors are available for five different fabrication materials, as well as live skin. Each known-class training file contains 1,000 live vectors and 400 spoof vectors drawn from two fabrication materials (noted in the filename). This gives a total of 10 different spoof material combinations for training. The data set contains two sets of testing files. Testing files tagged with "known" contain 1,000 feature vectors from live skin images, and 400 feature vectors from the same two known fabrication materials (noted in the filename). Testing files tagged with "novel" contain 1,000 feature vectors from live skin images, and 600 feature vectors from materials other than the two seen in the training file for that combination.
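
To make the file layout concrete: LIBSVM format stores one vector per line as a label followed by sparse index:value pairs. The sketch below parses such a line in plain Python purely to illustrate the layout. It is not part of LIBSVM (whose Python interface has its own reader), and the function names and the sample line are made up for illustration.

```python
def parse_libsvm_line(line):
    """Parse one 'label index:value ...' line into (label, {index: value})."""
    parts = line.split()
    label = float(parts[0])
    features = {}
    for item in parts[1:]:
        index, value = item.split(":")
        features[int(index)] = float(value)
    return label, features


def in_scaled_range(features, low=-1.0, high=1.0):
    """Check that every stored value lies in [low, high]."""
    return all(low <= v <= high for v in features.values())


# Made-up example line, not taken from the data set.
label, features = parse_libsvm_line("1 1:-0.25 3:0.5 7:1")
print(label, features, in_scaled_range(features))
```

Indices that do not appear on a line are implicitly zero, which is why a dictionary is used here rather than a fixed-length list.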

Question 1: We first encountered LBP when learning about face recognition in Unit 2. Why might this feature also be useful for spoof detection?

Task 2: Download and compile the SVM code

  1. Download and compile LIBSVM, a sequential minimal optimization (SMO) solver for support vector machines. LIBSVM will allow you to train spoof detection models from the vectors contained in the training files from the LBP features data set.

  2. Also build the Python interface, and take a look at the example usage in the README file found in the python build directory.

Task 3: Develop software for training spoof detection models

  1. Write a Python program to train spoof detection models and evaluate them. Your program should take as input a training file, SVM hyperparameters (described on the LIBSVM website), and a testing file.

  2. During testing, your program should generate performance curve data (false spoof detection rate and false live finger detection rate) by sweeping a series of thresholds over the scores for each test vector, and automatically isolate the equal error rate (EER) from the curve.

  3. Tip #1: If you don't have a point that is an exact EER, return the point closest to a hypothetical EER as an approximation (i.e., the point with the smallest difference between the false spoof detection rate and false live finger detection rate).
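
The threshold sweep and the EER approximation from Tip #1 can be sketched as follows. This is only one possible approach, under stated assumptions: a higher score is taken to mean "more likely live", the function names are hypothetical, and the EER is reported as the average of the two error rates at the closest point (an illustrative choice, not a requirement). In your program the scores would come from the SVM output for each test vector.

```python
def error_rates(live_scores, spoof_scores, threshold):
    """Return (false_spoof_rate, false_live_rate) at one threshold.

    Assumes a higher score means "more likely live"; a sample is called
    live when its score is >= threshold.
    """
    false_spoof = sum(s < threshold for s in live_scores) / len(live_scores)
    false_live = sum(s >= threshold for s in spoof_scores) / len(spoof_scores)
    return false_spoof, false_live


def approximate_eer(live_scores, spoof_scores):
    """Sweep thresholds over all observed scores and return
    (eer, threshold, curve) for the point where the two rates are closest."""
    thresholds = sorted(set(live_scores) | set(spoof_scores))
    # Add one threshold above the maximum so "everything is spoof" is on the curve.
    thresholds.append(thresholds[-1] + 1.0)
    curve = [(t,) + error_rates(live_scores, spoof_scores, t) for t in thresholds]
    t, fsr, flr = min(curve, key=lambda point: abs(point[1] - point[2]))
    return (fsr + flr) / 2.0, t, curve


# Made-up scores for illustration only.
eer, threshold, curve = approximate_eer([0.9, 0.8, 0.7, 0.3], [0.6, 0.4, 0.2, 0.1])
print("EER = {:.2f} at threshold {}".format(eer, threshold))
```

Each entry of curve is a (threshold, false spoof detection rate, false live finger detection rate) tuple, so the same list can be written out as the performance curve data.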

Task 4: Spoof detection model evaluation for known materials

  1. For each combination of materials in the data set, train a model and test it using the corresponding "known" testing file.

  2. Tune your hyperparameters (you are free to use linear or kernel SVM) on this data to achieve as low an error rate as you can.

  3. Save your hyperparameter combinations for later use in Task 5.

  4. Tip #2: You might find it helpful to automate the tuning of SVM hyperparameters C and gamma directly within your program.
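
One way to automate the tuning suggested in Tip #2 is a simple grid search over C and gamma. The sketch below is a minimal illustration, not a prescribed method. The option letters -t, -c, -g, and -q are LIBSVM training options; everything else is an assumption: the function names are hypothetical, evaluate stands for your own train-and-test routine, and the power-of-two exponent ranges follow the coarse grid suggested in the LIBSVM practical guide.

```python
import itertools


def svm_options(c, gamma, kernel_type=2):
    """Build a LIBSVM option string: -t kernel type (2 = RBF), -c cost,
    -g gamma, -q quiet mode."""
    return "-t {} -c {} -g {} -q".format(kernel_type, c, gamma)


def grid_search(evaluate, c_exponents=range(-5, 16, 2), gamma_exponents=range(-15, 4, 2)):
    """Try every (C, gamma) pair on a power-of-two grid and keep the lowest error.

    `evaluate` is a caller-supplied (hypothetical) function that takes an
    option string, trains and tests a model, and returns an error rate
    such as the EER. Returns (best_error, best_c, best_gamma).
    """
    best = None
    for c_exp, g_exp in itertools.product(c_exponents, gamma_exponents):
        c, gamma = 2.0 ** c_exp, 2.0 ** g_exp
        error = evaluate(svm_options(c, gamma))
        if best is None or error < best[0]:
            best = (error, c, gamma)
    return best
```

For a linear SVM, gamma has no effect, so the same loop could be run over C alone with kernel_type=0.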

Question 2: How well does your spoof detection framework work when it is trained and tuned for a known combination of materials? Report the EER for each combination of materials in a table (10 in total). Also report the mean EER and standard error for all combinations.
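
For the summary statistics, the standard error of the mean is the sample standard deviation divided by the square root of the number of values. A minimal sketch using only the Python standard library is shown here; the function name is hypothetical and the EER values are placeholders, not real results.

```python
import math
import statistics


def mean_and_standard_error(values):
    """Return (mean, standard error), where SE = sample std dev / sqrt(n)."""
    n = len(values)
    mean = statistics.mean(values)
    std_err = statistics.stdev(values) / math.sqrt(n)
    return mean, std_err


# Placeholder EERs (percent) for illustration only; these are not real results.
example_eers = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, se = mean_and_standard_error(example_eers)
print("mean EER = {:.2f}%, standard error = {:.2f}%".format(mean, se))
```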

Question 3: Which combination of materials was your spoof detector most effective at identifying? Why do you think this was the case?

Question 4: Which combination of materials was your spoof detector least effective at identifying? What held your detection approach back in this case?

Task 5: Spoof detection model evaluation for novel materials

  1. For each combination of materials in the data set, train a model using the hyperparameters you discovered in Task 4 and test it using the corresponding "novel" testing file.
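
Reusing the tuned hyperparameters is easier if they are stored per material combination. The sketch below shows one possible way to do that with a JSON file. The function names, the file layout, and the material names used in the usage note are all hypothetical; substitute the names that appear in the data set filenames.

```python
import json


def save_hyperparameters(path, params_by_combo):
    """Write {(material_a, material_b): {"C": ..., "gamma": ...}} to a JSON file."""
    serializable = {
        "+".join(sorted(combo)): params for combo, params in params_by_combo.items()
    }
    with open(path, "w") as handle:
        json.dump(serializable, handle, indent=2, sort_keys=True)


def load_hyperparameters(path):
    """Read the file back into a dict keyed by sorted material-name tuples."""
    with open(path) as handle:
        raw = json.load(handle)
    return {tuple(key.split("+")): params for key, params in raw.items()}
```

For example, calling save_hyperparameters("tuned.json", {("matA", "matB"): {"C": 8.0, "gamma": 0.03125}}) and later load_hyperparameters("tuned.json") returns the same values keyed by ("matA", "matB").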

Question 5: How well does your spoof detection framework work when it is trained and tuned for a known combination of materials, but tested on new materials? Again report the EER for each combination of materials in a table (10 in total). Also report the mean EER and standard error for all combinations.

Question 6: What happened to the performance of your spoof detection framework when it was trained on one set of fabrication materials and tested on another?

Question 7: Based on the results in your tables, if you were an attacker, which material would you choose to spoof a fingerprint authentication system?

Question 8: We only considered five different fabrication materials in this assignment. What other possibilities exist? Name a few others that could be useful for creating spoofs.

Question 9: Besides LBP, what other features that we have discussed in class might be effective for the problem of fingerprint spoof detection?


Deliverables. Submit the following files to receive full credit for this assignment:

  1. All source code you developed (Python code for your spoof detection framework and any additional code you developed to run the experiments).
  2. A PDF report including:
    • The tables from Questions 2 and 5.
    • Complete answers to Questions 1-9.

Have questions about this assignment? Ask them! If globally applicable, your question and its answer will be posted to the course website for others to see.

Tip #3: Start early! If you run into trouble with your development early on, having ample time for debugging will help.


Q&A

Q: Can I submit my assignment after the deadline?

A: We will consider late submissions. However, there will be a penalty of 10 points for each day the submission is late, which will be automatically subtracted from your total score.