The goal of this assignment is to write code that trains a Multilayer Perceptron (MLP) regression model and makes predictions with that model. To do this, we will use the PyBrain Python package for constructing custom neural networks. PyBrain is compatible with Linux, macOS, and Windows, so feel free to choose whichever environment you prefer. In many Python environments, installing PyBrain is as simple as running pip install pybrain. If that does not work, or if you want an alternative installation method, other options are available.

For this assignment, place your Python code in the homework08 folder of your assignments GitLab repository and push your work by 11:59 PM Friday, November 30.

Activity 0: Branching

As discussed in class, each homework assignment must be completed in its own git branch; this allows you to keep the work for each assignment separate and to use the merge request workflow.

To create a homework08 branch in your local repository, follow the instructions below:

$ cd path/to/cse-40171-fa18-assignments                                                     # Go to assignments repository

$ git remote add upstream https://gitlab.com/wscheirer/cse-40171-fa18-assignments           # Register the main class repository as the upstream remote

$ git fetch upstream                                                                        # Fetch the latest branches from upstream

$ git pull upstream master                                                                  # Pull the files for homework08

$ git checkout -b homework08                                                                # Create homework08 branch and check it out

$ cd homework08                                                                             # Go into homework08 folder

Once these commands have completed successfully, you are ready to add, commit, and push any work required for this assignment.

Activity 1: Train a Regression Model (50 Points)

We've spent a bit of time in class discussing machine learning tasks like classification and clustering, but we haven't said much about regression, where we don't want to assign a class label but instead want to predict a continuous value for a feature vector. In other words, our output y is a real-valued prediction. This setup is useful for many problems in social and behavioral science. One example is predicting the median value of owner-occupied homes given a set of attributes about the homes and the surrounding neighborhood. The Boston Housing Dataset, which contains data for exactly this problem, is a classic dataset used by the machine learning community to evaluate regressors. Given 13 numeric feature dimensions, the task is to predict the median housing value in suburbs of Boston, expressed in thousands of dollars. Download the individual training, validation, and testing files that have been prepared for this assignment. The columns are as follows (the first thirteen are the x values):

  1. per capita crime rate by town
  2. proportion of residential land zoned for lots over 25,000 sq.ft.
  3. proportion of non-retail business acres per town
  4. Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
  5. nitric oxides concentration (parts per 10 million)
  6. average number of rooms per dwelling
  7. proportion of owner-occupied units built prior to 1940
  8. weighted distances to five Boston employment centres
  9. index of accessibility to radial highways
  10. full-value property-tax rate per $10,000
  11. pupil-teacher ratio by town
  12. 1000(Bk - 0.63)^2 where Bk is the proportion of African Americans by town
  13. % lower status of the population
  14. Median value of owner-occupied homes in $1000's (the y value)
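The column layout above can be turned into (x, y) pairs with a few lines of standard-library Python. This is only a minimal sketch: it assumes each row of the prepared files holds exactly 14 comma-separated numbers with no header line, which you should verify against the actual files. The function name load_housing_rows and the sample values are hypothetical and chosen purely for illustration.

```python
import csv
import io


def load_housing_rows(handle):
    """Parse rows of 14 comma-separated numbers into (features, target) pairs.

    Assumption: no header row, 13 feature columns followed by the y value.
    """
    samples = []
    for row in csv.reader(handle):
        if not row:  # skip blank lines
            continue
        values = [float(v) for v in row]
        if len(values) != 14:
            raise ValueError("expected 14 columns, got %d" % len(values))
        # First thirteen columns are the x values; the last one is y.
        samples.append((values[:13], values[13]))
    return samples


# In-memory stand-in for a CSV file; the numbers are synthetic placeholders.
demo = io.StringIO("1,2,3,4,5,6,7,8,9,10,11,12,13,24.0\n")
samples = load_housing_rows(demo)
```

With a real file, the same function could be called on an open file handle, for example `with open("housing-training.csv", newline="") as f: samples = load_housing_rows(f)`.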

Use PyBrain to write a training program called trainNet.py that will learn an MLP regression model from the housing-training.csv and housing-validation.csv files. Here are some guidelines:

The PyBrain documentation may be useful. Also feel free to use other reference code available on the web (cite any sources you used in your README.md file).

After training, record the root-mean-square error (RMSE) achieved at epoch 1000 in your README.md file.
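For reference, the root-mean-square error is the square root of the mean of the squared differences between predictions and true values. The sketch below computes it with the standard library only; the function name rmse is hypothetical and not part of PyBrain. The error value that a training routine reports may be defined differently (for example, as a mean squared error), so it is worth checking which quantity you are recording. Because y is given in $1000's, the RMSE is in the same units.

```python
import math


def rmse(predictions, targets):
    """Return the root-mean-square error between two equal-length sequences."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if len(targets) == 0:
        raise ValueError("at least one sample is required")
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Errors of 3 and 4 give sqrt((9 + 16) / 2) = sqrt(12.5).
example = rmse([3.0, 5.0], [0.0, 1.0])
```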

Activity 2: Make Predictions Using the Regression Model (50 Points)

Use PyBrain to write a prediction program called predictNet.py that will use the trained model from Activity 1 to make predictions for the feature vectors in the housing-testing.csv file. Here are the guidelines for this activity:

How well did your trained model predict the values in the test data? Add your answer to your README.md file.

Feedback

If you have any questions, comments, or concerns regarding the course, please provide your feedback at the end of your README.md.

Submission

To submit your assignment, please commit your work to the homework08 folder of your homework08 branch in your assignments GitLab repository:

$ cd path/to/cse-40171-fa18-assignments   # Go to assignments repository
$ git checkout master                     # Make sure we are in master branch
$ git pull --rebase                       # Make sure we are up-to-date with GitLab
$ git checkout homework08                 # Switch to the homework08 branch created in Activity 0
$ cd homework08                           # Go to homework08 directory
...
$ $EDITOR README.md                       # Edit appropriate README.md
$ git add README.md                       # Mark changes for commit
$ git commit -m "homework08: complete"    # Record changes
...
$ git push -u origin homework08           # Push branch to GitLab

To submit your work, create a merge request by following the process described here, but make sure to change the target branch from wscheirer/cse-40171-fa18-assignments to your personal fork's master branch so that your code is not visible to other students. Additionally, assign the merge request to your TA and add wscheirer, agraese, and AndroidKitKat as approvers so that all class staff can track your submission. Your assigned TA is agraese if your last name starts with A through Ki, or AndroidKitKat if your last name starts with Kl through W.