The goal of this homework assignment is to implement a "Markov Chain Babbler": a program that takes English-language text as input and produces new (and likely amusing) variants of the input sentences.

For this assignment, place your Python code in the homework05 folder of your assignments GitLab repository and push your work by 11:59 PM Monday, October 29.

Activity 0: Branching

As discussed in class, each homework assignment must be completed in its own git branch; this lets you keep the work for each assignment separate and use the merge request workflow.

To create a homework05 branch in your local repository, follow the instructions below:

$ cd path/to/cse-40171-fa18-assignments                                                     # Go to assignments repository
$ git remote add upstream https://gitlab.com/wscheirer/cse-40171-fa18-assignments           # Add the main class repository as the "upstream" remote
$ git fetch upstream                                                                        # Download the latest changes from upstream
$ git pull upstream master                                                                  # Pull the files for homework05
$ git checkout -b homework05                                                                # Create homework05 branch and check it out
$ cd homework05                                                                             # Go into homework05 folder

Once these commands have completed successfully, you are ready to add, commit, and push any work required for this assignment.

Activity 1: A Markov Chain Babbler That Wants to be Shakespeare (100 Points)



Digital Shakespeare, CC BY-NC-SA 2.0, [Ed]

As we are learning in class, Markov Chain Models can process English language text as a form of structured sequence data. To begin, let's look at some text – specifically, some lines from William Shakespeare's Sonnets:

When forty winters shall besiege thy brow,
And dig deep trenches in thy beauty's field,
Thy youth's proud livery so gazed on now,
Will be a tatter'd weed of small worth held:
Then being asked, where all thy beauty lies,
Where all the treasure of thy lusty days;
To say, within thine own deep sunken eyes,
Were an all-eating shame, and thriftless praise.

We can see that, compared to modern English prose, these lines are highly structured: they follow a specific meter (iambic pentameter) and are roughly the same length. This makes such text very convenient to work with when it comes to Markov Chain Models: we can expect that randomly sampled sequences of two consecutive words will mix and match quite well, given the probabilistic nature of such language. You can download the full set of lines to use as input for the assignment here.
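To make the idea of consecutive two-word sequences concrete, the following minimal sketch counts which words directly follow which in two of the lines quoted above. The function name `bigram_counts` and the whitespace-based tokenization are assumptions made for illustration, not requirements of the assignment.

```python
from collections import Counter, defaultdict


def bigram_counts(lines):
    """Map each word to a Counter of the words that directly follow it.

    Tokenization is plain whitespace splitting (an assumption), so
    punctuation stays attached to words, e.g. "asked," and "days;".
    """
    followers = defaultdict(Counter)
    for line in lines:
        words = line.split()
        # zip pairs each word with its successor on the same line
        for current, nxt in zip(words, words[1:]):
            followers[current][nxt] += 1
    return followers


sample = [
    "Then being asked, where all thy beauty lies,",
    "Where all the treasure of thy lusty days;",
]
counts = bigram_counts(sample)
print(counts["all"])  # successors of "all": "thy" once and "the" once
print(counts["thy"])  # successors of "thy": "beauty" once and "lusty" once
```

Because splitting is case-sensitive here, "where" and "Where" are treated as different words; whether to normalize case or punctuation is a design choice left open.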

Your task in this activity is to write a Python program that reads the lines of the sonnets from a file, computes the conditional frequencies of the words in the input text, and then generates new, random lines of text that are (somewhat) sensible using a Markov Chain Model. The lines you generate should not be purely random; they should reflect some of the structure of the originals. Your program should output ten new lines of text each time it is run. For example, some valid Markov Chain Model outputs for sonnets.txt are:

Despite of sums, yet canst thou not to give?
Upon thy beauty by succession thine!
Beauty o'er-snowed and confounds him there;
Nature's bequest gives nothing, but doth dwell,
Thy unused beauty lies,
Where all thy golden time.
Were an all-eating shame, and lusty leaves quite gone,
Shall sum of wrinkles this thy sweet self thy self thy self alone,
Whose fresh repair if now thou art old,
And that face should form another;
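The task described above could be approached roughly as follows. This is a sketch of one possible first-order design (each word is chosen based only on the previous word), not a reference solution: the names `build_chain`, `generate_line`, and `babble`, the `START`/`END` sentinels, and the `max_words` cap are all hypothetical. To keep the example self-contained, it runs on the eight lines quoted earlier instead of the full input file.

```python
import random
from collections import Counter, defaultdict

START, END = "<s>", "</s>"  # hypothetical sentinels marking line boundaries

SAMPLE = [
    "When forty winters shall besiege thy brow,",
    "And dig deep trenches in thy beauty's field,",
    "Thy youth's proud livery so gazed on now,",
    "Will be a tatter'd weed of small worth held:",
    "Then being asked, where all thy beauty lies,",
    "Where all the treasure of thy lusty days;",
    "To say, within thine own deep sunken eyes,",
    "Were an all-eating shame, and thriftless praise.",
]


def build_chain(lines):
    """Count, for each word, how often each next word follows it."""
    chain = defaultdict(Counter)
    for line in lines:
        words = line.split()
        if not words:
            continue  # skip blank lines
        tokens = [START] + words + [END]
        for current, nxt in zip(tokens, tokens[1:]):
            chain[current][nxt] += 1
    return chain


def generate_line(chain, rng=random, max_words=12):
    """Walk the chain from START, sampling successors by observed frequency."""
    word, out = START, []
    while len(out) < max_words:
        successors = chain[word]
        if not successors:
            break  # only happens if the chain was built from empty input
        # random.choices draws with probability proportional to the weights
        word = rng.choices(list(successors), weights=list(successors.values()))[0]
        if word == END:
            break
        out.append(word)
    return " ".join(out)


def babble(lines, n_lines=10):
    """Build a chain from the input lines and return n_lines generated lines."""
    chain = build_chain(lines)
    return [generate_line(chain) for _ in range(n_lines)]


for generated in babble(SAMPLE):
    print(generated)
```

To run the same sketch on the full input, an open file handle can be passed in place of `SAMPLE`, for example `with open("sonnets.txt") as f: babble(f)`, assuming the file is in the working directory. Conditioning on the previous two words instead of one is a natural variation that tends to produce lines closer to the originals.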

Here are some guidelines that will be helpful:

Feedback

If you have any questions, comments, or concerns regarding the course, please provide your feedback at the end of your README.md.

Submission

To submit your assignment, please commit your work to the homework05 folder of your homework05 branch in your assignments GitLab repository:

$ cd path/to/cse-40171-fa18-assignments   # Go to assignments repository
$ git checkout master                     # Make sure we are in master branch
$ git pull --rebase                       # Make sure we are up-to-date with GitLab
$ git checkout homework05                 # Switch to homework05 branch (created in Activity 0)
$ cd homework05                           # Go to homework05 directory
...
$ $EDITOR README.md                       # Edit appropriate README.md
$ git add README.md                       # Mark changes for commit
$ git commit -m "homework05: complete"    # Record changes
...
$ git push -u origin homework05           # Push branch to GitLab

To submit your work, create a merge request following the process described here, but make sure to change the target branch from wscheirer/cse-40171-fa18-assignments to your personal fork's master branch so that your code is not visible to other students. Additionally, assign the merge request to your TA and add wscheirer, agraese, and AndroidKitKat as approvers (so all class staff can track your submission). Your assigned TA is agraese if your last name starts with A through Ki, or AndroidKitKat if your last name starts with Kl through W.