The goal of this homework assignment is to implement a "Markov Chain Babbler" program that is able to take as input some English language text and produce new (and likely amusing) variants of the input sentences.
For this assignment, place your Python code in the homework05 folder of your assignments GitLab repository and push your work by 11:59 PM Monday, October 29.
To create a homework05 branch in your local repository, follow the steps below:
$ cd path/to/cse-40171-fa18-assignments # Go to assignments repository
$ git remote add upstream https://gitlab.com/wscheirer/cse-40171-fa18-assignments # Switch back over to the main class repository
$ git fetch upstream # Toggle the upstream branch
$ git pull upstream master # Pull the files for homework05
$ git checkout -b homework05 # Create homework05 branch and check it out
$ cd homework05 # Go into homework05 folder
As we are learning in class, Markov Chain Models can process English language text as a form of structured sequence data. To begin, let's look at some text – specifically, some lines from William Shakespeare's Sonnets:
When forty winters shall besiege thy brow,
And dig deep trenches in thy beauty's field,
Thy youth's proud livery so gazed on now,
Will be a tatter'd weed of small worth held:
Then being asked, where all thy beauty lies,
Where all the treasure of thy lusty days;
To say, within thine own deep sunken eyes,
Were an all-eating shame, and thriftless praise.
We can see that compared to modern English prose, these lines are highly structured, following a specific meter (iambic pentameter), and are roughly the same length. This makes such text very convenient to work with when it comes to Markov Chain Models – we can expect that randomly sampled two-word consecutive sequences will mix and match quite well, given the probabilistic nature of such language. You can download the full set of lines to use as input for the assignment here.
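As a rough illustration of what "two-word consecutive sequences" means in practice, the sketch below tallies, for each word, the words that directly follow it on the same line. The function name count_bigrams and the choice to split on whitespace (leaving punctuation attached to words) are assumptions made for this example, not requirements of the assignment.

```python
from collections import Counter, defaultdict


def count_bigrams(lines):
    """Map each word to a Counter of the words that directly follow it."""
    successors = defaultdict(Counter)
    for line in lines:
        # Assumption: whitespace tokenization, punctuation stays attached.
        words = line.split()
        for current, following in zip(words, words[1:]):
            successors[current][following] += 1
    return successors


# Two of the sonnet lines quoted above.
sample = [
    "When forty winters shall besiege thy brow,",
    "And dig deep trenches in thy beauty's field,",
]
table = count_bigrams(sample)
print(dict(table["thy"]))
```

In these two lines, "thy" is followed once by "brow," and once by "beauty's", so a sampler that reaches "thy" could continue either way.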
Your task in this activity is to write a Python program that reads the lines of the sonnets from a file, computes the conditional frequencies of the words in the input text, and then generates new, random lines of text that are (somewhat) sensible using a Markov Chain Model. The lines you generate should not be purely random; they should reflect some of the structure of the originals. Your program should output ten new lines of text each time it is run. For example, some valid Markov Chain Model outputs for the input text are:
Despite of sums, yet canst thou not to give?
Upon thy beauty by succession thine!
Beauty o'er-snowed and confounds him there;
Nature's bequest gives nothing, but doth dwell,
Thy unused beauty lies,
Where all thy golden time.
Were an all-eating shame, and lusty leaves quite gone,
Shall sum of wrinkles this thy sweet self thy self thy self alone,
Whose fresh repair if now thou art old,
And that face should form another;
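One possible way to produce lines like these is a random walk over word-to-word counts, where each next word is drawn in proportion to how often it followed the current word in the input. The sketch below is only an illustration under several assumptions: the names build_model and generate_line, the max_words cap, starting from the first word of a randomly chosen input line, and the small inline sample standing in for the full input file are all choices made for this example.

```python
import random
from collections import Counter, defaultdict


def build_model(lines):
    """Return (line-starting words, word -> Counter of following words)."""
    starts = []
    successors = defaultdict(Counter)
    for line in lines:
        words = line.split()
        if not words:
            continue  # skip blank lines
        starts.append(words[0])
        for current, following in zip(words, words[1:]):
            successors[current][following] += 1
    return starts, successors


def generate_line(starts, successors, max_words=12, rng=random):
    """Walk the chain from a random starting word until a dead end or the cap."""
    word = rng.choice(starts)
    output = [word]
    while len(output) < max_words and successors.get(word):
        options = successors[word]
        # Weight each candidate by how often it followed the current word.
        word = rng.choices(list(options), weights=list(options.values()))[0]
        output.append(word)
    return " ".join(output)


# Small stand-in for the real input file.
lines = [
    "When forty winters shall besiege thy brow,",
    "And dig deep trenches in thy beauty's field,",
    "Then being asked, where all thy beauty lies,",
    "Where all the treasure of thy lusty days;",
]
starts, successors = build_model(lines)
for _ in range(10):
    print(generate_line(starts, successors))
```

Because each step depends only on the current word, the output keeps local structure from the originals while still recombining lines wherever a word has more than one successor.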
Here are some guidelines that will be helpful:
The generated line "Nature's bequest gives nothing, but doth dwell," is nearly identical to the line "Nature's bequest gives nothing, but doth lend," in sonnets.txt, with the exception of the last word. This is because most of these particular words have only one path to a connected word (e.g., "Nature" appears once, and "bequest" appears once right after it, so they must always be connected if "Nature" is selected as the first word). "doth" has a path to three different words, which gives us more options for the ending.
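The same effect can be observed on the eight sonnet lines quoted earlier by listing the distinct successors of a few words. This is a small sketch for exploration only; the variable names and whitespace tokenization are assumptions made for this example.

```python
from collections import defaultdict

lines = [
    "When forty winters shall besiege thy brow,",
    "And dig deep trenches in thy beauty's field,",
    "Thy youth's proud livery so gazed on now,",
    "Will be a tatter'd weed of small worth held:",
    "Then being asked, where all thy beauty lies,",
    "Where all the treasure of thy lusty days;",
    "To say, within thine own deep sunken eyes,",
    "Were an all-eating shame, and thriftless praise.",
]

# Collect the set of distinct words that follow each word.
successors = defaultdict(set)
for line in lines:
    words = line.split()
    for current, following in zip(words, words[1:]):
        successors[current].add(following)

for word in ("forty", "all", "thy"):
    print(word, sorted(successors[word]))
```

In this excerpt, "forty" has a single successor, so it always continues the same way, while "all" has two and "thy" has four, which is where generated lines can diverge from the originals.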
If you have any questions, comments, or concerns regarding the course, please
provide your feedback at the end of your
To submit your assignment, please commit your work to the homework05 branch in your assignments GitLab repository:
$ cd path/to/cse-40171-fa18-assignments # Go to assignments repository
$ git checkout master # Make sure we are in master branch
$ git pull --rebase # Make sure we are up-to-date with GitLab
$ git checkout -b homework05 # Create homework05 branch and check it out
$ cd homework05 # Go to homework05 directory
...
$ $EDITOR README.md # Edit appropriate README.md
$ git add README.md # Mark changes for commit
$ git commit -m "homework05: complete" # Record changes
...
$ git push -u origin homework05 # Push branch to GitLab
To submit your work, create a merge request following the process described here, but make sure to change the target branch from wscheirer/cse-40171-fa18-assignments to your personal fork's master branch so that your code is not visible to other students. Additionally, assign the merge request to your TA and add wscheirer, agraese, and AndroidKitKat as approvers so that all class staff can track your submission. Your assigned TA is agraese if your last name starts with A through Ki, or AndroidKitKat if your last name starts with Kl through W.