Make-Up Homework

In this homework, we will look at character-level language models. These models take a sequence of characters as input and predict the next character.

This assignment, as with all of the homework assignments, should be completed individually without sharing solutions, models, or ideas with other students. See details at the bottom of this page.

Starter Code and Data

Starter code for this assignment is available here. The starter code zip for this assignment includes the dataset you will need.

For this assignment, we will use the transcripts of several speeches by Barack Obama. (We chose Obama because more of his speech transcripts are publicly available than for other presidents.) We’ll be using a simplified character set consisting of only the 26 letters (we do not distinguish between upper- and lower-case letters), the space, and the period: 28 characters in total.

Inside models.py you will find the LanguageModel class. This is an abstract class representing the language-modeling functionality you will be developing. Our language models work by predicting 28 probabilities – one for each character in our reduced character set. Effectively, the language model is a classifier over our character set. The LanguageModel class has two methods: predict_all takes a string s of length n and predicts the distribution over the next character for every prefix s[:i], for i from 0 to n. Note that this includes predictions for both the empty string '' and the whole string s, so this method should return n+1 values. The predict_next method takes a string and predicts just the next character for the entire string. This method is already implemented, and it simply calls predict_all.
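To make the interface concrete, here is how the two methods relate, using the provided Bigram model (described below) as an example. The (28, n+1) tensor shape is our assumption, chosen to match the one-hot encoding described below; check the starter code for the actual layout:

    from models import Bigram   # adjust the import path to your layout

    model = Bigram()             # assumed: constructible with no arguments
    s = 'hello'

    # One prediction per prefix: '', 'h', 'he', ..., 'hello' -> n+1 = 6 columns,
    # each a distribution (in log space) over the 28 characters.
    all_preds = model.predict_all(s)
    print(all_preds.shape)       # expected: torch.Size([28, 6])

    # predict_next is just the prediction for the full string, i.e. the last column.
    next_pred = model.predict_next(s)
    print(next_pred.shape)       # expected: torch.Size([28])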

As a first step, the starter code includes a fully implemented Bigram model as well as an AdjacentLanguageModel, which favors characters close in the alphabet to the current character. These models provide baselines against which you can compare your network and test your work.

You will also find utils.py which includes some useful functionality. The SpeechDataset class loads the speech data and the one_hot function converts a string into a one-hot encoding of characters with shape (28, len(s)). You can create a one-hot encoded dataset by calling SpeechDataset('data/train.txt', transform=one_hot).
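For example, assuming the dataset items are plain Python strings when no transform is given (an assumption; check utils.py), you could inspect the data like this:

    from utils import SpeechDataset, one_hot   # adjust the import path to your layout

    # Raw text: each item is assumed to be a plain string.
    raw = SpeechDataset('data/train.txt')
    print(raw[0][:50])        # first 50 characters of the first item

    # One-hot encoded: each item is a float tensor of shape (28, len(s)).
    encoded = SpeechDataset('data/train.txt', transform=one_hot)
    print(encoded[0].shape)   # torch.Size([28, L])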

The parts of this homework are largely independent, so feel free to skip around and do them in whatever order you like.

Log-likelihood of Text (10 pts)

In this section, you will implement log_likelihood in language.py. This function should take a model and a string as input and return the log-likelihood of generating that string under that model. You can use Bigram or AdjacentLanguageModel to test your function:

python -m homework.language -m Bigram

Hints:

  • Remember that the language model can take an empty string as input.
  • Remember that the model’s predict_all method will return the log-probabilities of the next character of each prefix of the string.
  • The log-likelihood is the sum of the log-likelihoods of the individual characters, not their average or product; a sketch of one possible implementation follows below.
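
Here is a minimal sketch, assuming predict_all returns a (28, n+1) tensor of log-probabilities laid out to match one_hot (both layout assumptions; verify against the starter code):

    from utils import one_hot   # import path assumed

    def log_likelihood(model, some_text: str) -> float:
        # Column i holds the log-probabilities of the character following some_text[:i].
        log_probs = model.predict_all(some_text)     # (28, n+1)
        # One-hot encoding of the characters that actually occur.
        targets = one_hot(some_text)                 # (28, n)
        # Select the log-probability assigned to each actual character and sum.
        return float((log_probs[:, :-1] * targets).sum())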

Generating Text (10 pts)

Now, we will implement sample_random in language.py. This function takes a language model as input and generates a string by repeatedly sampling according to the distribution generated by the model. (This is what we called “sequential sampling” in class.) The model should stop generating when max_length characters are produced or if the model produces a period.

Hint: take a look at torch.distributions for some useful functionality related to sampling from distributions. The Categorical class may be particularly useful.
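As an illustration, here is a minimal sketch of the sampling loop. The vocab string is an assumed character ordering; use whatever mapping utils.py actually defines:

    import torch

    def sample_random(model, max_length: int = 100) -> str:
        vocab = 'abcdefghijklmnopqrstuvwxyz .'   # assumed character order
        s = ''
        while len(s) < max_length:
            # predict_next returns log-probabilities; Categorical accepts
            # them via the logits argument.
            dist = torch.distributions.Categorical(logits=model.predict_next(s))
            c = vocab[dist.sample().item()]
            s += c
            if c == '.':
                break
        return s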

Note that the text you generate may not look very good. Here is one output sampled from the temporal convolutional network in the master solution: “some of a backnown my but or the understand thats why weve hardships not around work since there one they will be begin with consider daughters some as more a new but jig go atkeeral westedly.”

Beam Search (20 pts)

Here we will implement beam search in the beam_search function in language.py. Once again, the search terminates when the sequence reaches max_length characters or a period is generated. The beam search should keep the top beam_size candidates at each step and return the top n_results sequences at the end of the search. Sequences should be scored by one of two criteria: their overall log-likelihood (as in log_likelihood) if average_log_likelihood=False, or their average per-character log-likelihood if average_log_likelihood=True. In general, the per-character average will result in longer sequences, while the overall log-likelihood will terminate after a relatively small number of words.

Hint: Inside language.py you will find TopNHeap. This class will be useful for keeping track of the best sequences during your search; a sketch of the search loop follows the sample outputs below. Here are a couple of samples from the master solution with average_log_likelihood=False:

thats.
today.
in.
now.

And here are a few with average_log_likelihood=True:

and we will continue to make sure that we will continue to the united states of american.
and we will continue to make sure that we will continue to the united states of the united states.
and we will continue to make sure that we will continue to the united states of america.
and thats why were going to make sure that will continue to the united states of america.
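
Here is a sketch of one way to structure the search loop. TopNHeap’s interface (an add method and an elements list) is assumed here, so verify it against the class in language.py; note also that this version makes no attempt to deduplicate sequences:

    def beam_search(model, beam_size, n_results=10, max_length=100,
                    average_log_likelihood=False):
        vocab = 'abcdefghijklmnopqrstuvwxyz .'   # assumed character order
        beam = [(0.0, '')]                       # (log-likelihood, sequence) pairs
        results = TopNHeap(n_results)            # finished sequences, with scores

        for _ in range(max_length):
            candidates = TopNHeap(beam_size)
            for ll, s in beam:
                next_lp = model.predict_all(s)[:, -1]   # next-character log-probs
                for i, c in enumerate(vocab):
                    new_ll = ll + float(next_lp[i])
                    score = new_ll / len(s + c) if average_log_likelihood else new_ll
                    if c == '.' or len(s + c) >= max_length:
                        results.add((score, s + c))      # sequence is finished
                    else:
                        candidates.add((score, new_ll, s + c))
            beam = [(ll, s) for _, ll, s in candidates.elements]
            if not beam:
                break
        return [s for _, s in sorted(results.elements, reverse=True)]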

Temporal Convolutional Model (20 pts)

In this section you will define a temporal convolutional network (TCN) to use for language modeling. For this, you will first need to define the CausalConv1dBlock, which combines a causal convolution with a nonlinearity. The main TCN module should use a stack of causal blocks with increasing dilation to build a complete model. At the end of the model, you’ll want to add a convolutional layer with a window size of 1 to produce the output. Then TCN.predict_all should use TCN.forward to predict the log-probabilities for an input sequence. A sketch of one possible structure follows the hints below.

Hints:

  • Make sure TCN.forward works with batches of data.
  • Make sure TCN.predict_all returns log-probabilities, not logits.
  • Store the distribution over the first character in the sentence as a parameter of the model using torch.nn.Parameter.
  • Try to keep the model relatively small. The master solution takes about 15 minutes to train on a pretty good GPU. Yours may take a little longer, but it shouldn’t need hours of training.
  • Try adding a residual connection to the block.
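
Below is a sketch of one possible block and model structure. The channel sizes, kernel size, and the exact handling of the first-character parameter are illustrative choices, not the master solution:

    import torch

    class CausalConv1dBlock(torch.nn.Module):
        def __init__(self, in_channels, out_channels, kernel_size, dilation):
            super().__init__()
            # Left-pad so that output[t] only depends on input[:t+1].
            self.pad = torch.nn.ConstantPad1d(((kernel_size - 1) * dilation, 0), 0)
            self.conv = torch.nn.Conv1d(in_channels, out_channels, kernel_size,
                                        dilation=dilation)

        def forward(self, x):
            # A residual connection (see the hint above) could be added here
            # when the channel counts match, e.g. via a 1x1 projection.
            return torch.relu(self.conv(self.pad(x)))

    class TCN(torch.nn.Module):
        def __init__(self, layers=(32, 64, 128), kernel_size=3):
            super().__init__()
            # Learned distribution over the first character (see the hint above).
            self.first_char = torch.nn.Parameter(torch.zeros(28))
            blocks, c, dilation = [], 28, 1
            for l in layers:
                blocks.append(CausalConv1dBlock(c, l, kernel_size, dilation))
                c, dilation = l, 2 * dilation
            self.network = torch.nn.Sequential(*blocks)
            self.classifier = torch.nn.Conv1d(c, 28, 1)   # window-size-1 output layer

        def forward(self, x):
            # x: one-hot input of shape (batch, 28, length); returns logits of
            # the same shape. predict_all would prepend self.first_char and
            # apply log_softmax over the character dimension.
            return self.classifier(self.network(x))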

TCN Training (40 pts)

Train your TCN using train.py. You may want to reuse some training code from previous assignments.

Hint: You may find SGD works better than Adam in this case, but you may also need a large learning rate (around 0.1).
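As a starting point, here is a minimal training-loop sketch. It assumes one-hot batches of fixed shape (B, 28, L) and ignores the first-character parameter; since SpeechDataset items vary in length, a real loader needs fixed-length crops or a custom collate_fn:

    import torch
    from torch.utils.data import DataLoader
    from utils import SpeechDataset, one_hot   # import paths assumed

    model = TCN()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()

    # Note: default collation will fail on variable-length speeches; crop or
    # pad them to a common length in practice.
    train_loader = DataLoader(SpeechDataset('data/train.txt', transform=one_hot),
                              batch_size=32, shuffle=True)

    for epoch in range(20):
        for batch in train_loader:                # (B, 28, L) one-hot floats
            logits = model(batch[:, :, :-1])      # predict chars 1..L-1 from 0..L-2
            labels = batch.argmax(dim=1)[:, 1:]   # integer targets, shape (B, L-1)
            loss = loss_fn(logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()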

Grading

The validation grader can be run with

python -m grader homework -v

Note that this validation grader uses a different dataset than the final grader, so there may be some difference between the two scores. The data come from the same distribution, however, so your grade should be similar between the two graders unless your model is massively overfit to the validation set.

Submission

Once you are ready to submit, create a bundle by running

python bundle.py homework <eid>

then submit the resulting zip file on Canvas. Note that the grader has a maximum file size limit of 20MB. You shouldn’t run into this limit unless your models are much larger than they need to be. You can check that your homework was bundled properly by grading it again with

python -m grader <eid>.zip

Online Grader

The online grading system uses a slightly modified version of Python, so please make sure your code follows these rules:

  • Do not use exit or sys.exit as this will likely crash the grader.
  • Do not try to access files except for the ones provided with the homework zip file. Writing files is disabled.
  • Network access is disabled in the grader. Make sure you are reading your dataset from the data folder rather than from a network connection somewhere.
  • Do not fork or spawn new processes.
  • print and sys.stdout.write are ignored by the grader.

Honor Code

You should do this homework individually. You are allowed to discuss high-level ideas and general structure with each other, but not specific details about code, architecture, or hyperparameters. You may consult online sources, but don’t copy code directly from any posts you find. Cite any ideas you take from online sources in your code (include the full URL where you found the idea). You may refer to your own solutions or the master solutions to prior homework assignments from this class as well as any iPython notebooks from this class. Do not put your solution in a public place (e.g., a public GitHub repo).

Acknowledgements

This assignment is very lightly modified from one created by Philipp Krähenbühl.