Syllabus

 

Introduction to Natural Language Processing

CMPS143, Spring 2018, Section 01: Syllabus

THIS SYLLABUS IS SUBJECT TO CHANGE!!!!!!!

 

Primary Textbook: NLTK book (available online)
Additional Resource: speech book

 

 

Course Information

WHERE: Engineering 2: Room 192 (enter from plaza)
WHEN: Tues-Thurs 11:40 to 1:15

 

Instructor Information

Prof. Marilyn Walker

Jack Baskin School of Engineering, Room E2-267
email: mawalker@ucsc.edu
Office Hours: Wednesday 2:00 to 3:30, E2-267

 

Wen Cui. email: wcui7@ucsc.edu

 

Vrindavan Harrison. email: vharriso@ucsc.edu

 

TA/Tutors & Lab Hours:

Tuesdays 3:30 - 5:30pm in Soc Sci I Mac Lab (Room 135)
Wednesdays 1:00 - 3:00pm in Soc Sci I Mac Lab (Room 135)
Fridays 9:00 - 11:00am in Soc Sci I Mac Lab (Room 135)

 

Online Class Discussion

This term we will be using Piazza for class discussion. The system is highly catered to getting you help fast and efficiently from classmates, the TA, and myself. Rather than emailing questions to the teaching staff, I encourage you to post your questions on Piazza. If you have any problems or feedback for the developers, email team@piazza.com.

Find our class page at:

 

 

Course Description

Spring 2018. This class introduces advanced undergraduates to the theory and practice of Natural Language Processing. This offering will focus on NLP programming for processing and generating narratively structured text, including classic stories such as Aesop's Fables as well as personal narratives that can be mined from the web. CMPS 143 provides a combination of homeworks and exams targeted at learning the basics of NLP using the NLTK toolkit and other publicly available software. You must have previous experience with Python, because we can't teach you both Python and NLP in the class.

Textbook:

  • Natural Language Processing with Python. Available electronically and from the bookstore. Henceforth referred to as NLPP.
  • We will be using NLTK 3.0 and the updated version of the online book that corresponds to it. The version of the book in the bookstore is slightly out of date with respect to what is on the web.
  • We will be using Python 3.0 or later. The instructor and TAs are currently using version 3.4.3.

Auxiliary texts:

  • Speech and Language Processing. Jurafsky and Martin. Coursera online lectures and parts of the book are available online.

Grading

  • Attendance: 5%
  • Homeworks and discussion of what we learned from the homeworks in class: 45%
    • Homework INCLUDES project, and final presentation of project during Finals slot
  • Midterm: 25%
  • Final: 25%
  • Homework Delivery: Turn it in on Canvas. Please include any code, files, and written documents in a zip file. Written documents should be plain text or PDF only. Multiple uploads (to overwrite) are enabled. Late HW is accepted until noon the next day with a 10% penalty. Homeworks are not accepted any later than that because the solutions will be posted at noon and discussed in class.
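
As a rough illustration of how the weights above combine, here is a minimal sketch in Python. The sample scores are made up, the function names are hypothetical, and the reading of the late penalty (10% taken off the points earned) is an assumption rather than official policy.

```python
# Weights taken from the Grading list above (they sum to 100%).
WEIGHTS = {"attendance": 0.05, "homework": 0.45, "midterm": 0.25, "final": 0.25}

# Assumption: the 10% late penalty is deducted from the points earned.
LATE_PENALTY = 0.10


def homework_points(earned, late=False):
    """Points credited for one homework; late means turned in by noon the next day."""
    return earned * (1 - LATE_PENALTY) if late else earned


def course_score(fractions):
    """Weighted course score in [0, 1] from per-component fractions in [0, 1]."""
    return sum(WEIGHTS[name] * fractions[name] for name in WEIGHTS)


# Hypothetical student: made-up numbers for illustration only.
example = {"attendance": 1.0, "homework": 0.8, "midterm": 0.7, "final": 0.9}
print(round(course_score(example), 2))
print(homework_points(20, late=True))
```

Under these assumptions, a 20-point homework turned in late with full marks would be credited 18 points.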

Schedule: Reading and homework assignments

THIS SCHEDULE IS SUBJECT TO CHANGE!!!!!!! 

PLEASE NOTE:

FINAL SLOT: Monday June 11th. 8:00 - 11:00 AM. Question Answering Competition Results

  • STUDENT PRESENTATIONS. FIVE MINUTES. SHOW YOUR SYSTEM.
  • OUR ANALYSIS. What's hot and what's not. 

Week 1. NLP and Basic Text Processing. Chaps 1 and 2

April 3rd: The NLP Pipeline

  • Homework 0: Already available on Canvas.
    • Entry quiz on logic, probability, regular expressions from Discrete Math, so we know what you know.
    • DUE THURSDAY, APRIL 5th at 11:40 AM.
  • General applications of NLP
  • The Holy Grail: Deep understanding of extended discourses such as stories or dialogues.
  • Installing NLTK, examples of how to use it
  • Reading in language data from files, splitting sentences etc.
  • Counting words and modeling their frequency (Ch. 1)
  • Collocations and Bigrams
  • Reading: Chapters 1 and 2 (reading in natural language data from files)
  • Homework 1 assigned. Due Sunday, April 8th. 11:59 PM. 10 pts.

April 5th: Basic Text Processing. Unigrams, Words, POS

  • Homework 0 to be re-opened in Canvas. Due at 11:55 PM. 
  • Review NLTK Lexical resources (Read Chapter 2)
  • Tokenization (READ ch. 3.1.1)
  • POS Word categorization & generalization
  • Stemming
  • Collocations
  • Lexical Meaning: Wordnet 
  • Wordnet (READ ch. 2.5)
  • Synonyms and Synsets.
  • Word Senses
  • Read Chapter 3 for next week.

Week 2:  Words, sentences, frequencies, POS, ngrams. Chaps 3 and 5

Homework 1 DUE Sunday, April 8th at 11:59 PM

April 10th: Lexical Resources. Moving beyond Words and POS

  • PYTHON STUFF: Review chapter 4 if you are learning Python as you go along.
  • Review HW1.
  • Homework 2 assigned. Due Sunday, April 15th at 11:59 PM. 20 pts.
  • What is an ontology?
  • Wordnet API, how it works (READ ch. 2.5)
  • Synonyms and Synsets.
  • Semantic Relatedness
  • Review of Probability. Conditional Probability.
  • POS tagging & applications (READ ch. 5.1 & 5.2)

April 12th: Processing Bits of Language above the word

  • POS tagging & applications (READ ch. 5.1 & 5.2)
  • Regular Expressions (Read ch 3.4, 3.5, 3.6, 3.7)
  • POS tagging patterns with Regexp 
  • N-Gram language models  (READ Chapter 5.4 and 5.5)

Week 3: Natural Language Understanding I 

Homework 2 DUE Sunday, April 15th at 11:59 PM

April 17th: Text Classification I. Using Sentiment Lexicons, Lexical Resources 

  • READING: Chapter 6, sections 6.1 to 6.4 
  • Homework 3 assigned. Due Sunday, April 22nd at 11:59 PM. 20 pts.
  • Classifying Texts or Utterances into Categories.
  • Defining an Experiment.
  • Sentiment Classification Problems.
  • Example: Movie Reviews: Thumbs up or Thumbs Down?
  • Example: Restaurant Reviews. For homework.
  • Constructing Feature Representations of Texts. 
  • Features for POS, features for words (unigrams), Bigram Features.
  • We strongly recommend that you start doing the feature extraction right away before Sunday!

April 19th:  

  • How many NLU problems can be cast as classification problems?
  • How to figure out what features are useful.
  • Sentiment Lexicons: LIWC Linguistic Inquiry and Word Count. LIWC Features
  • Examining the most important features. Other methods for classifiers in NLTK.
  • How to do error analysis on your classifier predicted output.
  • Decision Tree learners
  • How Naive Bayes works.

Week 4: More classification

Homework 3 DUE Sunday, April 22nd at 11:59 PM

April 24th: Classification, feature analysis, error analysis. 

  • HW4: Extend HW3 with more features, learners, analysis
  • Different types of classifiers
  • Distributional Semantics: Background to Word Embeddings
  • Word Embeddings: how they are built, how you can use them in classification
  • HW4: New task: Classifying Blogs using Word Embeddings, Decision Tree learners, SVM in scikit-learn

April 26th:  POS tagging as a classification problem

  • Where can we use text classification? How many NLU problems can be cast as classification problems?
  • POS tagging as a classification problem
  • N-Gram language models  (READ Chapter 5.4 and 5.5)
  • Introduction to Parsing

Week 5:

Homework 4 DUE Sunday, April 29th at 11:59 PM

May 1st. The Lexicon, Verbs and their subcategorization.

  • More on Lexical Subcategorization (Re-read Chapter 2)
  • Lexical Meaning: Wordnet and Verbnet
  • Review Wordnet
  • Verbs and their dependents
  • How Parsers Work, Part 1.
  • Homework 5 provides you with sample problems that will allow you to review for the midterm.  Due Sunday, May 6th 11:59 PM. 20 pts.

May 3rd: Discourse & Narrative Meaning

The material covered this day is not on the midterm. But lectures after the midterm will assume it.

  • Story Intention Graph I: Layers of Representation
  • Scheherazade Annotation Tool. Developing the SIG story representation through annotation.
  • Scheherazade Tutorial, available online to do on your own if desired. Aesop's The Fox and Crow.
  • How Scheherazade uses VerbNet and Wordnet.
  • Sample annotation of story blogs.
  • Discourse Relations in Narrative
  • Reading: Chapter 4, Elson 2012. Scheherazade annotation interface. Also short version available as a conference paper.

Week 6: Midterm

Homework 5 is due Sunday May 6th at 11:59pm. It reflects the problems you will encounter on the midterm on Thursday May 10th.

May 8th: 

  • Review for Midterm

May 10th:

  • Midterm (Probability, Conditional Probability, N-Gram Language models, POS tagging, Stemming, Collocations, Text Classification, Naive Bayes, Wordnet, Verbnet)
  • Multiple Choice. Bring a PINK SCANTRON.

Week 7: Starting on the project homeworks!! Natural Language Understanding II

May 15th: Chunking, Sentence Structure and Parsing I 

  • Homework 6 assigned. Due Sunday, May 20th at 11:59 PM. 20 pts.
  • Factoid QA
  • Baseline QA system. String operations, sentence ranking
  • Types of Questions in baseline QA: Who, What, When, Where
  • Identifying likely phrases and sentences, REGEXP patterns.
  • Sample Code Stubs for baseline system
  • Evaluation metrics: Precision, Recall, F-measure
  • Maximizing Recall at the expense of Precision
  • Setup of QA task and demo of scoring
  • The evolution of parsing algorithms.
  • How Parsers work. Probabilistic Parsing.

May 17th:  MORE Natural Language Understanding for QA

  • Chunking (Shallow Parsing vs. Parsing). Read Ch. 7.  
  • Sentence Structure. READ ch. 8.1-8.3, 8.5
  • QA pipeline
  • Baseline QA system using string operations and sentence ranking
  • Next Steps: Using Syntax
  • Stanford Dependency Parse structure VS. Constituent structure
  • Dependency Structures and Relations
  • Constituent Structures and CFG rules

Week 8:  Question Answering  II

HOMEWORK 6 is DUE SUNDAY NIGHT MAY 20th at 11:59 PM

May 22nd: Question Answering II: Using Syntax, Working with NLU representations

  • Homework 7 assigned. Due Sunday, May 27th at 11:59 PM.
  • IMPROVING PRECISION.
  • Grammars and Parsing
  • Constituency and Dependency Tree Readers
  • Chunking and Parsing: How to search trees
  • Pattern Matching on Dependency Relations
  • Ranking possible responses

May 24th:

  • Working with NLU representations.
  • Syntactic Structure and Coordination
  • Prepositional Phrase Attachments
  • Dependency vs. Constituent Structures II
  • Answering Questions from NLU/parsing representations

Week 9: Question Answering III: Lexicons & Lexical Semantics 

Homework 7 DUE. Sunday MAY 27th at 11:59 PM.

May 29th:  Using VerbNet and WordNet API in QA

  • HW8: final HW assigned. Due June 10th 11:59pm. The final QA competition. 
  • Paraphrases and Lexical Choice
  • Verbs and their dependents, VerbNet semantic role types
  • Verbnet and Wordnet API, how it works
  • Word Sense Disambiguation. .DICT files provided with HW8.
  • Constituent and Dependency Trees, finding subjects etc
  • Increasing Precision of Answers

May 31st: Review of all techniques for QA, types of questions, methods

  • Review for Final
  • Using WordNet and Verbnet. New kinds of questions.
  • Increasing Precision of Answers

Week 10: Question Answering Competition & Final Exam in Class slot

June 5th: No Lecture. Special Section. 2 to 4 in Class.

  • Come to class with your questions. Work on your QA system, prepare for Final. 

June 7th: FINAL EXAM in CLASS SLOT.

 

FINALS WEEK

Homework 8 DUE. Friday, June 8th at 11:59 PM

FINAL SLOT: Monday June 11th. 8:00 to 11:00 AM Question Answering Competition Results

  • STUDENT PRESENTATIONS. TEN MINUTES. SHOW YOUR SYSTEM.
  • OUR ANALYSIS. What's hot and what's not. 
