CENG 464

Text Mining

In this course, we will cover important topics in text mining, including basic natural language processing techniques, document representation, text categorization and clustering, document summarization, sentiment analysis, social network and social media analysis, probabilistic topic models, and text visualization.

Course Objectives

This course provides an opportunity to learn the key components of text mining and analytics, using real-world datasets and a text mining toolkit written in Python. Hands-on experience with core text mining techniques, including text preprocessing, sentiment analysis, and topic modeling, helps prepare students to work as data scientists.

Recommended or Required Reading

J. Eisenstein, “Introduction to Natural Language Processing”, MIT Press, 2019.

S. Ghosh & D. Gunning, “Natural Language Processing Fundamentals”, Packt, 2019.

Learning Outcomes

1. Using Python and NLTK (Natural Language Toolkit) to build out your own text classifiers and solve common NLP problems
2. Performing data analysis and machine learning tasks using Python
3. Understanding the basics of computational linguistics
4. Building models for general natural language processing tasks
5. Evaluating the performance of a model with the right metrics
6. Visualizing, quantifying, and performing exploratory analysis from any text data

Topics
Introduction
Steps in NLP
Steps in NLP
Document Representation
Document Representation
Text Classification
Text Categorization
Data Collection
Topic Modeling
Text Summarization
Text Generation
Social Media and Network Analysis
Sentiment Analysis
Text Visualization

Grading

Midterm 25%

Research Presentation 35%

Final 40%