Course: ODL - Natural Language Processing

NLP - Module Overview

Module Lecturer
Mafas Raheem
Data Scientist | Business Analyst | Senior Lecturer
I am an academic, trainer and researcher specializing in Data Science and Business Analytics, with nearly 17 years of academic and industry experience. I hold an MSc in Data Science & Business Analytics and a Master of Business Administration (MBA), and I am currently pursuing a PhD in machine learning (Natural Language Processing) at the Asia Pacific University of Innovation and Technology, Malaysia. I have published a significant number of indexed journal articles in Machine Learning and Data Science that address current business needs.
I actively consult on data analytics and machine learning projects in the business and retail domains, and I have been involved in numerous data mining projects in Malaysia and overseas. My knowledge of statistics, together with my data mining and machine learning expertise, adds value in solving the contemporary market-expansion problems faced by SMEs. I also conduct training for data analysts and data science professionals in machine learning, data storytelling and business analysis.
LinkedIn
Google Scholar
Email: raheem@apu.edu.my
Email subject format: CT052-3-M-ODL-NLP – your intake – your name – subject/request title
Use only your official APU email account for correspondence.

Consultation:
Refer to “Staff Consultation Hour” on APU Apspace to book appointments.

Module Synopsis
The module discusses various models and techniques in current NLP practice. It covers a broad range of topics in natural language processing, including word and sentence tokenization, text classification and sentiment analysis, spelling correction, information extraction and parsing. It also introduces the underlying theory from probability, statistics and machine learning that is crucial to the field, and covers fundamental algorithms such as n-gram language models, naive Bayes classifiers and maximum entropy (maxent) classifiers. These theories and concepts are delivered using relevant NLP libraries such as NLTK, TextBlob, VADER, langdetect and translate, along with scikit-learn for machine learning algorithms and related operations.

Course Learning Outcomes (CLO)
At the end of the course the students will be able to:

CLO1 Demonstrate candidate natural language processing techniques for a problem in a specific domain (A3, PLO6)
CLO2 Formulate text processing techniques for a real-world application (C6, PLO2)
CLO3 Defend a proposed natural language processing system for a chosen problem (A4, PLO10)

Course Outline

Assessments
In-course Assessment - 100%
Report - 60%
Demo Presentation - 40%

References
Recommended References
These textbooks are available in the APU eLibrary.
Campesato, O. (2020). Python 3 for Machine Learning. Mercury Learning & Information. ISBN-13: 9781683924937
Liu, Z., Lin, Y., & Sun, M. (2020). Representation Learning for Natural Language Processing. Springer Singapore. ISBN-13: 9789811555732
Patrick et al. (2020). Natural Language Processing. SAGE Publications Ltd. ISBN-13: 9781529749120
- Module Introduction (File)
Introduction to NLP
Natural language processing (NLP) is a branch of computer science, specifically of artificial intelligence (AI), concerned with giving computers the ability to understand text and spoken words in much the same way human beings can. NLP combines computational linguistics (rule-based modeling of human language) with statistical, machine learning and deep learning models.
Learning Outcomes:
Explain the specific areas of NLP
Explain the difficulties/challenges of NLP
- Introduction to NLP (File)
- Tutorial - Python (Folder)
- Discussion (Forum)
Lexical Analysis
A regular expression is a sequence of characters that specifies a match pattern in text. Word tokenization is the process of splitting a large sample of text into words.
Stemming and lemmatization both aim to reduce inflected (and sometimes derivationally related) forms of a word to a common base form. Stemming usually refers to a crude heuristic process that chops off the ends of words in the hope of achieving this goal correctly most of the time, and often includes the removal of derivational affixes. Lemmatization usually refers to doing this properly with the use of a vocabulary and morphological analysis of words, normally aiming to remove inflectional endings only and to return the base or dictionary form of a word, which is known as the lemma.
Learning Outcomes:
Explain the functionality of regular expressions.
Explain the process of tokenization.
Explain stemming and lemmatization.
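The steps above (regular-expression tokenization, then stemming or lemmatization) can be sketched in plain Python. This is a minimal illustration rather than a production tool: the token pattern, the suffix list in `crude_stem` and the tiny `LEMMAS` lexicon are hypothetical stand-ins. NLTK provides real implementations (for example `PorterStemmer` and `WordNetLemmatizer`), which the tutorials may use instead.

```python
import re

# Regular expression for word tokens: a run of letters, optionally followed
# by an apostrophe and more letters (so "don't" stays one token).
TOKEN_PATTERN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")

def tokenize(text):
    """Split text into lowercase word tokens; punctuation and digits are dropped."""
    return [tok.lower() for tok in TOKEN_PATTERN.findall(text)]

# Toy stemmer (hypothetical rules, NOT the Porter algorithm): chop the first
# matching suffix if at least three characters of stem would remain.
SUFFIXES = ("ational", "ing", "ies", "ed", "es", "s")

def crude_stem(word):
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Toy lemmatizer: a tiny hand-written lexicon standing in for a real
# vocabulary plus morphological analysis (e.g. WordNet).
LEMMAS = {"studies": "study", "studying": "study", "mice": "mouse", "were": "be"}

def lemmatize(word):
    return LEMMAS.get(word, word)

tokens = tokenize("The mice were studying; she studies relational data.")
print(tokens)
# ['the', 'mice', 'were', 'studying', 'she', 'studies', 'relational', 'data']
print([crude_stem(t) for t in tokens])
# ['the', 'mice', 'were', 'study', 'she', 'stud', 'rel', 'data']
print([lemmatize(t) for t in tokens])
# ['the', 'mouse', 'be', 'study', 'she', 'study', 'relational', 'data']
```

The crude stemmer turns "studies" into "stud" and "relational" into "rel", while the dictionary lookup returns the lemma "study": the stemming versus lemmatization trade-off in miniature.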
- Introduction to Basic Text Processing (File)
- Tutorial - Regular Expression (Folder)
- Discussion (Forum)
Part-of-speech (POS)
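Part-of-speech (POS) tagging is a process in natural language processing (NLP) where each word in a text is labeled with its corresponding part of speech, such as noun, verb, adjective or another grammatical category. Because many words are ambiguous, the correct tag often depends on context: "book" is a noun in "read a book" but a verb in "book a flight".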
Learning Outcomes:
Explain POS tagging in NLP.
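A common baseline for POS tagging assigns each word the tag it received most often in a hand-tagged training corpus. The sketch below assumes a tiny, hypothetical corpus with Penn Treebank-style tags. It is not how NLTK's pretrained `pos_tag` works; that function relies on a trained statistical model.

```python
from collections import Counter, defaultdict

# Tiny hand-tagged training corpus (hypothetical), using Penn Treebank tags.
TRAIN = [
    [("I", "PRP"), ("read", "VBP"), ("a", "DT"), ("book", "NN")],
    [("Book", "VB"), ("a", "DT"), ("flight", "NN")],
    [("The", "DT"), ("book", "NN"), ("is", "VBZ"), ("long", "JJ")],
]

def train_most_frequent_tagger(sentences):
    """Map each (lowercased) word to the tag it received most often."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag(tokens, model, default="NN"):
    """Tag tokens one at a time, ignoring context; unknown words get `default`."""
    return [(tok, model.get(tok.lower(), default)) for tok in tokens]

model = train_most_frequent_tagger(TRAIN)
print(tag(["Book", "a", "cheap", "flight"], model))
# [('Book', 'NN'), ('a', 'DT'), ('cheap', 'NN'), ('flight', 'NN')]
```

Both errors in the output, the verb "Book" tagged NN and the unseen adjective "cheap" falling back to NN, show why practical taggers also use context, typically through sequence models such as hidden Markov models or neural taggers.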
- POS Tagging (File)
- Tutorial - POS Tagging (Folder)
- Discussion (Forum)
Parsing
Parsing, also called syntax analysis or syntactic analysis, is the process of analyzing a string of symbols, whether in natural language, a computer language or a data structure, according to the rules of a formal grammar.
Learning Outcomes:
Explain Parsing in NLP
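One classic way to check whether a sentence conforms to a formal grammar is the CKY (Cocke-Kasami-Younger) algorithm. It requires the grammar to be in Chomsky Normal Form and fills a table of sub-spans bottom-up. The sketch below is a recognizer only: it answers yes or no rather than building trees, and the tiny grammar is hypothetical. NLTK offers full parsers, for example a grammar built with `nltk.CFG.fromstring` and parsed with `nltk.ChartParser`.

```python
from itertools import product

# Toy grammar in Chomsky Normal Form (hypothetical).
# Lexical rules A -> word, stored as word -> set of preterminals.
LEXICAL = {
    "the": {"Det"}, "a": {"Det"},
    "dog": {"N"}, "cat": {"N"}, "park": {"N"},
    "saw": {"V"}, "in": {"P"},
}
# Binary rules A -> B C, stored as (B, C) -> set of A.
BINARY = {
    ("Det", "N"): {"NP"},
    ("NP", "PP"): {"NP"},
    ("P", "NP"): {"PP"},
    ("V", "NP"): {"VP"},
    ("VP", "PP"): {"VP"},
    ("NP", "VP"): {"S"},
}

def cky_recognize(tokens, start="S"):
    """Return True if the grammar can derive `tokens` from `start` (CKY algorithm)."""
    n = len(tokens)
    # table[i][j] = set of nonterminals that can derive tokens[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(tokens):
        table[i][i + 1] = set(LEXICAL.get(word, ()))
    for span in range(2, n + 1):          # build longer spans from shorter ones
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):     # try every split point
                for b, c in product(table[i][k], table[k][j]):
                    table[i][j] |= BINARY.get((b, c), set())
    return start in table[0][n]

print(cky_recognize("the dog saw a cat".split()))              # True
print(cky_recognize("the dog saw a cat in the park".split()))  # True
print(cky_recognize("dog the saw a cat".split()))              # False
```

The second sentence is accepted through two different derivations ("in the park" can attach to the noun phrase or to the verb phrase); extending the table cells with back-pointers would let the parser recover both trees.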
- Parsing (File)
- Tutorial - Parsing (Folder)
- Discussion (Forum)
Edit Distance
In computational linguistics and computer science, edit distance is a string metric, i.e. a way of quantifying how dissimilar two strings (e.g., words) are to one another, measured by counting the minimum number of operations required to transform one string into the other. The widely used Levenshtein distance allows three operations: insertion, deletion and substitution of a single character.
Learning Outcomes:
Explain Edit Distance and its process.
Demonstrate Edit Distance
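Minimum edit distance is usually computed with dynamic programming: each cell `d[i][j]` holds the cheapest way to turn the first `i` characters of the source into the first `j` characters of the target. The sketch below assumes unit insertion and deletion costs and makes the substitution cost a parameter, since some textbook treatments use a substitution cost of 2.

```python
def min_edit_distance(source, target, ins_cost=1, del_cost=1, sub_cost=1):
    """Weighted Levenshtein distance via dynamic programming."""
    n, m = len(source), len(target)
    # d[i][j] = minimum cost of turning source[:i] into target[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # delete every character of source[:i]
        d[i][0] = d[i - 1][0] + del_cost
    for j in range(1, m + 1):          # insert every character of target[:j]
        d[0][j] = d[0][j - 1] + ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            d[i][j] = min(
                d[i - 1][j] + del_cost,                        # deletion
                d[i][j - 1] + ins_cost,                        # insertion
                d[i - 1][j - 1] + (0 if same else sub_cost),   # match or substitution
            )
    return d[n][m]

print(min_edit_distance("kitten", "sitting"))                  # 3
print(min_edit_distance("intention", "execution"))             # 5
print(min_edit_distance("intention", "execution", sub_cost=2)) # 8
```

For spelling correction, the same function can rank candidate dictionary words by their distance to a misspelled token. NLTK also provides a ready-made `nltk.edit_distance`.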
- Edit Distance (File)
- Spelling Correction (File)
- Tutorial - Minimum Edit Distance (Folder)
- Discussion (Forum)
Language modeling (LM)
Language modeling (LM) is the use of various statistical and probabilistic techniques to determine the probability of a given sequence of words occurring in a sentence. Language models analyze bodies of text data to provide a basis for their word predictions.
Learning Outcomes:
Explain the concept of Language modeling (LM).
Demonstrate Language modeling (LM) using NLTK.
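An n-gram model estimates the probability of a sentence by applying the chain rule and assuming that each word depends only on the previous n-1 words. The sketch below is a bigram (n = 2) model with maximum-likelihood estimates on a hypothetical three-sentence corpus, written in plain Python so the arithmetic is visible. For an NLTK version, the `nltk.lm` module provides classes such as `MLE` and `Laplace`.

```python
from collections import Counter

def train_bigram_counts(sentences):
    """Count unigrams and bigrams, padding each sentence with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams):
    """Maximum-likelihood estimate P(word | prev) = C(prev, word) / C(prev)."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

def sentence_prob(sentence, unigrams, bigrams):
    """Chain rule under the bigram (first-order Markov) assumption."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= bigram_prob(prev, word, unigrams, bigrams)
    return prob

corpus = ["i like nlp", "i like python", "you like nlp"]  # toy corpus
uni, bi = train_bigram_counts(corpus)
print(sentence_prob("i like nlp", uni, bi))       # 2/3 * 1 * 2/3 * 1, about 0.444
print(sentence_prob("you like python", uni, bi))  # 1/3 * 1 * 1/3 * 1, about 0.111
print(sentence_prob("python like nlp", uni, bi))  # 0.0: bigram (<s>, python) never seen
```

The zero for "python like nlp" comes from a single unseen bigram. This is the sparsity problem that smoothing methods such as add-one (Laplace) smoothing address.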
- Language Modelling (File)
- Tutorial - Language Modelling (Folder)
- Discussion (Forum)
Text Classification
Text classification is a machine learning technique that assigns one or more predefined categories to open-ended text. Text classifiers can be used to organize, structure and categorize almost any kind of text.
Learning Outcomes:
Explain the task of Text Classification.
Explain text classification using the Naive Bayes classifier.
Explain the concept of smoothing in the Naive Bayes classifier.
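A multinomial Naive Bayes classifier picks the class c that maximizes P(c) multiplied by the product of P(w | c) over the words in the document; in practice the sum of logs is used to avoid numerical underflow. The sketch below implements this with add-alpha smoothing (Laplace smoothing when alpha = 1) on a hypothetical toy training set. scikit-learn's `MultinomialNB` implements the same model, with its `alpha` parameter controlling the smoothing.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (text, label). Returns log priors, per-class word counts, vocabulary."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        doc_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    n_docs = sum(doc_counts.values())
    log_prior = {c: math.log(k / n_docs) for c, k in doc_counts.items()}
    return log_prior, word_counts, vocab

def predict_nb(text, log_prior, word_counts, vocab, alpha=1.0):
    """Return the class maximizing log P(c) + sum over words of log P(w | c)."""
    best_class, best_score = None, -math.inf
    for c, prior in log_prior.items():
        total = sum(word_counts[c].values())
        score = prior
        for w in text.lower().split():
            if w not in vocab:
                continue  # words never seen in training are ignored
            # Add-alpha smoothing keeps an unseen (word, class) pair
            # from zeroing out the whole product.
            score += math.log((word_counts[c][w] + alpha) / (total + alpha * len(vocab)))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

train = [  # toy training set (hypothetical)
    ("fun couple love love", "comedy"),
    ("fast furious shoot", "action"),
    ("couple fly fast fun fun", "comedy"),
    ("furious shoot shoot fun", "action"),
    ("fly fast shoot love", "action"),
]
model = train_nb(train)
print(predict_nb("fast couple shoot fly", *model))  # action
```

Without smoothing, the word "shoot", which never occurs in a comedy training document, would make the comedy score exactly zero regardless of the other words.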
- Text Classification (File)
- Tutorial - Supervised Text Classification (Folder)
- Discussion (Forum)
Sentiment Analysis
Sentiment analysis, also referred to as opinion mining, is an approach to natural language processing (NLP) that identifies the emotional tone behind a body of text. This is a popular way for organizations to determine and categorize opinions about a product, service or idea.
Learning Outcomes:
Explain the concept of sentiment analysis.
Perform sentiment analysis using suitable machine learning algorithms
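Sentiment analysis is commonly done either with a supervised classifier trained on labeled examples or with a sentiment lexicon. The sketch below shows the lexicon-based idea in its simplest form: sum the polarities of known words and flip the sign of a word that follows a negator. The lexicon, negator list and scoring rule are hypothetical simplifications. VADER, listed in the module synopsis, follows the same lexicon-and-rules approach with a much larger lexicon and extra heuristics (in NLTK it is available as `nltk.sentiment.vader.SentimentIntensityAnalyzer`).

```python
import re

# Hypothetical mini-lexicon (word -> polarity); real lexicons such as VADER's
# contain thousands of scored entries plus extra rules.
LEXICON = {"good": 1.0, "great": 2.0, "love": 2.0,
           "bad": -1.0, "terrible": -2.0, "slow": -1.0}
NEGATORS = {"not", "never", "no"}

def sentiment_score(text):
    """Sum word polarities, flipping a word's sign when the previous token is a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0.0
    for i, tok in enumerate(tokens):
        polarity = LEXICON.get(tok, 0.0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score

def sentiment_label(text):
    """Map the summed score to positive / negative / neutral."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment_label("I love this phone, the camera is great"))  # positive
print(sentiment_label("The service was not good and very slow"))  # negative
print(sentiment_label("It arrived on Tuesday"))                   # neutral
```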
- Sentiment Analysis (File)
- Tutorial - Sentiment Analysis (Folder)
- Discussion (Forum)
Entropy Classifiers
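The maximum entropy (maxent) classifier has been a popular text classifier. Among all models consistent with the training data, it chooses the one whose distribution over categories has maximum entropy, where "consistent" means that the expected value of each feature under the model equals its observed value in the training data (the real distribution). It is equivalent to multinomial logistic regression.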
Learning Outcomes:
Explain the concepts of the maximum entropy (maxent) classifier.
Explain different types of entropy classifiers.
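Because a maxent classifier is a multinomial logistic regression, P(c | x) is a softmax over weighted feature sums, and the weights are fit by maximizing the conditional log-likelihood of the training data. The sketch below trains one with plain batch gradient ascent and a small L2 penalty; the bag-of-words features, learning rate, epoch count and toy data are illustrative assumptions. NLTK's `MaxentClassifier` and scikit-learn's `LogisticRegression` are ready-made alternatives.

```python
import math
from collections import defaultdict

def features(text):
    """Binary bag-of-words indicator features plus a bias feature."""
    return set(text.lower().split()) | {"__bias__"}

def class_probs(weights, feats, labels):
    """P(c | x) = exp(sum_f w[c, f]) / Z(x): a softmax over class scores."""
    scores = {c: sum(weights[(c, f)] for f in feats) for c in labels}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}

def train_maxent(data, epochs=200, lr=0.1, l2=0.01):
    """Batch gradient ascent on the L2-regularized conditional log-likelihood."""
    labels = sorted({y for _, y in data})
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for text, y in data:
            feats = features(text)
            probs = class_probs(weights, feats, labels)
            for c in labels:
                # Observed minus model-expected feature count. Without the L2
                # term this is zero at the optimum: the maxent constraint.
                observed = 1.0 if c == y else 0.0
                for f in feats:
                    grad[(c, f)] += observed - probs[c]
        for key, g in grad.items():
            weights[key] += lr * (g - l2 * weights[key])
    return weights, labels

def predict(weights, labels, text):
    probs = class_probs(weights, features(text), labels)
    return max(probs, key=probs.get)

train_data = [  # toy training set (hypothetical)
    ("great fun film", "pos"),
    ("loved this great film", "pos"),
    ("boring awful film", "neg"),
    ("awful plot boring", "neg"),
]
weights, labels = train_maxent(train_data)
print(predict(weights, labels, "great fun"))    # pos
print(predict(weights, labels, "boring plot"))  # neg
```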
- Entropy Classifiers (File)
- Tutorial - Entropy Classifiers (Folder)
- Discussion (Forum)
Information Extraction & Named Entity Recognition
- IE and NER (File)
- Tutorial - Information Extraction & Named Entity Recognition (Folder)
- Discussion (Forum)
Information Retrieval
Information retrieval (IR) is the science of searching for information within a document, searching for documents themselves, and searching for the metadata that describes data, as well as for databases of texts, images or sounds.
Learning Outcome:
Explain the concept of Information retrieval
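A basic ranked-retrieval system builds an inverted index (each term mapped to the documents that contain it) and scores documents against a query with tf-idf weights. The sketch below uses one common weighting variant, (1 + log tf) × log(N / df), without document-length normalization; the three documents are hypothetical.

```python
import math
from collections import Counter, defaultdict

DOCS = {  # toy collection (hypothetical)
    "d1": "the cat sat on the mat",
    "d2": "dogs and cats living together",
    "d3": "the dog chased the cat",
}

def build_index(docs):
    """Inverted index: term -> {doc_id: term frequency in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.split()).items():
            index[term][doc_id] = tf
    return index

def tfidf_search(query, docs, index):
    """Rank documents by the summed tf-idf weight of the query terms they contain."""
    n = len(docs)
    scores = Counter()
    for term in set(query.split()):
        postings = index.get(term, {})
        if not postings:
            continue  # term occurs in no document
        idf = math.log(n / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += (1 + math.log(tf)) * idf
    return [doc_id for doc_id, _ in scores.most_common()]

index = build_index(DOCS)
print(tfidf_search("dog cat", DOCS, index))  # ['d3', 'd1']
```

Document d2 is not retrieved for "dog cat" because it contains only "dogs" and "cats", which is one motivation for applying stemming or lemmatization to both documents and queries.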
- Information Retrieval (IR) (File)
- Tutorial - Information Retrieval (Folder)
- Discussion (Forum)
Relation Extraction
Relation Extraction is the task of predicting attributes and relations for entities in a sentence.
Learning Outcome:
Explain the concept of Relation Extraction
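Relation extraction turns text into structured triples; for example, "Kuala Lumpur is the capital of Malaysia" yields (Kuala Lumpur, capital_of, Malaysia). The simplest approach uses hand-written lexical patterns, sketched below. The patterns, the relation names and the approximation of entities as runs of capitalized words are all hypothetical; supervised classifiers trained on annotated sentences typically generalize much better than fixed patterns.

```python
import re

# Entities are approximated as runs of capitalized words; a real system would
# take entity spans from a named entity recognizer instead.
ENTITY = r"([A-Z][a-z]+(?: [A-Z][a-z]+)*)"

# Hypothetical hand-written patterns, each paired with the relation it signals.
PATTERNS = [
    (re.compile(ENTITY + r" is the capital of " + ENTITY), "capital_of"),
    (re.compile(ENTITY + r" was founded by " + ENTITY), "founded_by"),
]

def extract_relations(sentence):
    """Return (subject, relation, object) triples matched by any pattern."""
    triples = []
    for pattern, relation in PATTERNS:
        for subj, obj in pattern.findall(sentence):
            triples.append((subj, relation, obj))
    return triples

print(extract_relations("Kuala Lumpur is the capital of Malaysia."))
# [('Kuala Lumpur', 'capital_of', 'Malaysia')]
print(extract_relations("Microsoft was founded by Bill Gates and Paul Allen."))
# [('Microsoft', 'founded_by', 'Bill Gates')]
```

The second call misses Paul Allen, a typical failure of fixed patterns when the surface form varies slightly.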
- Relation Extraction (File)
- Tutorial - Relation Extraction (Folder)
- Discussion (Forum)
Text Books - PDF Version
- Text Books - PDF Version (Folder)
Sample Assessment
- Sample Assessment (Folder)