Course: ODL-DSBA Data Management

Topic outline

General
- Announcements Forum
Module Overview
Module Lecturer
Hello everyone, my name is Dr. Murugananthan (Dr. Muru for short!) and I am the lecturer responsible for Data Management. Please feel free to contact me via MS Teams or email. I am also available for 1-2-1 consultation (please refer to the iConsult system). Should you have any queries, please reach out. Best of luck with the module!
- Module Synopsis
  This module will provide the learner with an overview of the importance of data in the growing field of data science and analytics. The learner will study the established methods and technologies and also investigate new and emerging ones. Emphasis will be placed on data mining models in the context of organizational data, various data types, exploratory data analysis, data preprocessing measures and techniques, data warehousing, and data governance. This module corresponds to CT051-3-M-DM; please refer to the non-ODL MD if any changes to the module are needed.
  Course Learning Outcomes
  CLO1: Evaluate the various data types, data storage systems and associated techniques for indexing and retrieving data. (C6, PLO7)
  CLO2: Design feature engineering techniques to transform transactional data into meaningful inputs in order to create a predictive model. (A5, PLO6)
  CLO3: Propose a suitable approach to designing a data warehouse to store and process large datasets. (A3, PLO5)
Module Introduction
- Overview Page
  Welcome to our first class. We will discuss the following matters:
  
  • Module overview
  
  • Assessment requirements
  
  • Teaching strategies
- Topic 1 - Introduction and Overview File
- Discussion Forum
  Please raise any queries about how the module will be covered, as well as the nature of the assessments.
Organisation Data Preparation
- Overview Page
  I hope you are all excited to get started with our first topic. The learning outcomes for this topic are as follows:
  • List and define various sources of data
  
  • Explain the fundamental differences between databases, data warehouses, and datasets
  
  • Explain some of the ethical dilemmas associated with data mining and outline possible solutions
- Topic 2 – Organizational Data Preparation File
- Discussion Forum
  Explain the pros and cons of using regression in supervised data mining.
Data Types
- Overview Page
  The learning outcomes for this topic on data types are as follows:
  • Define what a dataset is
  
  • Explain the different types of variables
  
  • Describe six basic ways to identify variables
- Topic 3 – Data Types File
- Discussion Forum
  Discuss the key differences between moderating variables and mediating variables with examples.
Data Preprocessing - Part 1 and 2
- Overview Page
  I hope you are enjoying the material. Data preprocessing is covered in two parts. The first part will address the need for data preparation, discuss the multidimensional view of data quality, and explain the major tasks in data preprocessing, especially data cleaning. In the second part, we will delve into data integration and data transformation.
- Topic 4 – Data Preprocessing – PART 1 File
- Topic 4 – Data Preprocessing – PART 2 File
- Discussion Forum
  Explain, with an example, the impact of mean normalization. Discuss the consequences for data analysis if we choose not to apply mean normalization.
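As a starting point for the discussion, here is a minimal Python sketch of mean normalization. It assumes the common definition x' = (x - mean) / (max - min). The function name `mean_normalize` and the `income` and `age` values are hypothetical and are not taken from the module material.

```python
def mean_normalize(values):
    """Rescale values using (x - mean) / (max - min)."""
    mean = sum(values) / len(values)
    spread = max(values) - min(values)
    if spread == 0:
        # All values are identical, so there is nothing to rescale.
        return [0.0 for _ in values]
    return [(v - mean) / spread for v in values]


# Hypothetical features measured on very different scales.
income = [30000.0, 45000.0, 60000.0, 90000.0]
age = [25.0, 35.0, 45.0, 55.0]

income_scaled = mean_normalize(income)
age_scaled = mean_normalize(age)

print(income_scaled)
print(age_scaled)
```

After rescaling, both features are centred on zero and span a range of one. The raw values differ by several orders of magnitude, so any method that compares magnitudes across features would probably be dominated by income if the step were skipped.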
Exploratory Data Analysis
- Overview Page
  Now that we have discussed the processing of data, we are ready for data analysis. There are two components to our approach: descriptive statistics and graphical illustrations. For descriptive statistics, we will explore data analysis on categorical data and continuous data. For graphical illustrations, we will discuss ways to represent categorical data and continuous data graphically.
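The descriptive-statistics side of this approach can be sketched in a few lines of Python using only the standard library. The `departments` and `salaries` data are hypothetical, and the choice of summary measures is an assumption rather than the module's prescribed set.

```python
from collections import Counter
import statistics

# Hypothetical sample: one categorical and one continuous variable.
departments = ["Sales", "IT", "Sales", "HR", "IT", "Sales"]
salaries = [3200.0, 4100.0, 3900.0, 5200.0, 4800.0, 3600.0]

# Categorical data: frequency counts and the most frequent category.
category_counts = Counter(departments)
most_common_category = category_counts.most_common(1)[0][0]

# Continuous data: measures of central tendency and spread.
salary_mean = statistics.mean(salaries)
salary_median = statistics.median(salaries)
salary_stdev = statistics.stdev(salaries)  # sample standard deviation

print(category_counts)
print(most_common_category, salary_mean, salary_median, salary_stdev)
```

Counts suit the categorical variable because its values have no numeric meaning, while the mean, median, and standard deviation only make sense for the continuous one.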
- Topic 5 – Exploratory Data Analysis File
- Discussion Forum
  Provide an example of categorical data where using a histogram may not be the best approach to explore the data.
Data Warehouse
- Overview Page
  I hope you are enjoying the material as much as I am. For this topic on data warehousing, we will cover the following:
  
  1. Nature of data warehouse and OLAP concepts
  2. Properties of a data warehouse architecture and schemas
  3. Concept of OLAP
- Topic 6 – Data Warehouse File
- Discussion Forum
  Describe a practical scenario where an enterprise warehouse approach is more suitable than a data mart approach for a warehouse model.
Hadoop
- Overview Page
  With data warehousing under our belt, we now explore Hadoop as a means of processing big data at reasonable cost and within a reasonable time. In this topic, we will discuss the following sub-topics:
  • Hadoop Framework
  • Hadoop’s Architecture
  • Hadoop in the Wild
  • Data warehouse to Hadoop
- Topic 7 – Hadoop File
- Discussion Forum
  Discuss the pros and cons of using Hadoop in practical scenarios.
Hive
- Overview Page
  Hi, in this topic we continue to explore the Hadoop environment. The Hadoop ecosystem contains sub-projects (tools) such as Sqoop, Pig, and Hive that support the core Hadoop modules. We will study in depth how to use Hive to write SQL-like scripts that perform data operations.
- Topic 8 – HIVE File
- Discussion Forum
  Discuss how we may mitigate the cons of using Hive to analyse big data.
Data Security and Governance
- Overview Page
  Congratulations on making it this far. In our last topic, we are interested in the following questions:
  • How do we define Data Governance and its relationship to IT Governance?
  • What are some of the key pillars of a Data Governance Program?
  • What challenges does a Data Governance Program face early on?
  • How can Data Governance and Internal Audit collaborate or leverage each other?
- Topic 9 – Data Security and Governance File
- Discussion Forum
  Explain, with some practical scenarios, the consequences of not executing data governance with due diligence.
Sample Assessments
- ASSIGNMENT-QUESTION-AND-MARKING-SCHEME Folder