Hello everyone, I hope you are feeling hyped about starting the module! To kick things off, let's ease (gently) into the subject matter by establishing a baseline of understanding. We first need to define a few key terms and technologies, so that's exactly where we will start. As with all the topics in this module, I am available via Teams Chat for any further clarification!
LEARNING OUTCOMES
Completing this topic will enable you to:
Describe the key concepts of Big Data
Explain the role of Big Data in Research
Explain the difference between BI and Data Science
Now that we have established an understanding of the key terms and concepts, it's time to go a little deeper and explore the characteristics of Big Data.
When people talk about Big Data tools and technologies, the first name that comes to most people's minds is Hadoop! Why is that, you ask? Well, you're about to find out!
LEARNING OUTCOMES
Completing this topic will enable you to:
Explain the features and assumptions of Hadoop
Explain the core components of Hadoop
Differentiate the categories of NoSQL
Explain the difference between NoSQL and relational databases
I imagine most of us are accessing this content on a Windows, Mac, iPad or Android device. Each platform has its own way of storing and organising data (e.g. FAT and NTFS on Windows, or ext4 and Btrfs on Linux), and Hadoop is no different. Let's investigate the particular characteristics of Hadoop that facilitate the storage of (very) large volumes of data.
A file system on its own still needs a mechanism by which to access, manage and manipulate the data it holds. HBase is one such mechanism: a column-oriented database management system that runs on top of HDFS, the Hadoop Distributed File System. Our next topic explores HBase in more detail, so as to build up our understanding of Hadoop.
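To make "column-oriented" a little more concrete before the topic itself, here is a minimal sketch of the idea in plain Python. It is only an illustration: the class `ToyColumnTable`, its method names, and the sample row keys and values are all hypothetical and are not the HBase API. The assumption it models is that each row is looked up by a row key and holds only the "family:qualifier" columns that were actually written, so rows can be sparse.

```python
from collections import defaultdict


class ToyColumnTable:
    """Hypothetical in-memory stand-in for a column-oriented table.

    Layout: row key -> "family:qualifier" column name -> value.
    This is NOT the HBase client API, just a sketch of the data model.
    """

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        # Store a single cell; other columns of the row are untouched.
        self._rows[row_key][column] = value

    def get(self, row_key, column=None):
        # Return one cell, or a copy of the whole row if no column is given.
        row = self._rows.get(row_key, {})
        if column is None:
            return dict(row)
        return row.get(column)

    def scan(self, start_row, stop_row):
        # Yield rows in sorted key order: start inclusive, stop exclusive.
        for key in sorted(self._rows):
            if start_row <= key < stop_row:
                yield key, dict(self._rows[key])


table = ToyColumnTable()
table.put("user#001", "info:name", "Ada")
table.put("user#001", "info:email", "ada@example.com")
table.put("user#002", "info:name", "Grace")  # no email stored: the row is sparse
```

Asking for `table.get("user#002", "info:email")` simply returns `None`, because that cell was never written. A relational table would instead carry an empty email column for every row.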
As with any technology, there are always alternatives to consider. MapReduce, for all its capabilities, is not perfect and has some key limitations of which we need to be aware. An alternative is available in the form of Spark, so let's investigate exactly what it has to offer!
LEARNING OUTCOMES
Completing this topic will enable you to:
Explain the concepts of Spark
Differentiate between Hadoop MapReduce and Spark
Explain the features of Spark
Explain the concepts of Resilient Distributed Datasets
Wow! We've almost reached the end of our journey through big data analytics and technologies. Only one topic remains: Hive, the engine that powers SQL queries in Hadoop.
LEARNING OUTCOMES
Completing this topic will enable you to:
Explain the core concepts of Hive
Explain the importance of Hive
Implement a Time Series Analysis and Forecast on a sample dataset