code: CSPC40
title: Big Data Analytics
similar: ITPC40
needs_screening: true

specifics:
  - branch: CS
  - semester: 8
  - credits: 3 0 2 4

prereq: CSPC21
kind: PC1

Objectives

To understand the concepts and challenges of big data and why traditional technologies are inadequate for analyzing it.
To collect, manage, store, query, and analyze various types of big data.
To gain hands-on experience with large-scale analytics tools for solving big data problems.
To study the impact of big data analysis on societal and business decisions.

Content

Unit 1

Introduction:
- Overview of Big Data
- State of the Practice in Analytics
- The Data Scientist
- Big Data Analytics in Industry Verticals
- Data Analytics Lifecycle
- Challenges of Conventional Systems
- Statistical Concepts:
  - Sampling Distributions
  - Re-Sampling
  - Statistical Inference
  - Prediction Error
  - Regression Modelling
  - Multivariate Analysis
  - Bayesian Modelling
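
As an illustration of the Re-Sampling and Sampling Distributions topics, the following is a minimal sketch of bootstrap re-sampling used to estimate the standard error of a sample mean. The function name `bootstrap_standard_error`, its parameters, and the sample values are illustrative assumptions and are not prescribed by the syllabus.

```python
import random
import statistics


def bootstrap_standard_error(sample, num_resamples=1000, seed=0):
    """Estimate the standard error of the mean by bootstrap re-sampling.

    All names and defaults here are hypothetical, chosen for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    resample_means = []
    for _ in range(num_resamples):
        # Draw n observations from the sample, with replacement.
        resample = [rng.choice(sample) for _ in range(n)]
        resample_means.append(statistics.mean(resample))
    # The spread of the re-sampled means approximates the standard error.
    return statistics.stdev(resample_means)


if __name__ == "__main__":
    observations = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # made-up data
    print(bootstrap_standard_error(observations))
```

The sketch draws resamples of the same size as the original sample, which is the usual choice when approximating the sampling distribution of a statistic.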

Unit 2

Mining Data Streams:
- Stream Data Model and Architecture
- Stream Computing
- Sampling Data in a Stream
- Filtering Streams
- Counting Distinct Elements in a Stream
- Estimating Moments
- Counting Ones in a Window
- Decaying Window
- Real-Time Analytics Platform (RTAP) Applications
- Case Studies:
  - Real-Time Sentiment Analysis
  - Stock Market Prediction.
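
As an illustration of the Filtering Streams topic, the following is a minimal sketch of a Bloom filter, one common technique for testing stream elements against a large set of allowed keys in limited memory. The names `BloomFilter`, `might_contain`, and `filter_stream`, the default sizes, and the use of salted SHA-256 as the hash family are all illustrative assumptions, not part of the syllabus.

```python
import hashlib


class BloomFilter:
    """Bloom filter sketch: never a false negative, occasionally a false positive."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch readable; a real filter packs bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Assumption: simulate k hash functions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True only if every hashed position is set.
        return all(self.bits[pos] for pos in self._positions(item))


def filter_stream(stream, allowed):
    """Yield the stream elements that pass the Bloom filter."""
    for element in stream:
        if allowed.might_contain(element):
            yield element


if __name__ == "__main__":
    allowed = BloomFilter()
    for address in ["alice@example.com", "bob@example.com"]:  # made-up keys
        allowed.add(address)
    incoming = ["alice@example.com", "spam@example.org", "bob@example.com"]
    print(list(filter_stream(incoming, allowed)))
```

Every key that was added is guaranteed to pass; an element that was never added passes only in the rare case of a false positive, whose rate depends on `num_bits`, `num_hashes`, and the number of keys stored.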

Unit 3

Frequent Itemset and Clustering:
- Mining Frequent Itemsets
- Market Basket Model:
  - Apriori Algorithm
  - Handling Large Data Sets in Main Memory
  - Limited-Pass Algorithms
  - Counting Frequent Itemsets in a Stream
- Clustering based Techniques:
  - Hierarchical
  - K-Means etc.
- Clustering High Dimensional Data
- CLIQUE and PROCLUS
- Frequent Pattern based Clustering Methods
- Clustering in Non-Euclidean Space
- Clustering for Streams and Parallelism.
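
As an illustration of the Apriori Algorithm topic, the following is a minimal in-memory sketch of level-wise frequent itemset mining. The function name `apriori`, the `min_support` parameter (an absolute basket count), and the example baskets are illustrative assumptions; the sketch makes one pass over the baskets per itemset size and is not optimized for large data sets.

```python
from itertools import combinations


def apriori(baskets, min_support):
    """Return {itemset: support count} for every frequent itemset.

    Hypothetical helper for illustration; min_support is an absolute count.
    """
    baskets = [frozenset(b) for b in baskets]

    # Pass 1: count individual items.
    counts = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Build size-k candidates from frequent (k-1)-itemsets, keeping only
        # those whose every (k-1)-subset is frequent (the Apriori property).
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Pass k: count how many baskets contain each candidate.
        counts = {c: 0 for c in candidates}
        for basket in baskets:
            for candidate in candidates:
                if candidate <= basket:
                    counts[candidate] += 1

        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


if __name__ == "__main__":
    example_baskets = [  # made-up market baskets
        {"bread", "milk"},
        {"bread", "diapers", "beer"},
        {"milk", "diapers", "beer"},
        {"bread", "milk", "diapers"},
        {"bread", "milk", "beer"},
    ]
    for itemset, support in apriori(example_baskets, min_support=3).items():
        print(sorted(itemset), support)
```

With these example baskets and a support threshold of 3, the only frequent pair is bread and milk; every other pair appears in just two baskets and is pruned before any triple is considered.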

Unit 4

Frameworks and Visualization:
- Overview of MapReduce
- Hadoop
- Hive
- MapR
- Sharding
- NoSQL Databases
- S3
- Hadoop Distributed File System (HDFS)
- Visualizations:
  - Visual Data Analysis Techniques
  - Interaction Techniques and Applications.
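
As an illustration of the Overview of MapReduce topic, the following is a minimal single-process sketch of the map, shuffle, and reduce phases applied to word counting. It uses plain Python only and does not call Hadoop or any other framework; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are illustrative assumptions.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence inside one process."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big analytics"]))
    # prints {'big': 2, 'data': 1, 'analytics': 1}
```

In a distributed framework the map and reduce calls would run on many machines and the shuffle would move data across the network; the sketch keeps only the data flow between the phases.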

Reference Books

Michael Berthold, David J. Hand, Intelligent Data Analysis, Springer, 2007.
A. Rajaraman, J.D. Ullman, Mining of Massive Datasets, Cambridge University Press, 2012.
Bill Franks, Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced Analytics, John Wiley & Sons, 2012.
Glenn J. Myatt, Making Sense of Data, John Wiley & Sons, 2007.
Pete Warden, Big Data Glossary, O’Reilly, 2011.
J. Han, M. Kamber, Data Mining: Concepts and Techniques, 2nd Edition, Elsevier, Reprinted 2008.

Outcomes

To become proficient in recognizing challenges faced by applications dealing with very large data as well as in proposing scalable solutions.
To design efficient algorithms for mining data from databases.
To model a framework for visualization of big data analytics for business users.
To understand the significance of Big Data Analysis in business intelligence, scientific discovery, and day-to-day life.