code: CSPC40
title: Big Data Analytics
similar: ITPC40
needs_screening: true
specifics: branch CS, semester 8, credits 3 0 2 4
prereq: CSPC21
kind: PC1

Objectives

  • To understand the concepts and challenges of big data and why traditional technologies are inadequate for analyzing it.
  • To collect, manage, store, query, and analyze various types of big data.
  • To gain hands-on experience with large-scale analytics tools for solving big data problems.
  • To study the impact of big data analysis on societal and business decisions.

Content

Unit 1

  1. Introduction:
    • Overview of Big Data
    • State of the Practice in Analytics
    • The Data Scientist
    • Big Data Analytics in Industry Verticals
    • Data Analytics Lifecycle
    • Challenges of Conventional Systems
    • Statistical Concepts:
      • Sampling Distributions
      • Re-Sampling
      • Statistical Inference
      • Prediction Error
      • Regression Modelling
      • Multivariate Analysis
      • Bayesian Modelling

Unit 2

  1. Mining Data Streams:
    • Stream Data Model and Architecture
    • Stream Computing
    • Sampling Data in a Stream
    • Filtering Streams
    • Counting Distinct Elements in a Stream
    • Estimating Moments
    • Counting Ones in a Window
    • Decaying Window
    • Real-Time Analytics Platform (RTAP) Applications
    • Case Studies:
      • Real-Time Sentiment Analysis
      • Stock Market Prediction
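
As an illustration of the stream-filtering topic in this unit, the following is a minimal sketch of a Bloom filter, one common technique for filtering streams. The class name `BloomFilter`, the helper `filter_stream`, the default sizes, and the use of salted SHA-256 as the hash family are all assumptions chosen for the example, not something the syllabus prescribes.

```python
import hashlib


class BloomFilter:
    """Approximate set membership: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Sizes are illustrative assumptions, not tuned values.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive num_hashes positions by salting a SHA-256 digest with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))


def filter_stream(stream, allowed):
    """Yield only the stream items that the filter reports as possibly allowed."""
    for item in stream:
        if allowed.might_contain(item):
            yield item
```

Every key that was added always passes the filter; an item that was never added is usually rejected but can occasionally slip through as a false positive, which is the trade-off made for the small, fixed memory footprint.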

Unit 3

  1. Frequent Itemset and Clustering:
    • Mining Frequent Itemsets
    • Market Basket Model:
      • Apriori Algorithm
      • Handling Large Data Sets in Main Memory
      • Limited-Pass Algorithms
      • Counting Frequent Itemsets in a Stream
    • Clustering based Techniques:
      • Hierarchical
      • K-Means etc.
    • Clustering High Dimensional Data
    • CLIQUE and PROCLUS
    • Frequent Pattern based Clustering Methods
    • Clustering in Non-Euclidean Space
    • Clustering for Streams and Parallelism
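
To make the Apriori topic in this unit concrete, here is a small sketch of level-wise frequent-itemset mining under the market basket model. The function name `apriori`, the absolute `min_support` count, and the toy baskets in the usage note are assumptions for illustration. The sketch keeps everything in memory and is not meant to reflect how large data sets are handled.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets,
        # then prune any candidate that has an infrequent (k-1)-subset.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Pass k: count how many transactions contain each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent
```

For example, with the baskets `{bread, milk}`, `{bread, butter}`, `{bread, milk, butter}`, `{milk}` and `min_support=2`, the pair `{milk, butter}` appears only once and is dropped, so the triple `{bread, milk, butter}` is pruned without being counted.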

Unit 4

  1. Frameworks and Visualization:
    • Overview of MapReduce
    • Hadoop
    • Hive
    • MapR
    • Sharding
    • NoSQL Databases
    • S3
    • Hadoop Distributed File System (HDFS)
    • Visualizations:
      • Visual Data Analysis Techniques
      • Interaction Techniques and Applications
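
The MapReduce model listed in this unit can be sketched in plain Python as a word count with separate map, shuffle, and reduce steps. This is a single-process illustration only; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical and do not correspond to the Hadoop API.

```python
from collections import defaultdict


def map_phase(document):
    """Mapper: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce over an iterable of document strings."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

In a real cluster the mappers and reducers run on different machines and the shuffle moves data across the network; here all three steps run in one process so the data flow is easy to follow.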

Reference Books

  • Michael Berthold, David J. Hand, Intelligent Data Analysis, Springer, 2007.
  • A. Rajaraman, J.D. Ullman, Mining of Massive Datasets, Cambridge University Press, 2012.
  • Bill Franks, Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced Analytics, John Wiley & Sons, 2012.
  • Glenn J. Myatt, Making Sense of Data, John Wiley & Sons, 2007.
  • Pete Warden, Big Data Glossary, O’Reilly, 2011.
  • J. Han, M. Kamber, Data Mining: Concepts and Techniques, 2nd Edition, Elsevier, Reprinted 2008.

Outcomes

  • To become proficient in recognizing challenges faced by applications dealing with very large data as well as in proposing scalable solutions.
  • To design efficient algorithms for mining data from databases.
  • To model a framework for visualizing big data analytics for business users.
  • To understand the significance of Big Data Analysis in business intelligence, scientific discovery, and day-to-day life.