code | title | similar | needs_screening | specifics | prereq | kind
---|---|---|---|---|---|---
CSPC40 | Big Data Analytics | | true | | | PC1
- To understand the concepts and challenges of big data and why traditional technology is inadequate for analyzing it.
- To collect, manage, store, query, and analyze various types of big data.
- To gain hands-on experience with large-scale analytics tools for solving big data problems.
- To study the impact of big data analysis on societal and business decisions.
- Introduction:
- Overview of Big Data
- State of the Practice in Analytics
- The Data Scientist
- Big Data Analytics in Industry Verticals
- Data Analytics Lifecycle
- Challenges of Conventional Systems
- Statistical Concepts:
- Sampling Distributions
- Re-Sampling, Statistical Inference
- Prediction Error
- Regression Modelling
- Multivariate Analysis
- Bayesian Modelling
- Mining Data Streams:
- Stream Data Model and Architecture
- Stream Computing
- Sampling Data in a Stream
- Filtering Streams
- Counting Distinct Elements in a Stream
- Estimating Moments
- Counting Ones in a Window
- Decaying Window
- Real Time Analytics Platform (RTAP) Applications
- Case Studies
- Real Time Sentiment Analysis
- Stock Market Prediction
- Frequent Itemset and Clustering:
- Mining Frequent Itemsets
- Market Basket Model:
- Apriori Algorithm
- Handling Large Data Sets in Main Memory
- Limited-Pass Algorithms
- Counting Frequent Itemsets in a Stream
- Clustering based Techniques:
- Hierarchical
- K-Means etc.
- Clustering High Dimensional Data
- CLIQUE and PROCLUS
- Frequent Pattern based Clustering Methods
- Clustering in Non-Euclidean Space
- Clustering for Streams and Parallelism
- Frameworks and Visualization:
- Overview of MapReduce
- Hadoop
- Hive
- MapR
- Sharding
- NoSQL Databases
- S3
- Hadoop Distributed File System (HDFS)
- Visualizations:
- Visual Data Analysis Techniques
- Interaction Techniques and Applications
- Michael Berthold, David J. Hand, Intelligent Data Analysis, Springer, 2007.
- A. Rajaraman, J.D. Ullman, Mining of Massive Datasets, Cambridge University Press, 2012.
- Bill Franks, Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced Analytics, John Wiley & Sons, 2012.
- Glenn J. Myatt, Making Sense of Data, John Wiley & Sons, 2007.
- Pete Warden, Big Data Glossary, O’Reilly, 2011.
- J. Han, M. Kamber, Data Mining: Concepts and Techniques, 2nd Edition, Elsevier, Reprinted 2008.
- To become proficient in recognizing challenges faced by applications dealing with very large data as well as in proposing scalable solutions.
- To design efficient algorithms for mining data from databases.
- To model a framework for visualization of big data analytics for business users.
- To understand the significance of Big Data Analysis in business intelligence, scientific discovery, and day-to-day life.