Terminologies Used in Big Data Environments – Types of Digital Data – Classification of Digital Data – Introduction to Big Data – Characteristics of Data – Evolution of Big Data – Big Data Analytics – Classification of Analytics – Top Challenges Facing Big Data – Importance of Big Data Analytics – Data Analytics Tools
NumPy: NumPy Data Types, SciPy, Jupyter, Statsmodels and Pandas Packages – Scikit-learn, R Programming
Introducing Hadoop – Hadoop Overview – RDBMS versus Hadoop – HDFS (Hadoop Distributed File System): Components and Block Replication – Processing Data with Hadoop – Introduction to MapReduce – Features of MapReduce, YARN, HBase
Data Munging: Introduction to Data Munging, Data Pipeline and Machine Learning in Python – Data Visualization Using Matplotlib – Interactive Visualization with Advanced Data Learning Representation in Python
Introduction to NoSQL: Types of NoSQL Databases: Key-Value Store, Document Store, Column Family, Graph Store – CAP Theorem – CAP Theorem and NoSQL Databases – MongoDB: RDBMS vs. MongoDB – MongoDB Database Model – Data Types – Sharding – Types of Sharding – Introduction to Hive – Hive Architecture – Hive Query Language (HQL)
Reference Books:
Alberto Boschetti, Luca Massaron, “Python Data Science Essentials”, Packt Publishing, 2nd Edition, 2016.
VDT Editorial Services, “Big Data, Black Book”, Dreamtech Press, 2015.
Yuxi (Hayden) Liu, “Python Machine Learning”, Packt Publishing, 2017.
Text Books:
Frank Kane, “Hands-On Data Science and Python Machine Learning”, Packt Publishing, 2017.
Seema Acharya, Subhashini Chellappan, “Big Data and Analytics”, Wiley, 2015.