Chapter 1. Introduction to Data Science
No of pages: 10
This chapter introduces the reader to data science and describes the major stages of working with data (collect, explore, preprocess, visualize, predict, and infer knowledge). It sets common expectations about what constitutes the data science domain. It also introduces the Anaconda distribution, which is used throughout the book.
Chapter 2. Data Acquisition
No of pages: 40
This chapter shows the reader how to retrieve data from, and store data to, various data sources: text files (in formats such as CSV, XML, and JSON), binary files (including Apache Avro), Web-accessible data, relational databases, NoSQL databases, Apache Arrow (an efficient and novel columnar data format), multi-modal databases, and network databases. It also introduces BeautifulSoup for working with XML and HTML.
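As a small illustration of the text-format part of this chapter, the sketch below parses the same records from CSV, JSON, and XML using only the Python standard library. The sample data, field names (`name`, `score`), and helper function names are hypothetical and chosen purely for illustration; the chapter itself may rely on other libraries.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample data: the same two records in three text formats.
CSV_TEXT = "name,score\nAda,90\nLinus,85\n"
JSON_TEXT = '[{"name": "Ada", "score": 90}, {"name": "Linus", "score": 85}]'
XML_TEXT = (
    "<records>"
    "<record><name>Ada</name><score>90</score></record>"
    "<record><name>Linus</name><score>85</score></record>"
    "</records>"
)


def read_csv(text):
    """Parse CSV text into a list of dicts, converting score to int."""
    rows = csv.DictReader(io.StringIO(text))
    return [{"name": r["name"], "score": int(r["score"])} for r in rows]


def read_json(text):
    """Parse JSON text; the types are already preserved by the format."""
    return json.loads(text)


def read_xml(text):
    """Parse XML text, reading one dict per <record> element."""
    root = ET.fromstring(text)
    return [
        {"name": rec.findtext("name"), "score": int(rec.findtext("score"))}
        for rec in root.iter("record")
    ]


records = read_csv(CSV_TEXT)
# All three formats should yield the same in-memory representation.
assert records == read_json(JSON_TEXT) == read_xml(XML_TEXT)
print(records)
```

Note that CSV and XML carry every value as text, so the numeric conversion has to be done explicitly, whereas JSON preserves the number type.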
Chapter 3. Basic Data Processing
No of pages: 40
This chapter covers the standard Python libraries for scientific computing and data processing. Among them, NumPy provides the core data structures required during data analysis. The examples illustrate the importance of sophisticated frameworks and reuse-based software engineering in the realm of data science.
Chapter 4. Documenting Work
No of pages: 20
This chapter introduces the most popular computing environment for data analysis, which lets data scientists share results in an easily reproducible manner.
Chapter 5. Transformation and Packaging of Data
No of pages: 30
This chapter presents a critical data science framework that is built upon NumPy and provides excellent data structures, namely data frames and series.
Chapter 6. Visualization
No of pages: 40
This chapter introduces various ways to visualize data, because summary statistics and tabular representations are of limited value in exploring data. It covers the following frameworks: matplotlib, glueviz, Bokeh, and orange3. Visualization is important both during exploratory analysis and when generating effective reports.
Chapter 7. Prediction and Inference
No of pages: 50
This chapter covers the techniques and technologies needed to properly scale data science efforts. It teaches readers how to create systems that can formulate answers on unseen data or find hidden patterns in data. It covers supervised, unsupervised, deep, and reinforcement learning methods. Moreover, it introduces Apache Spark with MLlib (in both batch and stream modes) as well as TensorFlow. The chapter also covers the following frameworks: XGBoost, scikit-learn, and Keras with PyTorch.
Chapter 8. Network Analysis
No of pages: 40
This chapter explores ways to analyze complex networks and graphs. It introduces Apache Spark GraphX, Apache Giraph, and NetworkX, as well as spectral graph analysis, which is an interesting approximate, non-linear, and non-parametric machine learning method.
Chapter 9. Data Science Process Engineering
No of pages: 20
This chapter explains how teams can share and customize their data science practices and methods via OMG Essence.
Chapter 10. Multi-agent Systems, Game Theory and Machine Learning
No of pages: 30
This chapter explores advanced data-oriented applications, where data are produced and consumed by self-governed intelligent agents. The chapter introduces the reader to the concept of multi-agent systems, game-theoretic methods and models, and the associated learning algorithms.
Chapter 11. Probabilistic Graphical Models
No of pages: 30
This chapter explains a sophisticated graph structure for modeling many advanced data science problems. Nodes in the graph denote random variables, while the links represent relations between those variables. This chapter equips the reader with a method that may be used when simpler solutions aren’t satisfactory.
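To make the node-and-link description concrete, here is a minimal sketch of a two-node model in plain Python, assuming a directed link from a hypothetical `Rain` variable to a hypothetical `WetGrass` variable. All names and probabilities are invented for illustration and do not come from the book; the inference is done by brute-force enumeration, which is only practical for very small models.

```python
# Hypothetical two-node model: Rain -> WetGrass.
# All probabilities below are made up for illustration.

# Prior distribution of the parent node, P(Rain).
P_RAIN = {True: 0.2, False: 0.8}

# Conditional distribution attached to the link, P(WetGrass | Rain).
# Outer key: value of Rain. Inner key: value of WetGrass.
P_WET_GIVEN_RAIN = {
    True: {True: 0.9, False: 0.1},
    False: {True: 0.1, False: 0.9},
}


def joint(rain, wet):
    """Joint probability P(Rain=rain, WetGrass=wet) via the chain rule."""
    return P_RAIN[rain] * P_WET_GIVEN_RAIN[rain][wet]


def posterior_rain(wet):
    """P(Rain=True | WetGrass=wet), computed by enumerating Rain."""
    evidence = sum(joint(rain, wet) for rain in (True, False))
    return joint(True, wet) / evidence


# Observing wet grass raises the belief in rain above its prior of 0.2.
print(round(posterior_rain(True), 3))
```

The link between the two nodes is what allows an observation of one variable to update the belief about the other.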
Chapter 12. Security in Data Science
No of pages: 20
This chapter presents techniques to anonymize data and to deal with situations in which learning methods must cope with adversarial modifications (a.k.a. adversarial machine learning). It also discusses ways to protect data both in transit and at rest.
Appendix A - Crash Course in Python 3
No of pages: 20
This appendix briefly introduces readers to Python 3 and explains why Python 3 is a perfect choice for doing data science.