Practical Data Science with Python 3, 1st ed. Synthesizing Actionable Insights from Data

Language: English

Author: Ervin Varga


Gain insight into essential data science skills in a holistic manner using data engineering and associated scalable computational methods. This book covers the most popular Python 3 frameworks for both local and distributed (on-premises and cloud-based) processing. Along the way, you will be introduced to many popular open-source frameworks, such as SciPy, scikit-learn, Numba, and Apache Spark. The book is structured around examples, so you will grasp core concepts via case studies and Python 3 code.

As data science projects get continuously larger and more complex, software engineering knowledge and experience are crucial for producing evolvable solutions. You'll see how to create maintainable software for data science and how to document data engineering practices.

This book is a good starting point for people who want to gain practical skills to perform data science. All the code will be available in the form of IPython notebooks and Python 3 programs, which allow you to reproduce all analyses from the book and customize them for your own purpose. You'll also benefit from advanced topics like Machine Learning, Recommender Systems, and Security in Data Science.

Practical Data Science with Python will empower you to analyze data, formulate proper questions, and produce actionable insights: three core stages in most data science endeavors.

What You'll Learn

Play the role of a data scientist when completing increasingly challenging exercises using Python 3
Work with proven data science techniques/technologies
Review scalable software engineering practices to ramp up data analysis abilities in the realm of Big Data
Apply the theory of probability, statistical inference, and algebra to understand data science practices

Who This Book Is For

Anyone who would like to enter the realm of data science using Python 3.

Chapter 1. Introduction to Data Science

Number of pages: 10

This chapter introduces the reader to data science and describes the major stages of working with data (collect, explore, preprocess, visualize, predict, and infer knowledge). It sets common expectations about what constitutes the data science domain. The chapter also introduces Anaconda, which is used throughout the book.

Chapter 2. Data Acquisition

Number of pages: 40

This chapter shows the reader how to retrieve data from, and store data to, various data sources: text files (including formats such as CSV, XML, and JSON), binary files (including Apache Avro), web-accessible data, relational databases, NoSQL databases, Apache Arrow (as an efficient and novel columnar data storage system), multi-modal databases, and network databases. The chapter also introduces BeautifulSoup for working with XML and HTML.

Chapter 3. Basic Data Processing

Number of pages: 40

This chapter covers the standard Python libraries for scientific computing and data processing, such as NumPy, which provides the data structures required during data analysis. The examples illustrate the importance of sophisticated frameworks and reuse-based software engineering in the realm of data science.

Chapter 4. Documenting Work

Number of pages: 20

This chapter introduces the most popular computing environment for data analysis, which makes it possible for data scientists to share results in an easily reproducible manner.

Chapter 5. Transformation and Packaging of Data

Number of pages: 30

This chapter illuminates a critical data science framework that is built upon NumPy. It provides excellent data structures for handling data frames and series.

Chapter 6. Visualization

Number of pages: 40

This chapter introduces various ways to visualize data, since summary statistics and tabular representations are of limited value when exploring data. The chapter covers the following frameworks: matplotlib, glueviz, Bokeh, and orange3. Visualization is important both during exploratory analysis and when generating effective reports.

Chapter 7. Prediction and Inference

Number of pages: 50

This chapter covers techniques and technologies for properly scaling data science efforts. It teaches readers how to create systems that can formulate answers on unseen data or find hidden patterns in data. It discusses supervised, unsupervised, deep, and reinforcement learning methods. Moreover, it introduces Apache Spark with MLlib (in both batch and stream modes) as well as TensorFlow. The chapter also covers XGBoost, scikit-learn, and Keras with PyTorch.

Chapter 8. Network Analysis

Number of pages: 40

This chapter explores ways to analyze complex networks and graphs. It introduces Apache Spark GraphX, Apache Giraph, and NetworkX, as well as spectral graph analysis, an approximate, non-linear, and non-parametric machine learning method.

Chapter 9. Data Science Process Engineering

Number of pages: 20

This chapter explains how teams can share and customize the data science practices and methods they use via OMG Essence.

Chapter 10. Multi-agent Systems, Game Theory and Machine Learning

Number of pages: 30

This chapter explores advanced data-oriented applications, where data are produced and consumed by self-governed intelligent agents. The chapter introduces the reader to the concept of multi-agent systems, game-theoretic methods and models, and the associated learning algorithms.

Chapter 11. Probabilistic Graphical Models

Number of pages: 30

This chapter explains probabilistic graphical models, a sophisticated graph structure for modeling many advanced data science problems. Nodes in the graph denote random variables, while the links represent relations between those variables. The chapter equips the reader with a method that may be used when simpler solutions aren't satisfactory.
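
As a rough illustration of the node-and-link idea (not taken from the book), the Python sketch below encodes a hypothetical two-node model, Rain -> WetGrass. The variable names and all probabilities are made-up assumptions; the point is only to show how a link between two random variables translates into a conditional probability table.

```python
# Hypothetical two-node graphical model: Rain -> WetGrass.
# All names and numbers are illustrative assumptions, not data from the book.

# Node "Rain": prior distribution P(rain).
p_rain = {True: 0.2, False: 0.8}

# Link Rain -> WetGrass: conditional distribution P(wet | rain).
p_wet_given_rain = {
    True: {True: 0.9, False: 0.1},   # it rained
    False: {True: 0.1, False: 0.9},  # it did not rain
}


def joint(rain: bool, wet: bool) -> float:
    """P(rain, wet), factorized along the single link of the graph."""
    return p_rain[rain] * p_wet_given_rain[rain][wet]


def marginal_wet(wet: bool) -> float:
    """P(wet), obtained by summing the joint over both values of rain."""
    return sum(joint(rain, wet) for rain in (True, False))


def posterior_rain(rain: bool, wet: bool) -> float:
    """P(rain | wet), computed with Bayes' rule."""
    return joint(rain, wet) / marginal_wet(wet)


if __name__ == "__main__":
    # Probability that it rained, given that the grass is wet.
    print(round(posterior_rain(True, True), 3))  # 0.692
```

Larger models follow the same pattern: each node stores a distribution conditioned on its parents, and the joint distribution is the product of those tables.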

Chapter 12. Security in Data Science

Number of pages: 20

This chapter presents techniques for anonymizing data and for dealing with situations in which learning methods must cope with adversarial modifications (a.k.a. adversarial machine learning). It also covers ways to protect data both in transit and at rest.

Appendix A - Crash Course in Python 3

Number of pages: 20

This appendix briefly introduces readers to Python 3 and explains why Python 3 is a perfect choice for doing data science.

Ervin Varga is a Senior Member of IEEE and Professional Member of ACM. He is an IEEE Software Engineering Certified Instructor. Ervin is an owner of the software consulting company Expro I.T. Consulting, Serbia. He has an MSc in computer science, and a PhD in electrical engineering (his thesis was an application of software engineering and computer science in the domain of electrical power systems). Ervin is also a technical advisor of the open-source project Mainflux.

Provides a mechanism to solidify data-science-related topics in a unified fashion, while treating theory and practice as equally important

Uses publicly available real-life datasets that cannot be tackled without relying on advanced data science methods and tools

Focuses on knowledge synthesis: how things come together in data science and, more importantly, why

Paperback

Publication date: 09-2019

462 pages

15.5x23.5 cm

Available from the publisher (delivery lead time: 15 days).

52,74 €


Keywords:

Data Science; Python 3; Machine Learning; Neural Networks; OMG Essence; Apache Spark; TensorFlow; NumPy; Pandas; Matplotlib; IPython notebooks
