Book Description | Statistical methods are a key part of data science, yet very few data scientists have any formal statistics training. Courses and books on basic statistics rarely cover the topic from a data science perspective. This practical guide explains how to apply various statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what's important and what's not. Many data science resources incorporate statistical methods but lack a deeper statistical perspective. If you're familiar with the R programming language and have some exposure to statistics, this quick reference bridges the gap in an accessible, readable format. With this book, you'll learn: why exploratory data analysis is a key preliminary step in data science; how random sampling can reduce bias and yield a higher-quality dataset, even with big data; how the principles of experimental design yield definitive answers to questions; how to use regression to estimate outcomes and detect anomalies; key classification techniques for predicting which categories a record belongs to; statistical machine learning methods that learn from data; and unsupervised learning methods for extracting meaning from unlabeled data. |
About the Author | Peter Bruce is the Founder and Chief Academic Officer of the Institute for Statistics Education at Statistics.com, which offers about 80 courses in statistics and analytics, roughly half of which are aimed at data scientists. He has authored or co-authored several books in statistics and analytics. He earned his bachelor's degree at Princeton and master's degrees at Harvard and the University of Maryland. Andrew Bruce, Principal Research Scientist at Amazon, has over 30 years of experience in statistics and data science in academia, government, and business. The co-author of Applied Wavelet Analysis with S-PLUS, he earned his bachelor's degree at Princeton and his PhD in statistics at the University of Washington. Peter Gedeck, Senior Data Scientist at Collaborative Drug Discovery, specializes in the development of machine learning algorithms to predict biological and physicochemical properties of drug candidates. Co-author of Data Mining for Business Analytics, he earned PhDs in Chemistry from the University of Erlangen-Nürnberg in Germany and Mathematics from Fernuniversität Hagen, Germany. |