http://shop.oreilly.com/product/0636920028529.do

ISBN 1449358659

Doing Data Science

I was working on the Google+ data science team with an interdisciplinary team of PhDs. There was me (a statistician), a social scientist, an engineer, a physicist, and a computer scientist. We were part of a larger team that included talented data engineers who built the data pipelines, infrastructure, and dashboards, as well as built the experimental infrastructure (A/B testing). Our team had a flat structure. Together our skills were powerful, and we were able to do amazing things with massive datasets, including predictive modeling, prototyping algorithms, and unearthing patterns in the data that had huge impact on the product. – Doing Data Science

Supplemental Readings

*Math*

- Linear Algebra and Its Applications by Gilbert Strang (Cengage Learning)
- Convex Optimization by Stephen Boyd and Lieven Vandenberghe (Cambridge University Press)
- A First Course in Probability (Pearson) and Introduction to Probability Models (Academic Press) by Sheldon Ross

*Coding*

- R in a Nutshell by Joseph Adler (O’Reilly)
- Learning Python by Mark Lutz and David Ascher (O’Reilly)
- R for Everyone: Advanced Analytics and Graphics by Jared Lander (Addison-Wesley)
- The Art of R Programming: A Tour of Statistical Software Design by Norman Matloff (No Starch Press)
- Python for Data Analysis by Wes McKinney (O’Reilly)

*Data Analysis and Statistical Inference*

- Statistical Inference by George Casella and Roger L. Berger (Cengage Learning)
- Bayesian Data Analysis by Andrew Gelman, et al. (Chapman & Hall)
- Data Analysis Using Regression and Multilevel/Hierarchical Models by Andrew Gelman and Jennifer Hill (Cambridge University Press)
- Advanced Data Analysis from an Elementary Point of View by Cosma Shalizi (under contract with Cambridge University Press)
- The Elements of Statistical Learning: Data Mining, Inference and Prediction by Trevor Hastie, Robert Tibshirani, and Jerome Friedman (Springer)

*Artificial Intelligence and Machine Learning*

- Pattern Recognition and Machine Learning by Christopher Bishop (Springer)
- Bayesian Reasoning and Machine Learning by David Barber (Cambridge University Press)
- Programming Collective Intelligence by Toby Segaran (O’Reilly)
- Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig (Prentice Hall)
- Foundations of Machine Learning by Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar (MIT Press)
- Introduction to Machine Learning (Adaptive Computation and Machine Learning) by Ethem Alpaydin (MIT Press)

*Experimental Design*

- Field Experiments by Alan S. Gerber and Donald P. Green (Norton)
- Statistics for Experimenters: Design, Innovation, and Discovery by George E. P. Box, et al. (Wiley-Interscience)

*Visualization*

- The Elements of Graphing Data by William Cleveland (Hobart Press)
- Visualize This: The FlowingData Guide to Design, Visualization, and Statistics by Nathan Yau (Wiley)
- The Visual Display of Quantitative Information by Edward Tufte (Graphics Press)

b/doing_a_data_science.txt · Last modified: 2018/02/02 02:30 by hkimscil