book info: http://shop.oreilly.com/product/0636920033400.do
owned book

Data Science from Scratch

Preface

Data Science

From Scratch

Conventions Used in This Book

Using Code Examples

Safari® Books Online

How to Contact Us

Acknowledgments

1. Introduction

The Ascendance of Data

What Is Data Science?

Motivating Hypothetical: DataSciencester

Finding Key Connectors

Data Scientists You May Know

Salaries and Experience

Topics of Interest

Onward

2. A Crash Course in Python

The Basics

Getting Python

The Zen of Python

Whitespace Formatting

Modules

Arithmetic

Functions

Strings

Exceptions

Lists

Tuples

Dictionaries

defaultdict

Counter

Sets

Control Flow

Truthiness

The Not-So-Basics

Sorting

List Comprehensions

Generators and Iterators

Randomness

Regular Expressions

Object-Oriented Programming

Functional Tools

enumerate

zip and Argument Unpacking

args and kwargs

Welcome to DataSciencester!

For Further Exploration

3. Visualizing Data

matplotlib

Bar Charts

Line Charts

Scatterplots

For Further Exploration

4. Linear Algebra

Vectors

Matrices

For Further Exploration

5. Statistics

Describing a Single Set of Data

Central Tendencies

Dispersion

Correlation

Simpson's Paradox

Some Other Correlational Caveats

Correlation and Causation

For Further Exploration

6. Probability

Dependence and Independence

Conditional Probability

Bayes’s Theorem

Random Variables

Continuous Distributions

The Normal Distribution

The Central Limit Theorem

For Further Exploration

7. Hypothesis and Inference

Statistical Hypothesis Testing

Example: Flipping a Coin

Confidence Intervals

P-hacking

Example: Running an A/B Test

Bayesian Inference

For Further Exploration

8. Gradient Descent

The Idea Behind Gradient Descent

Estimating the Gradient

Using the Gradient

Choosing the Right Step Size

Putting It All Together

Stochastic Gradient Descent

For Further Exploration

9. Getting Data

stdin and stdout

Reading Files

The Basics of Text Files

Delimited Files

Scraping the Web

HTML and the Parsing Thereof

Example: O’Reilly Books About Data

Using APIs

JSON (and XML)

Using an Unauthenticated API

Finding APIs

Example: Using the Twitter APIs

Getting Credentials

Using Twython

For Further Exploration

10. Working with Data

Exploring Your Data

Exploring One-Dimensional Data

Two Dimensions

Many Dimensions

Cleaning and Munging

Manipulating Data

Rescaling

Dimensionality Reduction

For Further Exploration

11. Machine Learning

Modeling

What Is Machine Learning?

Overfitting and Underfitting

Correctness

The Bias-Variance Trade-off

Feature Extraction and Selection

For Further Exploration

12. k-Nearest Neighbors

The Model

Example: Favorite Languages

The Curse of Dimensionality

For Further Exploration

13. Naive Bayes

A Really Dumb Spam Filter

A More Sophisticated Spam Filter

Implementation

Testing Our Model

For Further Exploration

14. Simple Linear Regression

The Model

Using Gradient Descent

Maximum Likelihood Estimation

For Further Exploration

15. Multiple Regression

The Model

Further Assumptions of the Least Squares Model

Fitting the Model

Interpreting the Model

Goodness of Fit

Digression: The Bootstrap

Standard Errors of Regression Coefficients

Regularization

For Further Exploration

16. Logistic Regression

The Problem

The Logistic Function

Applying the Model

Goodness of Fit

Support Vector Machines

For Further Investigation

17. Decision Trees

What Is a Decision Tree?

Entropy

The Entropy of a Partition

Creating a Decision Tree

Putting It All Together

Random Forests

For Further Exploration

18. Neural Networks

Perceptrons

Feed-Forward Neural Networks

Backpropagation

Example: Defeating a CAPTCHA

For Further Exploration

19. Clustering

The Idea

The Model

Example: Meetups

Choosing k

Example: Clustering Colors

Bottom-up Hierarchical Clustering

For Further Exploration

20. Natural Language Processing

Word Clouds

n-gram Models

Grammars

An Aside: Gibbs Sampling

Topic Modeling

For Further Exploration

21. Network Analysis

Betweenness Centrality

Eigenvector Centrality

Matrix Multiplication

Centrality

Directed Graphs and PageRank

For Further Exploration

22. Recommender Systems

Manual Curation

User-Based Collaborative Filtering

Item-Based Collaborative Filtering

For Further Exploration

23. Databases and SQL

CREATE TABLE and INSERT

UPDATE

DELETE

SELECT

GROUP BY

ORDER BY

JOIN

Subqueries

Indexes

Query Optimization

NoSQL

For Further Exploration

24. MapReduce

Example: Word Count

Why MapReduce?

MapReduce More Generally

Example: Analyzing Status Updates

Example: Matrix Multiplication

An Aside: Combiners

For Further Exploration

25. Go Forth and Do Data Science

IPython

Mathematics

Not from Scratch

NumPy

pandas

scikit-learn

Visualization

R

Find Data

Do Data Science

Hacker News

Fire Trucks

T-shirts

And You?

Index

b/data_science_from_scratch.txt · Last modified: 2018/02/02 02:29 by hkimscil