


Week01 (Mar 16, 19)

ideas and concepts

https://youtu.be/6ExajWI_r2w
https://youtu.be/J8e5dEH8K_Q
https://youtu.be/W3DhUXI5cyQ
https://youtu.be/qCeTcvWBDNY
https://youtu.be/1hJm0O-RY4Q

Course Introduction –> syllabus

Introduction to R and others

  1. Downloading and Installing R
  2. Starting R
  3. Entering Commands
  4. Exiting from R
  5. Interrupting R
  6. Viewing the Supplied Documentation
  7. Getting Help on a Function
  8. Searching the Supplied Documentation
  9. Getting Help on a Package
  10. Searching the Web for Help
  11. Finding Relevant Functions and Packages
  12. Searching the Mailing Lists
  13. Submitting Questions to the Mailing Lists

Basic terms
Descriptive statistics
Inferential statistics
For the concepts below, read the sampling document first.

  • sampling methods
    • probability
    • non-probability

Hypothesis

  • Difference and association

Variables

Assignment

Week02 (Mar. 23, 26)

Concepts and ideas

Some basics

  1. Introduction
  2. Printing Something
  3. Setting Variables
  4. Listing Variables
  5. Deleting Variables
  6. Creating a Vector
  7. Computing Basic Statistics
  8. Creating Sequences
  9. Comparing Vectors
  10. Selecting Vector Elements
  11. Performing Vector Arithmetic
  12. Getting Operator Precedence Right
  13. Defining a Function
  14. Typing Less and Accomplishing More
  15. Avoiding Some Common Mistakes

from the previous lecture (research question and hypothesis)

Assignment

Week03 (Mar 30, April 2)

Week 3 online lecture videos

After the videos, study variance and standard deviation in Howell, Ch. 4 carefully: they are the foundation for understanding the statistical tests covered later in the course.

Concepts and ideas

Navigating software

  1. Introduction
  2. Getting and Setting the Working Directory
  3. Saving Your Workspace
  4. Viewing Your Command History
  5. Saving the Result of the Previous Command
  6. Displaying the Search Path
  7. Accessing the Functions in a Package
  8. Accessing Built-in Datasets
  9. Viewing the List of Installed Packages
  10. Installing Packages from CRAN
  11. Setting a Default CRAN Mirror
  12. Suppressing the Startup Message
  13. Running a Script
  14. Running a Batch Script
  15. Getting and Setting Environment Variables
  16. Locating the R Home Directory
  17. Customizing R

Mean
Mode
Median
Variance
Standard Deviation
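A minimal R sketch of these five statistics on a small made-up vector. One point worth knowing: base R has no built-in function for the statistical mode (`mode()` reports an object's storage type), so `stat_mode` below is a hypothetical helper written only for illustration.

```r
# Made-up example data (illustration only)
x <- c(2, 4, 4, 5, 7, 9)

m   <- mean(x)    # arithmetic mean: sum(x) / length(x)
med <- median(x)  # middle value of the sorted data (average of the two middle values here)

# Hypothetical helper: returns the most frequent value(s); all of them if tied
stat_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[counts == max(counts)])
}
mo <- stat_mode(x)

v <- var(x)       # sample variance, divides the sum of squared deviations by n - 1
s <- sd(x)        # sample standard deviation, equal to sqrt(var(x))
```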

±1 sd ≈ 68%
±2 sd ≈ 95% (exactly 95% at ±1.96 sd)
±3 sd ≈ 99.7% (exactly 99% at ±2.58 sd)
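These percentages can be checked in R with `pnorm()` (area to the left of a z value under the standard normal curve) and `qnorm()` (its inverse). `within_k_sd` is a hypothetical helper name used only in this sketch.

```r
# Proportion of a standard normal distribution lying within +/- k standard deviations
within_k_sd <- function(k) pnorm(k) - pnorm(-k)

p1   <- within_k_sd(1)     # about 0.6827
p2   <- within_k_sd(2)     # about 0.9545
p196 <- within_k_sd(1.96)  # about 0.9500
p3   <- within_k_sd(3)     # about 0.9973

# Inverse direction: which z cuts off the middle 95% or 99%?
z95 <- qnorm(0.975)        # about 1.96
z99 <- qnorm(0.995)        # about 2.58
```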

Standard score (a value expressed in standard deviation units) = z score

Sampling distribution via random sampling
Central Limit Theorem
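A small simulation sketch of the Central Limit Theorem. The exponential population, the sample size of 30, the 5,000 replications, and the seed are arbitrary choices for illustration: even though the population is skewed, the means of repeated random samples cluster around the population mean, with a spread close to sigma / sqrt(n).

```r
set.seed(1)   # arbitrary seed so the simulation is reproducible

# Skewed population: exponential with rate 1 (mean 1, sd 1)
n    <- 30    # size of each random sample
reps <- 5000  # number of random samples drawn

# Sampling distribution of the mean: one mean per random sample
means <- replicate(reps, mean(rexp(n, rate = 1)))

mean(means)   # close to the population mean, 1
sd(means)     # close to the standard error, 1 / sqrt(30), about 0.18
```

Running `hist(means)` afterwards should show a roughly bell-shaped histogram.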

Assignment

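Find two research articles that state their hypotheses explicitly (social science research articles are a good option). For each article:

  1. Write down each hypothesis.
  2. Identify the independent and dependent variables, and any intervening (moderator) variables.
  3. Explain how each variable was measured.
  4. Explain what type each hypothesis is (a hypothesis of difference or of association).
  5. Find and record which test method was used to test each hypothesis.

Due date: complete by midnight next Wednesday (2018/09/26 11:59).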

Week04 (April 6, 9)

Class Activity

Lecture materials for this week

Concepts and ideas

Input and output

  1. Introduction
  2. Entering Data from the Keyboard
  3. Printing Fewer Digits (or More Digits)
  4. Redirecting Output to a File
  5. Listing Files
  6. Dealing with “Cannot Open File” in Windows
  7. Reading Fixed-Width Records
  8. Reading Tabular Data Files
  9. Reading from CSV Files
  10. Writing to CSV Files
  11. Reading Tabular or CSV Data from the Web
  12. Reading Data from HTML Tables
  13. Reading Files with a Complex Structure
  14. Reading from MySQL Databases
  15. Saving and Transporting Objects

Assignment

Week05 (April 13, 16)

Concepts and ideas

Data Structures

  1. Introduction
  2. Appending Data to a Vector
  3. Inserting Data into a Vector
  4. Understanding the Recycling Rule
  5. Creating a Factor (Categorical Variable)
  6. Combining Multiple Vectors into One Vector and a Factor
  7. Creating a List
  8. Selecting List Elements by Position
  9. Selecting List Elements by Name
  10. Building a Name/Value Association List
  11. Removing an Element from a List
  12. Flatten a List into a Vector
  13. Removing NULL Elements from a List
  14. Removing List Elements Using a Condition
  15. Initializing a Matrix
  16. Performing Matrix Operations
  17. Giving Descriptive Names to the Rows and Columns of a Matrix
  18. Selecting One Row or Column from a Matrix
  19. Initializing a Data Frame from Column Data
  20. Initializing a Data Frame from Row Data
  21. Appending Rows to a Data Frame
  22. Preallocating a Data Frame
  23. Selecting Data Frame Columns by Position
  24. Selecting Data Frame Columns by Name
  25. Selecting Rows and Columns More Easily
  26. Changing the Names of Data Frame Columns
  27. Editing a Data Frame
  28. Removing NAs from a Data Frame
  29. Excluding Columns by Name
  30. Combining Two Data Frames
  31. Merging Data Frames by Common Column
  32. Accessing Data Frame Contents More Easily
  33. Converting One Atomic Value into Another
  34. Converting One Structured Data Type into Another

Assignment

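Working with your group members:

  • Find a social science article that includes a literature review and hypotheses.
    • Use dbpia or kyobo scholar.
  • Summarize the literature review.
  • Present the hypotheses.
    • For each hypothesis, identify the independent and dependent variables, or any other variable types.
    • Identify how each variable was measured and its level of measurement.
  • Before choosing an article, agree with your group on a shared academic interest so that you find an article the group enjoys. For example, a student interested in design will probably be more drawn to UI-related articles. Better still, a social science article on the UI of self-driving cars (or just cars) would be interesting to read, if one exists (though I suspect there isn't one . . . )
  • The deadline is midnight next Tuesday.
  • Hold group meetings through a KakaoTalk group chat or another online tool.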

Week06 (April 20, 23)

Today's agenda (live online meeting)

  • Confirm groups
  • Announce next week's quiz
  • Explain the group assignment
  • Group meeting

Concepts and ideas

Data Transformations

  1. Introduction
  2. Splitting a Vector into Groups
  3. Applying a Function to Each List Element
  4. Applying a Function to Every Row
  5. Applying a Function to Every Column
  6. Applying a Function to Groups of Data
  7. Applying a Function to Groups of Rows
  8. Applying a Function to Parallel Vectors or Lists

Strings and Dates

Announcement

  • First quiz on Week 07, Tuesday class (Oct. 16)

Assignment

Week07 (April 27, 30)

Concepts and ideas

Assignment review –> groups

Hypothesis testing
z-test

  • In R, understand the qnorm(proportion) and pnorm(z-score) functions.
  • See z_score.
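A sketch of a one-sample z-test using `pnorm` and `qnorm`. All numbers are made up, and the population sd is assumed known, which is the condition under which a z-test (rather than a t-test) applies.

```r
# Hypothetical example: population mean 100 and sd 15 assumed known
mu <- 100; sigma <- 15
n <- 36; xbar <- 105             # made-up sample size and sample mean

se <- sigma / sqrt(n)            # standard error of the mean: 15 / 6 = 2.5
z  <- (xbar - mu) / se           # z statistic: 5 / 2.5 = 2

p_two  <- 2 * pnorm(-abs(z))     # two-tailed p-value, about 0.0455
z_crit <- qnorm(0.975)           # two-tailed critical value at alpha = .05, about 1.96
reject <- abs(z) > z_crit        # TRUE here: reject H0 at the .05 level
```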

types of error
t-test

  • In R, understand the qt(proportion, df) and pt(t-score, df) functions.
  • See probability.
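The same logic with the t distribution, using `pt` and `qt`. The data vector `a` and the null hypothesis (mean = 60) are borrowed from the Week13 exercise; computing t by hand and cross-checking it against `t.test()` shows what the built-in function does.

```r
# Data from the Week13 exercise; H0: population mean = 60
a   <- c(65, 78, 88, 55, 48, 95, 66, 57, 79, 81)
mu0 <- 60
n   <- length(a)
df  <- n - 1

t_stat <- (mean(a) - mu0) / (sd(a) / sqrt(n))  # about 2.308
p_two  <- 2 * pt(-abs(t_stat), df)             # two-tailed p-value, about 0.0464
t_crit <- qt(0.975, df)                        # critical t for alpha = .05, about 2.262

# Cross-check against the built-in test
res <- t.test(a, mu = mu0)
```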

Probability calculation in R ← Probability in R cookbook (textbook)

. . . .
ANOVA
factorial anova
correlation
regression

Probability

  1. Introduction
  2. Counting the Number of Combinations
  3. Generating Combinations
  4. Generating Random Numbers
  5. Generating Reproducible Random Numbers
  6. Generating a Random Sample
  7. Generating Random Sequences
  8. Randomly Permuting a Vector
  9. Calculating Probabilities for Discrete Distributions
  10. Calculating Probabilities for Continuous Distributions
  11. Converting Probabilities to Quantiles
  12. Plotting a Density Function

Assignment


Individual assignment

Week08 (May 4, 7)

Exam period
Make-up video lecture

Week09 (May 11, 14)

Concepts and ideas

General Statistics
t-test
ANOVA
Factorial ANOVA
repeated measure anova
correlation and regression and multiple regression

  1. Introduction
  2. Summarizing Your Data
  3. Calculating Relative Frequencies
  4. Tabulating Factors and Creating Contingency Tables
  5. Testing Categorical Variables for Independence
  6. Calculating Quantiles (and Quartiles) of a Dataset
  7. Inverting a Quantile
  8. Converting Data to Z-Scores
  9. Testing the Mean of a Sample (t Test)
  10. Forming a Confidence Interval for a Mean
  11. Forming a Confidence Interval for a Median
  12. Testing a Sample Proportion
  13. Forming a Confidence Interval for a Proportion
  14. Testing for Normality
  15. Testing for Runs
  16. Comparing the Means of Two Samples
  17. Comparing the Locations of Two Samples Nonparametrically
  18. Testing a Correlation for Significance
  19. Testing Groups for Equal Proportions
  20. Performing Pairwise Comparisons Between Group Means
  21. Testing Two Samples for the Same Distribution

Assignment

Week10 (May 18, 21)

Concepts and ideas

multiple regression continued.

sequential regression

using dummy variables
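A sketch of both ideas with `lm()` on the built-in `mtcars` data; the variables chosen are arbitrary. Sequential (hierarchical) regression enters predictors in blocks, and one common way to test whether a block adds explained variance is to compare the nested models with `anova()`. For dummy coding, wrapping a categorical variable in `factor()` makes `lm()` create k - 1 indicator columns, with the first level as the reference.

```r
# Step 1: baseline model
m1 <- lm(mpg ~ wt, data = mtcars)
# Step 2: add a predictor block
m2 <- lm(mpg ~ wt + hp, data = mtcars)
anova(m1, m2)   # F test for the variance explained by hp beyond wt
c(summary(m1)$r.squared, summary(m2)$r.squared)   # R-squared at each step

# Dummy variables: cyl has 3 levels (4, 6, 8), so lm() creates 2 indicator columns
m3 <- lm(mpg ~ wt + factor(cyl), data = mtcars)
colnames(model.matrix(m3))   # "(Intercept)" "wt" "factor(cyl)6" "factor(cyl)8"
```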

Assignment

Week11 (May 25, 28)

Concepts and ideas

getting started
basics
navigating in r
input output in r
data structures
data transformations


Graphics

  1. Introduction
  2. Creating a Scatter Plot
  3. Adding a Title and Labels
  4. Adding a Grid
  5. Creating a Scatter Plot of Multiple Groups
  6. Adding a Legend
  7. Plotting the Regression Line of a Scatter Plot
  8. Plotting All Variables Against All Other Variables
  9. Creating One Scatter Plot for Each Factor Level
  10. Creating a Bar Chart
  11. Adding Confidence Intervals to a Bar Chart
  12. Coloring a Bar Chart
  13. Plotting a Line from x and y Points
  14. Changing the Type, Width, or Color of a Line
  15. Plotting Multiple Datasets
  16. Adding Vertical or Horizontal Lines
  17. Creating a Box Plot
  18. Creating One Box Plot for Each Factor Level
  19. Creating a Histogram
  20. Adding a Density Estimate to a Histogram
  21. Creating a Discrete Histogram
  22. Creating a Normal Quantile-Quantile (Q-Q) Plot
  23. Creating Other Quantile-Quantile Plots
  24. Plotting a Variable in Multiple Colors
  25. Graphing a Function
  26. Pausing Between Plots
  27. Displaying Several Figures on One Page
  28. Opening Additional Graphics Windows
  29. Writing Your Plot to a File
  30. Changing Graphical Parameters

Assignment

Week12 (June 1, 4)

Announcement

Quiz 03: Nov. 23

Concepts and ideas

chi-square test
probability
general statistics

Graphics

Assignment

Week13 (June 8, 11)

Concepts and ideas

Do the following

S1 <- c(89, 85, 85, 86, 88, 89, 86, 82, 96, 85, 93, 91, 
        98, 87, 94, 77, 87, 98, 85, 89, 95, 85, 93, 93, 
        97, 71, 97, 93, 75, 68, 98, 95, 79, 94, 98, 95)
S2 <- c(60, 98, 94, 95, 99, 97, 100, 73, 93, 91, 98, 
        86, 66, 83, 77, 97, 91, 93, 71, 91, 95, 100, 
        72, 96, 91, 76, 100, 97, 99, 95, 97, 77, 94, 
        99, 88, 100, 94, 93, 86)
S3 <- c(95, 86, 90, 90, 75, 83, 96, 85, 83, 84, 81, 98, 
        77, 94, 84, 89, 93, 99, 91, 77, 95, 90, 91, 87, 
        85, 76, 99, 99, 97, 97, 97, 77, 93, 96, 90, 87, 
        97, 88)
S4 <- c(67, 93, 63, 83, 87, 97, 96, 92, 93, 96, 87, 90, 
        94, 90, 82, 91, 85, 93, 83, 90, 87, 99, 94, 88, 
        90, 72, 81, 93, 93, 94, 97, 89, 96, 95, 82, 97)

scores <- list(S1=S1,S2=S2,S3=S3,S4=S4)
  • Find the mean of each element in “scores”, returned as a list.
  • Find the standard deviation of each element in “scores”, returned as a data frame.
  • Find the variance of each element in “scores”, returned as a data frame, without using the “var” function.
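One possible approach, sketched on a short made-up stand-in for `scores` so the block runs on its own (substitute the list defined above). `lapply` returns a list; `sapply` simplifies to a named vector, which is then turned into a one-row data frame. `my_var` is a hypothetical helper that applies the n - 1 variance formula directly.

```r
# Short stand-in for the scores list above (made-up values)
scores <- list(S1 = c(89, 85, 85, 86), S2 = c(60, 98, 94, 95))

# Means for each element, returned as a list
means_list <- lapply(scores, mean)

# Standard deviations as a one-row data frame with columns S1, S2
sd_df <- as.data.frame(t(sapply(scores, sd)))

# Variance without var(): sum of squared deviations divided by n - 1
my_var <- function(v) sum((v - mean(v))^2) / (length(v) - 1)
var_df <- as.data.frame(t(sapply(scores, my_var)))
```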
longdata<- c(-1.850152, -1.406571, -1.0104817, -3.7170704, 
           -0.2804896, 0.9496313, 1.346517, -0.1580926, 1.6272786, 
           -2.4483321, -0.5407272, -1.708678, -0.3480616, -0.2757667, 
           -1.2177024)
  • Convert “longdata” into a 3 × 5 matrix.
  • Name the columns “trial1, trial2, ..., trial5”.
  • Name the rows “subject1, subject2, subject3”.
  • Get the mean for each subject.
  • Attach these means to the matrix and name the result “longtemp”.
  • Get the standard deviation for each trial.
  • Attach these standard deviations to “longtemp”.
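A sketch for the `longdata` steps. `matrix()` fills column by column by default; the exercise does not say how the values are ordered, so add `byrow = TRUE` if each run of five values belongs to one subject. Appending the trial SDs as a row leaves one cell under the `mean` column without a value; filling it with `NA` is a choice made for this sketch.

```r
longdata <- c(-1.850152, -1.406571, -1.0104817, -3.7170704,
              -0.2804896, 0.9496313, 1.346517, -0.1580926, 1.6272786,
              -2.4483321, -0.5407272, -1.708678, -0.3480616, -0.2757667,
              -1.2177024)

# 3 rows (subjects) by 5 columns (trials); matrix() fills column by column
m <- matrix(longdata, nrow = 3, ncol = 5)
colnames(m) <- paste0("trial", 1:5)
rownames(m) <- paste0("subject", 1:3)

# Mean for each subject (row), appended as a new column
subject_mean <- rowMeans(m)
longtemp <- cbind(m, mean = subject_mean)

# SD for each trial (column), appended as a new row
trial_sd <- apply(m, 2, sd)
longtemp <- rbind(longtemp, sd = c(trial_sd, NA))  # NA: no SD computed for the mean column
longtemp
```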
suburbs <- read.csv("http://commres.net/wiki/_export/code/r/data_transformations?codeblock=15", header = TRUE, sep = "\t")   # tab-separated file
  • Load the suburbs data as shown above.
  • Get the population mean for each state (listed in the suburbs data).
    • Use aggregate; see the example below.
library(MASS)   # provides the Cars93 data; attach() is not needed because aggregate() takes a data argument
aggregate(MPG.city ~ Origin, Cars93, mean)
  • Get the population sum for each county with the tapply function.
    • Usage pattern: tapply(number, byfactor, function)
  • How many counties are there?
  • Using the Cars93 data, get the mean MPG.city by Origin.
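Since the suburbs file is fetched from the web, this sketch shows the same `aggregate` / `tapply` pattern on the `Cars93` data from the MASS package instead. Both calls compute a group mean; they differ in the shape of the result, and `nlevels()` counts the groups in a factor.

```r
library(MASS)  # provides the Cars93 data frame

# aggregate(): formula interface, returns a data frame
agg <- aggregate(MPG.city ~ Origin, data = Cars93, FUN = mean)

# tapply(values, grouping factor, function): returns a named array
tap <- tapply(Cars93$MPG.city, Cars93$Origin, mean)

# Number of distinct groups in a factor
n_groups <- nlevels(Cars93$Origin)   # or length(unique(Cars93$Origin))
```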

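Using pnorm and qnorm
pnorm: returns the proportion of a normal distribution (with the given mean and sd) that lies below a value; with lower.tail=FALSE, it returns the proportion above it. qnorm does the reverse: it returns the value below which a given proportion lies.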

pnorm(84, mean=72, sd=15.2, lower.tail=FALSE)
  • What is the value of the following?
pnorm(1)
  • How would you get 68%, 95%, and 99% from pnorm?
    • Use ?pnorm and check the default options.
  • Generate 10 random numbers with the runif function.
year <- c(1900:2016)     # years in vector year
world.series <- data.frame(year)
  • Draw a sample of 10 years from the world.series data with the “sample” command.
  • How would you get the same sample again later?
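A sketch for the random-number items. Calling `set.seed()` with the same value before `sample()` reproduces the same draw; the seed 42 is arbitrary.

```r
u <- runif(10)   # 10 random numbers from Uniform(0, 1)

year <- c(1900:2016)
world.series <- data.frame(year)

# Fix the random seed before sampling so the same sample can be drawn again later
set.seed(42)
s1 <- sample(world.series$year, 10)
set.seed(42)
s2 <- sample(world.series$year, 10)   # identical to s1
```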
pnorm(110, mean=100, sd=10)
  • What is the result of the above?
library(MASS)       # load the MASS package 
tbl = table(survey$Smoke, survey$Exer) 
tbl                 # the contingency table
summary(tbl)
  • Read and interpret the above output.
  • What about the following?
chisq.test(tbl) 

see first chi-square test
see chi-square test in r document space for more

library(MASS)
cardata <- data.frame(Cars93$Origin, Cars93$Type)
cardata
  • Can you say that car types differ by origin?
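One way to approach the car-type question: cross-tabulate the two factors and run a chi-square test of independence. Some cells in this table are sparse, so `chisq.test()` may warn that the approximation may be incorrect; check the expected counts before interpreting the p-value.

```r
library(MASS)  # Cars93

# Cross-tabulate origin (rows) by car type (columns)
type_by_origin <- table(Cars93$Origin, Cars93$Type)

# Chi-square test of independence
res <- chisq.test(type_by_origin)
res$p.value
res$expected   # look for expected counts below 5
```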
dur <- faithful$eruptions
dur
  • Convert the above data into z-scores (zdur).
  • Get the mean of zdur.
  • Get the sd of zdur.
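A sketch of the z-score conversion for the Old Faithful eruption durations. By construction, z-scores have a mean of 0 (up to floating-point error) and a standard deviation of 1; `scale(dur)` gives the same values as a one-column matrix.

```r
dur <- faithful$eruptions

# z-score: distance from the mean in standard-deviation units
zdur <- (dur - mean(dur)) / sd(dur)

mean(zdur)   # 0, up to floating-point error
sd(zdur)     # 1
```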
set.seed(1123)
x <- rnorm(50, mean=100, sd=15)
  • Test x against a population mean of 95.
  • Test x against a population mean of 99.
  • Are the two results different from each other?
  • What would you do to get a different result from the second test?
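A sketch of the two one-sample tests. It also bears on the `t.test(x)` question further down: without `mu`, `t.test()` tests H0: mean = 0, which is rarely the hypothesis of interest for data centered near 100.

```r
set.seed(1123)
x <- rnorm(50, mean = 100, sd = 15)

r95 <- t.test(x, mu = 95)   # H0: population mean = 95
r99 <- t.test(x, mu = 99)   # H0: population mean = 99
c(r95$p.value, r99$p.value)

# Without mu, the default null value is 0
r0 <- t.test(x)
r0$null.value
```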
a = c(65, 78, 88, 55, 48, 95, 66, 57, 79, 81)

> t.test(a, mu=60)

	One Sample t-test

data:  a
t = 2.3079, df = 9, p-value = 0.0464
alternative hypothesis: true mean is not equal to 60
95 percent confidence interval:
 60.22187 82.17813
sample estimates:
mean of x 
     71.2 
  • Find the t critical value with the qt function.
  • Explain what happens in the code below.
  • Review what pnorm and qnorm do.
> s <- sd(x)
> m <- mean(x)
> n <- length(x)
> n
[1] 50
> m
[1] 96.00386
> s
[1] 17.38321
> SE <- s / sqrt(n)
> SE
[1] 2.458358
> E <- qt(.975, df=n-1)*SE
> E
[1] 4.940254
> m + c(-E, E)
[1]  91.0636 100.9441
> 
  • What's wrong with the following?
t.test(x)
> mtcars
  • Using aggregate, get the means for each transmission type (am).
  • Compare mileage between automatic and manual cars.
    • Use t.test (two-sample).
    • Use the “var.equal=T” option.
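A sketch for the `mtcars` items. In `mtcars`, `am` codes transmission as 0 = automatic and 1 = manual; `. ~ am` asks `aggregate` for the mean of every other column by transmission type.

```r
# Mean of every column by transmission type (am: 0 = automatic, 1 = manual)
agg <- aggregate(. ~ am, data = mtcars, FUN = mean)
agg[, c("am", "mpg")]

# Two-sample t-test on mileage, assuming equal variances
res <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)
res
```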
a = c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b = c(185, 169, 173, 173, 188, 186, 175, 174, 179, 180)
  • Stack them into a data frame, c.
  • Rename the columns to score and trans.
  • Run t.test on score by trans with var.equal = TRUE.
  • Run an aov test.
  • Compare the t value from t.test (t = -0.9474) with the F value from aov (F = ?).
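A sketch of the last exercise. The stacked data frame is called `dat` here rather than `c`, to avoid confusion with R's `c()` function. With only two groups, a one-way ANOVA and a pooled-variance t-test are equivalent, and the F statistic equals t squared.

```r
a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b <- c(185, 169, 173, 173, 188, 186, 175, 174, 179, 180)

# stack() turns a named list of vectors into long format (columns "values" and "ind")
dat <- stack(list(a = a, b = b))
colnames(dat) <- c("score", "trans")

tt  <- t.test(score ~ trans, data = dat, var.equal = TRUE)   # t = -0.9474
fit <- aov(score ~ trans, data = dat)
summary(fit)

# F value from the ANOVA table (first row is the trans effect)
F_val <- summary(fit)[[1]][["F value"]][1]
```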

Assignment

  1. Do the Ex 1 part of the linear regression page.

Week14 (June 15, 18)

Week15 (June 22, 25)

Final quiz
Part I (written exam): closed book.

Part II (R practical exam): only the textbook and R help are allowed.

Week16 (June 22, 25)

Final-term

c/ms/2020/schedule.txt · Last modified: 2020/05/01 21:13 by hkimscil