c:ms:2020:schedule

R Cookbook

Chapter 1 Getting Started and Getting Help

Chapter 2 Some Basics

Chapter 3 Navigating the Software

Chapter 4 Input and Output

Chapter 5 Data Structures

Chapter 6 Data Transformations

Chapter 7 Strings and Dates

Chapter 8 Probability

Chapter 9 General Statistics

Chapter 10 Graphics

Chapter 11 Linear Regression and ANOVA

Chapter 12 Useful Tricks

Chapter 13 Beyond Basic Numerics and Statistics

Chapter 14 Time Series Analysis

https://youtu.be/6ExajWI_r2w

https://youtu.be/J8e5dEH8K_Q

https://youtu.be/W3DhUXI5cyQ

https://youtu.be/qCeTcvWBDNY

https://youtu.be/1hJm0O-RY4Q

Course Introduction –> syllabus

Introduction to R and others

- Downloading and Installing R
- Starting R
- Entering Commands
- Exiting from R
- Interrupting R
- Viewing the Supplied Documentation
- Getting Help on a Function
- Searching the Supplied Documentation
- Getting Help on a Package
- Searching the Web for Help
- Finding Relevant Functions and Packages
- Searching the Mailing Lists
- Submitting Questions to the Mailing Lists

Basic terms

Descriptive statistics

Inferential statistics

For the concepts below, read the sampling document first.

- Population
- Sample
- Parameter
- Statistic

- sampling methods
- probability
- non-probability

Hypothesis

- Difference and association

Variables

Some basics

- Introduction
- Printing Something
- Setting Variables
- Listing Variables
- Deleting Variables
- Creating a Vector
- Computing Basic Statistics
- Creating Sequences
- Comparing Vectors
- Selecting Vector Elements
- Performing Vector Arithmetic
- Getting Operator Precedence Right
- Defining a Function
- Typing Less and Accomplishing More
- Avoiding Some Common Mistakes

from the previous lecture (research question and hypothesis)

- Research Questions (or Problems)
- Two ideas guided by theories
- Questions on their relationships
- Conceptualization

- Educated guess (via theories)
- Difference
- Association
- Variables (vs. ideas, concepts, and constructs)
- Control variable
- Mediating (Intervening) variable

Week 3 online lecture videos

The material on variance and standard deviation in Howell, Ch. 4 is the foundation for understanding the statistical tests covered later, so be sure to master it.

Navigating software

- Introduction
- Getting and Setting the Working Directory
- Saving Your Workspace
- Viewing Your Command History
- Saving the Result of the Previous Command
- Displaying the Search Path
- Accessing the Functions in a Package
- Accessing Built-in Datasets
- Viewing the List of Installed Packages
- Installing Packages from CRAN
- Setting a Default CRAN Mirror
- Suppressing the Startup Message
- Running a Script
- Running a Batch Script
- Getting and Setting Environment Variables
- Locating the R Home Directory
- Customizing R

Mean

Mode

Median

Variance

Standard Deviation

±1 sd ≈ 68%

±2 sd ≈ 95% (exactly 95% at ±1.96 sd)

±3 sd ≈ 99.7% (99% is at about ±2.58 sd)
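These coverage figures can be checked in R with pnorm, the cumulative distribution function of the normal distribution. A minimal sketch; the helper name coverage is made up for illustration and is not part of R:

```r
# Proportion of a normal distribution that lies within k standard deviations of the mean
coverage <- function(k) pnorm(k) - pnorm(-k)   # 'coverage' is an illustrative helper

coverage(1)      # about 0.68
coverage(1.96)   # about 0.95
coverage(2)      # slightly more than 0.95
coverage(3)      # about 0.997
```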

Standard score (a score expressed in standard deviation units) = z score
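As a small illustration, a z score can be computed by subtracting the mean and dividing by the standard deviation. The scores below are hypothetical, chosen only to show the arithmetic:

```r
# Hypothetical scores, for illustration only
x <- c(52, 61, 70, 79, 88)

# z score: distance from the mean, measured in standard deviation units
z <- (x - mean(x)) / sd(x)

z
mean(z)   # 0, up to floating-point rounding
sd(z)     # 1
```

The base R function scale(x) should give the same values, returned as a one-column matrix.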

Sampling distribution via random sampling

Central Limit Theorem
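A rough simulation can make the sampling distribution and the Central Limit Theorem concrete. This is only a sketch under assumed settings: the seed, the exponential population, the sample size of 36, and the 5000 repetitions are all arbitrary choices.

```r
set.seed(101)                        # arbitrary seed, so the run is reproducible

# A skewed, clearly non-normal population (exponential, mean 1 and sd 1 in theory)
population <- rexp(100000, rate = 1)

n <- 36                              # size of each random sample
# Draw 5000 random samples and keep the mean of each one
sample_means <- replicate(5000, mean(sample(population, n)))

mean(population)                     # population mean
mean(sample_means)                   # close to the population mean
sd(population) / sqrt(n)             # standard error predicted by the theorem
sd(sample_means)                     # close to the predicted standard error
```

Calling hist(sample_means) afterwards should show a roughly bell-shaped distribution even though the population itself is skewed.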

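Find two research articles that list their hypotheses (social science research articles would be a good option). For each article:

- Write down each hypothesis.
- Explain which variables are the independent, dependent, and intervening (moderator) variables, and so on.
- Explain how each variable was measured.
- Explain what kind of hypothesis each one is (a hypothesis of difference or of association).
- Find and record which test method was used to test the hypotheses.

Due date: complete by midnight next Wednesday (2018/09/26 11:59).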

Lecture materials for this week

- https://youtu.be/JvpOJPCBQkQ : R cookbook: data structure
- https://youtu.be/_ynGzFFmm7U : Howell Ch 4. Variance 01: Introduction (DS, error, and SS)
- https://youtu.be/HugtyhU7Im8 : Howell Ch. 4. Variance 02: Variance for sample and n-1

- Introduction
- Entering Data from the Keyboard
- Printing Fewer Digits (or More Digits)
- Redirecting Output to a File
- Listing Files
- Dealing with “Cannot Open File” in Windows
- Reading Fixed-Width Records
- Reading Tabular Data Files
- Reading from CSV Files
- Writing to CSV Files
- Reading Tabular or CSV Data from the Web
- Reading Data from HTML Tables
- Reading Files with a Complex Structure
- Reading from MySQL Databases
- Saving and Transporting Objects

- https://youtu.be/RE6DSk1DcJI : Why does variance use n-1?
- https://youtu.be/PrPoOCW3v1s : Proof of n-1
- https://youtu.be/Ssznnbdj5Lg : degrees of freedom
- https://youtu.be/valhVpf-haY : standard deviation
- https://youtu.be/Qaxj6LZ-iL0 : sampling distribution
- https://youtu.be/AbeIQvJJ5Vw : sampling distribution e.g. in R

- Introduction
- Appending Data to a Vector
- Inserting Data into a Vector
- Understanding the Recycling Rule
- Creating a Factor (Categorical Variable)
- Combining Multiple Vectors into One Vector and a Factor
- Creating a List
- Selecting List Elements by Position
- Selecting List Elements by Name
- Building a Name/Value Association List
- Removing an Element from a List
- Flatten a List into a Vector
- Removing NULL Elements from a List
- Removing List Elements Using a Condition
- Initializing a Matrix
- Performing Matrix Operations
- Giving Descriptive Names to the Rows and Columns of a Matrix
- Selecting One Row or Column from a Matrix
- Initializing a Data Frame from Column Data
- Initializing a Data Frame from Row Data
- Appending Rows to a Data Frame
- Preallocating a Data Frame
- Selecting Data Frame Columns by Position
- Selecting Data Frame Columns by Name
- Selecting Rows and Columns More Easily
- Changing the Names of Data Frame Columns
- Editing a Data Frame
- Removing NAs from a Data Frame
- Excluding Columns by Name
- Combining Two Data Frames
- Merging Data Frames by Common Column
- Accessing Data Frame Contents More Easily
- Converting One Atomic Value into Another
- Converting One Structured Data Type into Another

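Working together with your group members:

- Find a social science article that includes a literature review and hypotheses.
- Use dbpia or kyobo scholar.

- Summarize the content of the literature review.
- Present the hypotheses.
- Identify the independent and dependent variables, and any other types of variables, in each hypothesis.
- State how each variable was measured and its level of measurement.

- Before picking an article, I recommend agreeing with your group members on a shared academic interest so that you find an article you enjoy. For example, a student who is interested in design will be more drawn to articles on UI. If, on top of that, there is a readable social science article on the UI of self-driving cars (or just cars), it would be interesting (though I suspect there isn't one . . . )
- The deadline is midnight next Tuesday.
- I recommend holding group meetings in a KakaoTalk chat room or with other technology.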

Today's agenda (live online meeting)

- Confirm groups
- Announce next week's quiz
- Explain the group assignment
- Group meeting

- Introduction
- Splitting a Vector into Groups
- Applying a Function to Each List Element
- Applying a Function to Every Row
- Applying a Function to Every Column
- Applying a Function to Groups of Data
- Applying a Function to Groups of Rows
- Applying a Function to Parallel Vectors or Lists

Strings and Dates

- First quiz on Week 07, Tuesday class (Oct. 16)
- RANGE: Week 01 - 03 materials + lecture content + textbook
- Textbook:
- chapter 2, 3, 4, 5

- NEXT quiz will be held on Oct. 23 during the mid term schedule.
- The 2nd quiz will cover 1st quiz + Week 05-07 materials.

Assignment review –> groups

- In R, you need to understand the qnorm(proportion) and pnorm(z-score) functions
- See z_score

- In R, you need to understand the qt(proportion, df) and pt(t-score, df) functions
- See probability
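The two pairs of functions are inverses of each other: pnorm and pt turn a score into a cumulative proportion, while qnorm and qt turn a proportion back into a score. A short sketch; the degrees of freedom value is a hypothetical choice:

```r
# pnorm: z score -> cumulative proportion; qnorm: proportion -> z score
p <- pnorm(1.96)        # about 0.975
qnorm(p)                # back to 1.96
qnorm(0.975)            # about 1.96

# pt and qt do the same for the t distribution, which also needs df
df <- 9                 # hypothetical degrees of freedom
pt(2.262, df)           # about 0.975
qt(0.975, df)           # about 2.262
```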

Probability calculation in R ← Probability in R cookbook (textbook)

. . . .

ANOVA

factorial anova

correlation

regression

- Introduction
- Counting the Number of Combinations
- Generating Combinations
- Generating Random Numbers
- Generating Reproducible Random Numbers
- Generating a Random Sample
- Generating Random Sequences
- Randomly Permuting a Vector
- Calculating Probabilities for Discrete Distributions
- Calculating Probabilities for Continuous Distributions
- Converting Probabilities to Quantiles
- Plotting a Density Function

- Practice writing hypotheses
- How to write a hypothesis, at behavioral science writing.
- One sample hypothesis Hypothesis at www.socialresearchmethods.net

Exam period

Make-up video lecture

General Statistics

t-test

ANOVA

Factorial ANOVA

repeated measure anova

correlation and regression and multiple regression

- Before regression: SS is actually the sum of the squared **errors** of the estimates (guesses).
- sum of squared errors = SS (the term is used without the word “error”)
- For this, carefully read the part on standard error and residual variance (standard error residual) in the Regression document.
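The point about SS being a sum of squared errors can be sketched in R. The data below are hypothetical; the first guess for every y is its mean, and the regression then replaces that guess with the fitted value:

```r
# Hypothetical data, for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 6)

# Without a predictor, the best single guess for y is its mean
SS_total <- sum((y - mean(y))^2)        # sum of squared errors around the mean

# With regression, the guess becomes the fitted value
fit <- lm(y ~ x)
SS_residual <- sum(residuals(fit)^2)    # squared errors left over after regression
SS_regression <- SS_total - SS_residual

SS_total
SS_residual
SS_regression / SS_total                # proportion explained, equal to R squared
```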

- Introduction
- Summarizing Your Data
- Calculating Relative Frequencies
- Tabulating Factors and Creating Contingency Tables
- Testing Categorical Variables for Independence
- Calculating Quantiles (and Quartiles) of a Dataset
- Inverting a Quantile
- Converting Data to Z-Scores
- Testing the Mean of a Sample (t Test)
- Forming a Confidence Interval for a Mean
- Forming a Confidence Interval for a Median
- Testing a Sample Proportion
- Forming a Confidence Interval for a Proportion
- Testing for Normality
- Testing for Runs
- Comparing the Means of Two Samples
- Comparing the Locations of Two Samples Nonparametrically
- Testing a Correlation for Significance
- Testing Groups for Equal Proportions
- Performing Pairwise Comparisons Between Group Means
- Testing Two Samples for the Same Distribution

getting started

basics

navigating in r

input output in r

data structures

data transformations

- Introduction
- Creating a Scatter Plot
- Adding a Title and Labels
- Adding a Grid
- Creating a Scatter Plot of Multiple Groups
- Adding a Legend
- Plotting the Regression Line of a Scatter Plot
- Plotting All Variables Against All Other Variables
- Creating One Scatter Plot for Each Factor Level
- Creating a Bar Chart
- Adding Confidence Intervals to a Bar Chart
- Coloring a Bar Chart
- Plotting a Line from x and y Points
- Changing the Type, Width, or Color of a Line
- Plotting Multiple Datasets
- Adding Vertical or Horizontal Lines
- Creating a Box Plot
- Creating One Box Plot for Each Factor Level
- Creating a Histogram
- Adding a Density Estimate to a Histogram
- Creating a Discrete Histogram
- Creating a Normal Quantile-Quantile (Q-Q) Plot
- Creating Other Quantile-Quantile Plots
- Plotting a Variable in Multiple Colors
- Graphing a Function
- Pausing Between Plots
- Displaying Several Figures on One Page
- Opening Additional Graphics Windows
- Writing Your Plot to a File
- Changing Graphical Parameters

Quiz 03: Nov. 23

chi-square test

probability

general statistics

Graphics

Do the following

S1 <- c(89, 85, 85, 86, 88, 89, 86, 82, 96, 85, 93, 91, 98, 87, 94, 77, 87, 98, 85, 89, 95, 85, 93, 93, 97, 71, 97, 93, 75, 68, 98, 95, 79, 94, 98, 95)
S2 <- c(60, 98, 94, 95, 99, 97, 100, 73, 93, 91, 98, 86, 66, 83, 77, 97, 91, 93, 71, 91, 95, 100, 72, 96, 91, 76, 100, 97, 99, 95, 97, 77, 94, 99, 88, 100, 94, 93, 86)
S3 <- c(95, 86, 90, 90, 75, 83, 96, 85, 83, 84, 81, 98, 77, 94, 84, 89, 93, 99, 91, 77, 95, 90, 91, 87, 85, 76, 99, 99, 97, 97, 97, 77, 93, 96, 90, 87, 97, 88)
S4 <- c(67, 93, 63, 83, 87, 97, 96, 92, 93, 96, 87, 90, 94, 90, 82, 91, 85, 93, 83, 90, 87, 99, 94, 88, 90, 72, 81, 93, 93, 94, 97, 89, 96, 95, 82, 97)
scores <- list(S1=S1,S2=S2,S3=S3,S4=S4)

- find means for each element in “scores” in a list format
- find standard deviation for each element in “scores” in a data frame format
- find variance for each element in “scores” in a data frame format without using “var” function
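As a hint for this kind of task, lapply and sapply differ mainly in the shape of what they return. A minimal sketch on a hypothetical list (not the scores data above):

```r
# Hypothetical list of score vectors, for illustration only
quiz <- list(A = c(80, 90, 100), B = c(70, 75, 95, 100))

lapply(quiz, mean)      # always returns a list
sapply(quiz, mean)      # simplifies to a named vector when it can

# One possible way to collect per-element results in a data frame
data.frame(group = names(quiz), mean = sapply(quiz, mean), row.names = NULL)
```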

longdata<- c(-1.850152, -1.406571, -1.0104817, -3.7170704, -0.2804896, 0.9496313, 1.346517, -0.1580926, 1.6272786, -2.4483321, -0.5407272, -1.708678, -0.3480616, -0.2757667, -1.2177024)

- make “longdata” into a 3 by 5 matrix
- name columns “trial1, trial2, . . . . trial5”
- name rows “subject1, subject2, subject3”
- get means for each subject
- attach the above data to the matrix data and name it “longtemp.”
- get standard deviation for each trial
- attach the above data to the matrix data, “longtemp.”
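The matrix steps listed above can be tried first on a smaller case. A sketch with hypothetical values and a 2 by 3 matrix; the object names m, m2, row_means, and col_sds are illustrative only:

```r
# Hypothetical values, for illustration only
m <- matrix(1:6, nrow = 2, ncol = 3)     # filled column by column
colnames(m) <- paste0("trial", 1:3)
rownames(m) <- paste0("subject", 1:2)

row_means <- rowMeans(m)                 # one mean per row
m2 <- cbind(m, mean = row_means)         # append the means as a new column

col_sds <- apply(m2, 2, sd)              # one standard deviation per column
rbind(m2, sd = col_sds)                  # append them as a new row
```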

suburbs <- read.csv("http://commres.net/wiki/_export/code/r/data_transformations?codeblock=15", header=TRUE, sep=" ")

- get the suburbs data as shown above
- get population means by each state (listed in the data, suburbs)
- use aggregate and refer to the below e.g.

library(MASS) # Cars93 comes from the MASS package
aggregate(MPG.city ~ Origin, Cars93, mean)

- get population sum by each county with tapply function.
- tapply(number, byfactor, function)
- how many counties are there?
- Use Cars93 data, get MPG.city mean by Origin.
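The aggregate and tapply patterns above can be compared on a dataset that needs no extra package. A sketch using mtcars, which ships with R, grouped by number of cylinders (a grouping chosen only for illustration):

```r
# tapply(number, byfactor, function): returns one value per group, with names
tapply(mtcars$mpg, mtcars$cyl, mean)

# aggregate with a formula: the same group means, returned as a data frame
aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# Counting the groups
length(unique(mtcars$cyl))
```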

*Using pnorm, qnorm*

pnorm: returns a proportion (cumulative probability) from the normal distribution defined by mean and sd

pnorm(84, mean=72, sd=15.2, lower.tail=FALSE)

- What is the value of the below?

pnorm(1)

- How would you get 68, 95, 99% from pnorm?
- use ?pnorm and see the default option

- generate 10 random numbers with runif function

year <- c(1900:2016) # years in vector year
world.series <- data.frame(year)

- get 10 year samples out of world.series data with “sample” command
- how would you get the same sample again later?
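One common way to make a random draw repeatable is to set the random seed before sampling. A minimal sketch; the seed value 2020 and the object names are arbitrary:

```r
years <- 1900:2016

set.seed(2020)                 # arbitrary seed value
first <- sample(years, 10)

set.seed(2020)                 # the same seed again
second <- sample(years, 10)

identical(first, second)       # TRUE: the same 10 years are drawn
```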

pnorm(110, mean=100, sd=10)

- What would be the result from the above?

library(MASS) # load the MASS package
tbl = table(survey$Smoke, survey$Exer)
tbl # the contingency table

summary(tbl)

- read the above output and interpret
- what about the below one?

chisq.test(tbl)

see first chi-square test

see chi-square test in r document space for more
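To help read chisq.test output, the pieces of the result can be pulled out one by one. A sketch on a hypothetical 2 by 2 table of counts (not the survey data):

```r
# Hypothetical counts, for illustration only
counts <- matrix(c(30, 10, 20, 40), nrow = 2,
                 dimnames = list(group = c("g1", "g2"),
                                 answer = c("yes", "no")))

result <- chisq.test(counts)   # 2 x 2 tables get Yates' continuity correction by default
result$expected                # counts expected if the two variables were independent
result$statistic               # chi-square value
result$parameter               # degrees of freedom: (rows - 1) * (columns - 1)
result$p.value                 # a small p-value is evidence against independence
```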

library(MASS)
cardata <- data.frame(Cars93$Origin, Cars93$Type)
cardata

- Can you say that the types of cars differ by origin?

dur <- faithful$eruptions
dur

- make the above data into z-score (zdur).
- get mean of the zdur
- get sd of the zdur

set.seed(1123)
x <- rnorm(50, mean=100, sd=15)

- test x against population mean 95.
- test x against population mean 99.
- are they different from each other?
- what would you do if you wanted to see a different result from the second test?

a = c(65, 78, 88, 55, 48, 95, 66, 57, 79, 81)
> t.test(a, mu=60)

    One Sample t-test

data:  a
t = 2.3079, df = 9, p-value = 0.0464
alternative hypothesis: true mean is not equal to 60
95 percent confidence interval:
 60.22187 82.17813
sample estimates:
mean of x
     71.2

- find the t critical value with function qt.
- explain what happens in the next code
- read (or review) what pnorm and qnorm do.

> s <- sd(x)
> m <- mean(x)
> n <- length(x)
> n
[1] 50
> m
[1] 96.00386
> s
[1] 17.38321
> SE <- s / sqrt(n)
> SE
[1] 2.458358
> E <- qt(.975, df=n-1)*SE
> E
[1] 4.940254
> m + c(-E, E)
[1]  91.0636 100.9441
>
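The same recipe (mean ± critical t × standard error) can be checked against t.test on a small vector. A sketch that reuses the vector a from the one-sample t-test output earlier on this page; the names t_crit and ci_manual are illustrative:

```r
a <- c(65, 78, 88, 55, 48, 95, 66, 57, 79, 81)

n  <- length(a)
SE <- sd(a) / sqrt(n)                  # standard error of the mean
t_crit <- qt(0.975, df = n - 1)        # critical t for a two-sided 95% interval
ci_manual <- mean(a) + c(-1, 1) * t_crit * SE

ci_manual
t.test(a)$conf.int                     # the same interval, as computed by t.test
```

The interval does not depend on the mu argument, which only affects the t statistic and the p-value.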

- what's wrong with the below?

t.test(x)

> mtcars

- using aggregate, get the mean for each trans. type.
- compare the difference in mileage between auto and manual cars.
- use t.test (two sample)
- use the “var.equal=T” option

a = c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b = c(185, 169, 173, 173, 188, 186, 175, 174, 179, 180)

- stack them into data c
- convert colnames into score and trans
- t.test score by trans with var.equal option true.
- aov test
- see t.test t value, t = -0.9474 and F value, F = ?
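The link between the two tests can be previewed on another dataset: with two groups and equal variances assumed, the squared t value equals the F value from a one-way ANOVA. A sketch using sleep, which ships with R; its two groups are treated as independent here purely for illustration:

```r
# sleep: 'extra' hours of sleep recorded under two drugs ('group')
tt <- t.test(extra ~ group, data = sleep, var.equal = TRUE)
av <- summary(aov(extra ~ group, data = sleep))

t_value <- unname(tt$statistic)
F_value <- av[[1]][["F value"]][1]     # first row of the ANOVA table

t_value^2        # squared t ...
F_value          # ... matches F
```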

- Do Ex 1 part in linear regression

ANOVA

oneway anova

twoway anova

linear regression

multiple regression

partial and semipartial correlation

statistical regression methods

sequential_regression

Linear Regression and ANOVA

http://commres.net/wiki/text_mining_example_with_korean_songs

Final quiz

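Part I (written exam): NO open book.

- factor analysis: parts related to theoretical understanding
- among R-related content, parts related to understanding statistics, for example
- understanding t-test, ANOVA, and Factorial ANOVA output
- understanding regression and multiple regression output, etc.

Part II (R practical exam): only the textbook and R help are allowed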

**Final-term**

c/ms/2020/schedule.txt · Last modified: 2020/05/01 21:13 by hkimscil