Class page
multivariate statistics in R
network analysis in R

A User’s Guide to Network Analysis in R (Use R!)
Statistical Analysis of Network Data with R (Use R!) 2014th Edition

https://lagunita.stanford.edu
Network Analysis in R using igraph package – from Datacamp
Marketing analysis in R (statistics) – from Datacamp

Week01 (Sep 4, 6)

ideas and concepts

Introduction to R and others

Downloading and Installing R
1. the_r_project_for_statistical_computing
2. r, getting started
Starting R
Entering Commands
Exiting from R
Interrupting R
Viewing the Supplied Documentation
Getting Help on a Function
Searching the Supplied Documentation
Getting Help on a Package
Searching the Web for Help
Finding Relevant Functions and Packages
Searching the Mailing Lists
Submitting Questions to the Mailing Lists

using ~~theories~~ research questions and hypotheses, and making hypotheses

Installing R

Assignment

Week02 (Sep 11, 13)

Concepts and ideas

Some basics

Introduction
Printing Something
Setting Variables
Listing Variables
Deleting Variables
Creating a Vector
Computing Basic Statistics
Creating Sequences
Comparing Vectors
Selecting Vector Elements
Performing Vector Arithmetic
Getting Operator Precedence Right
Defining a Function
Typing Less and Accomplishing More
Avoiding Some Common Mistakes
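Several of the basics listed above fit into one short session. This is a minimal sketch: the vector `scores` and the function `z_score` are hypothetical examples, not part of the course materials.

```r
# create a vector (hypothetical test scores)
scores <- c(72, 85, 90, 66, 78)

# basic statistics
mean(scores)
median(scores)
sd(scores)

# sequences
seq(from = 0, to = 10, by = 2)   # 0 2 4 6 8 10
1:5

# comparing vectors and selecting elements
scores > 75              # logical vector
scores[scores > 75]      # keep only the elements above 75
scores[c(1, 3)]          # first and third elements

# vector arithmetic works element by element
scores + 5

# operator precedence: ^ binds more tightly than unary minus
-2^2      # -4, i.e. -(2^2)
(-2)^2    # 4

# defining a function: standardize a numeric vector
z_score <- function(x) {
  (x - mean(x)) / sd(x)
}
z_score(scores)
```

The standardized scores have mean 0 and standard deviation 1, which is a quick way to check that `z_score()` works.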

Research Question
Hypothesis
- Educated guess (via theories)
- Difference
- Association
- Variables (vs. ideas, concepts, and constructs)
  - Operationalization
  - Variables, Types of Variables
    - see this blog written in Korean
    - IV (independent variable)
    - DV (dependent variable)
    - Control variable
    - Mediating (intervening) variable

Assignment

Week03 (Sep 18, 20)

Activities

Grouping. See Group page
Group discussion on group work

Concepts and ideas

You should be knowledgeable about building research questions and hypotheses; we will also deal with the topic in class. Please read those two (research question and hypothesis) and 커뮤니케이션_연구문제_제기와_가설 (posing research questions and hypotheses in communication research) individually. The material will be covered on quizzes.

Navigating software

Introduction
Getting and Setting the Working Directory
Saving Your Workspace
Viewing Your Command History
Saving the Result of the Previous Command
Displaying the Search Path
Accessing the Functions in a Package
Accessing Built-in Datasets
Viewing the List of Installed Packages
Installing Packages from CRAN
Setting a Default CRAN Mirror
Suppressing the Startup Message
Running a Script
Running a Batch Script
Getting and Setting Environment Variables
Locating the R Home Directory
Customizing R

Input and output

Introduction
Entering Data from the Keyboard
Printing Fewer Digits (or More Digits)
Redirecting Output to a File
Listing Files
Dealing with “Cannot Open File” in Windows
Reading Fixed-Width Records
Reading Tabular Data Files
Reading from CSV Files
Writing to CSV Files
Reading Tabular or CSV Data from the Web
Reading Data from HTML Tables
Reading Files with a Complex Structure
Reading from MySQL Databases
Saving and Transporting Objects
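A CSV round trip covers "Writing to CSV Files" and "Reading from CSV Files" without depending on any file on your computer. This sketch writes to a temporary file (`tempfile()`) instead of a real path; for saving and transporting objects it uses `saveRDS()`/`readRDS()`, which is one option among several (`save()`/`load()` also work).

```r
# write a built-in dataset to a temporary CSV file
path <- tempfile(fileext = ".csv")
write.csv(mtcars, path)            # row names are written as the first column

# read it back; row.names = 1 restores the car names as row names
cars <- read.csv(path, row.names = 1)
head(cars, 3)

# save an R object and restore it exactly
rds_path <- tempfile(fileext = ".rds")
saveRDS(mtcars, rds_path)
restored <- readRDS(rds_path)
identical(restored, mtcars)
```

Note that `read.csv()` guesses column types, so whole-number columns such as `cyl` come back as integers; `readRDS()` restores the object exactly as it was saved.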

Assignment

Assignment for all

Group assignment

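Read the paper “제3자 효과이론과 침묵의 나선이론 연계성” (the link between third-person effect theory and spiral of silence theory), Example 1 in the Hypothesis document, and state its hypotheses.
For each hypothesis, list the independent variables, dependent variables, and so on.
Identify the theories used in the paper and explain them.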

Week04 (Sep 25, 27)

Class Activity

Practice making hypotheses
- No need to read theories
  - the third person effect
  - Spiral of Silence
  - cognitive dissonance
- Read hypothesis
- How to write hypotheses (behavioral science writing)
- An example hypothesis: Hypothesis at www.socialresearchmethods.net

Concepts and ideas

Data Structures

Introduction
Appending Data to a Vector
Inserting Data into a Vector
Understanding the Recycling Rule
Creating a Factor (Categorical Variable)
Combining Multiple Vectors into One Vector and a Factor
Creating a List
Selecting List Elements by Position
Selecting List Elements by Name
Building a Name/Value Association List
Removing an Element from a List
Flatten a List into a Vector
Removing NULL Elements from a List
Removing List Elements Using a Condition
Initializing a Matrix
Performing Matrix Operations
Giving Descriptive Names to the Rows and Columns of a Matrix
Selecting One Row or Column from a Matrix
Initializing a Data Frame from Column Data
Initializing a Data Frame from Row Data
Appending Rows to a Data Frame
Preallocating a Data Frame
Selecting Data Frame Columns by Position
Selecting Data Frame Columns by Name
Selecting Rows and Columns More Easily
Changing the Names of Data Frame Columns
Editing a Data Frame
Removing NAs from a Data Frame
Excluding Columns by Name
Combining Two Data Frames
Merging Data Frames by Common Column
Accessing Data Frame Contents More Easily
Converting One Atomic Value into Another
Converting One Structured Data Type into Another
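A few of the data structures above in one place: a factor, a list, a named matrix, and two data frames merged by a common column. This is a sketch; `info`, `df1` and `df2` are made-up objects for illustration, and only `mtcars` comes from R itself.

```r
# factor (categorical variable): transmission type from mtcars$am
am_f <- factor(mtcars$am, levels = c(0, 1), labels = c("automatic", "manual"))
table(am_f)

# list: select elements by position and by name
info <- list(name = "mtcars", n = nrow(mtcars), vars = names(mtcars))
info[[2]]        # by position
info$name        # by name

# matrix with descriptive row and column names
m <- matrix(1:6, nrow = 2, dimnames = list(c("r1", "r2"), c("a", "b", "c")))
m["r2", ]        # select one row by name

# data frames built from column data, merged by a common column
df1 <- data.frame(id = c(1, 2, 3), score = c(80, 92, 75))
df2 <- data.frame(id = c(2, 3, 4), group = c("A", "B", "A"))
merged <- merge(df1, df2, by = "id")   # keeps only ids found in both (2 and 3)
merged
```

By default `merge()` keeps only matching rows; `all = TRUE` keeps every row from both data frames and fills the gaps with `NA`.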

data transformations

Assignment

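ga04.making.hypothesis (hypothesis practice) ajoubb

First, using the mtcars dataset that is available by default in R (use RStudio), formulate hypotheses that can be tested with a t-test and with an ANOVA, and analyze them in R.
- For hypotheses, refer to the hypothesis testing document.
- For the t-test, refer to the t-test document.
  - Of the four kinds of t-test, check which one should be used with the mtcars data.
- For ANOVA, refer to the anova document.
- In R, use the t.test and aov functions, respectively.
Second, look into the margin of error reported with opinion poll results in newspapers.
- Choose two newspaper articles that report opinion poll results.
  - Example
  - The standard error (se) of a proportion is usually computed as follows.
  - $ \displaystyle \sigma_{\hat{p}} = \sqrt{\frac{p q}{n}}, \;\;\; q = 1 - p $
  - e.g., $ p = .752 $ (75.2%)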
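The se formula above turns into a margin of error once it is multiplied by the critical z value (about 1.96 for 95% confidence). A minimal sketch, assuming a hypothetical sample size of n = 1000; replace `n` with the number of respondents reported in the article.

```r
# proportion from the example; n is hypothetical
p <- 0.752
q <- 1 - p
n <- 1000

se <- sqrt(p * q / n)            # standard error of a sample proportion
moe <- qnorm(0.975) * se         # 95% margin of error (about 1.96 * se)

round(se, 4)
round(moe * 100, 1)              # in percentage points
```

Because se shrinks with the square root of n, quadrupling the sample size only halves the margin of error.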

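If you upload a file, name it as follows:
- Save it as ga04.making.hypothesis.groupname.ext before uploading.
- Replace “groupname” and “ext” above to match your group and file type.
  - For group 3, use 03 in place of “groupname”.
  - If you saved an MS Word file, the file extension will be “docx”; if you saved a text file, it will be “txt”.
- Following the example above, the file name would therefore be
  - ga04.making.hypothesis.03.txt.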

Week05 (Oct 2, 4)

ideas and concepts

probability
General Statistics

t.test: mtcars

> mdata <- split(mtcars$mpg, mtcars$am)
> mdata
$`0`
 [1] 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
[13] 10.4 14.7 21.5 15.5 15.2 13.3 19.2

$`1`
 [1] 21.0 21.0 22.8 32.4 30.4 33.9 27.3 26.0 30.4 15.8 19.7 15.0
[13] 21.4

> stack(mdata)
   values ind
1    21.4   0
2    18.7   0
3    18.1   0
4    14.3   0
5    24.4   0
6    22.8   0
7    19.2   0
8    17.8   0
9    16.4   0
10   17.3   0
11   15.2   0
12   10.4   0
13   10.4   0
14   14.7   0
15   21.5   0
16   15.5   0
17   15.2   0
18   13.3   0
19   19.2   0
20   21.0   1
21   21.0   1
22   22.8   1
23   32.4   1
24   30.4   1
25   33.9   1
26   27.3   1
27   26.0   1
28   30.4   1
29   15.8   1
30   19.7   1
31   15.0   1
32   21.4   1

> t.test(mpg~am, data=mtcars)

	Welch Two Sample t-test

data:  mpg by am
t = -3.7671, df = 18.332, p-value = 0.001374
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -11.280194  -3.209684
sample estimates:
mean in group 0 mean in group 1 
       17.14737        24.39231 

> t.test(mpg~am, data=mtcars, var.equal=T)

	Two Sample t-test

data:  mpg by am
t = -4.1061, df = 30, p-value = 0.000285
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -10.84837  -3.64151
sample estimates:
mean in group 0 mean in group 1 
       17.14737        24.39231 

> m1 <- mdata[[1]]
> m2 <- mdata[[2]]
> m1
 [1] 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
[13] 10.4 14.7 21.5 15.5 15.2 13.3 19.2
> m2
 [1] 21.0 21.0 22.8 32.4 30.4 33.9 27.3 26.0 30.4 15.8 19.7 15.0
[13] 21.4
> m1.var <- var(m1)          # sample variance of each group
> m2.var <- var(m2)
> m1.n <- length(m1)         # group sizes
> m2.n <- length(m2)
> m1.df <- length(m1)-1      # degrees of freedom of each group
> m2.df <- length(m2)-1
> m1.ss <- m1.var*m1.df      # sum of squares = variance * df
> m2.ss <- m2.var*m2.df
> m1.ss
[1] 264.5874
> m2.ss
[1] 456.3092
> m12.ss <- m1.ss+m2.ss
> m12.ss
[1] 720.8966
> m12.df <- m1.df+m2.df
> pv <- m12.ss/m12.df        # pooled variance = (SS1 + SS2) / (df1 + df2)
> pv
[1] 24.02989
> pv/m1.n
[1] 1.264731
> pv/m2.n
[1] 1.848453
> m.se <- sqrt((pv/m1.n)+(pv/m2.n))   # standard error of the difference in means
> m.se
[1] 1.764422
> m1.m <- mean(m1)
> m2.m <- mean(m2)
> m.tvalue <- (m1.m-m2.m)/m.se        # t = (mean1 - mean2) / SE
> m.tvalue
[1] -4.106127

> t.test(mpg~am, data=mtcars, var.equal=T)

	Two Sample t-test

data:  mpg by am
t = -4.1061, df = 30, p-value = 0.000285
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -10.84837  -3.64151
sample estimates:
mean in group 0 mean in group 1 
       17.14737        24.39231
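The manual steps above reproduce the pooled-variance result (`var.equal=T`). The first `t.test()` output, with no `var.equal` argument, is the Welch test: it does not pool the variances and uses the Welch-Satterthwaite degrees of freedom. A sketch of the same steps for Welch, working directly on `mtcars` rather than on the `mdata` split:

```r
# mpg by transmission (am: 0 = automatic, 1 = manual)
m1 <- mtcars$mpg[mtcars$am == 0]
m2 <- mtcars$mpg[mtcars$am == 1]

# squared standard error of each mean: variance / n (no pooling)
se1.sq <- var(m1) / length(m1)
se2.sq <- var(m2) / length(m2)

# standard error of the difference and the t statistic
w.se <- sqrt(se1.sq + se2.sq)
w.tvalue <- (mean(m1) - mean(m2)) / w.se

# Welch-Satterthwaite degrees of freedom
w.df <- (se1.sq + se2.sq)^2 /
  (se1.sq^2 / (length(m1) - 1) + se2.sq^2 / (length(m2) - 1))

# two-sided p-value
w.p <- 2 * pt(-abs(w.tvalue), df = w.df)

c(t = w.tvalue, df = w.df, p = w.p)
```

These should match the Welch output above (t = -3.7671, df = 18.332). When the group variances differ noticeably, as here (about 14.7 vs 38.0), the Welch version is the safer default.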

anova: mtcars

# per-group statistics: mean, variance, n, df, and sum of squares (SS = variance * df)
stats4each = function(values, groups) {
   meani <- tapply(values, groups, mean)
   vari <- tapply(values, groups, var)
   ni <- tapply(values, groups, length)
   dfi <- ni - 1
   ssi <- vari * dfi
   out <- rbind(meani, vari, ni, dfi, ssi)

   return(out)
}

# Pick ONE dataset and grouping variable; only the uncommented choice is used.
# tempd <- iris
# x <- tempd$Species
# y <- tempd$Sepal.Width

# tempd <- mtcars
# x <- tempd$gear
# y <- tempd$mpg

tempd <- mtcars
x <- tempd$am      # grouping variable (IV)
y <- tempd$mpg     # outcome (DV)

x <- factor(x)
dfbetween <- nlevels(x)-1

stats <- stats4each(y, x)
stats

# SS within = sum of the group SS; SS between = SS total - SS within
sswithin <- sum(stats["ssi",])
sstotal <- var(y)*(length(y)-1)
ssbetween <- sstotal-sswithin

round(sswithin,2)
round(ssbetween,2)
round(sstotal,2)

dfwithin <- sum(stats["dfi",])
dftotal <- length(y)-1

dfwithin
dfbetween
dftotal

# mean squares = SS / df
mswithin <- sswithin / dfwithin
msbetween <- ssbetween / dfbetween
mstotal <- sstotal / dftotal

round(mswithin,2)
round(msbetween,2)
round(mstotal,2)

# F = MS between / MS within; the unrounded F is used for the p-value
fval <- msbetween/mswithin
round(fval,2)
siglevel <- pf(q=fval, df1=dfbetween, df2=dfwithin, lower.tail=FALSE)
siglevel

# compare with R's built-in one-way ANOVA
mod <- aov(y~x)
summary(mod)
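With only two groups (the `am` choice above), one-way ANOVA and the pooled two-sample t-test are the same test: F equals t squared and the p-values agree. A quick check, assuming the mtcars/am setup:

```r
# one-way ANOVA of mpg by transmission type
fit <- aov(mpg ~ factor(am), data = mtcars)
f.value <- summary(fit)[[1]][["F value"]][1]

# pooled-variance two-sample t-test on the same groups
tt <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)
t.value <- unname(tt$statistic)

# F = t^2 (about 16.86 here, since t = -4.106)
c(F = f.value, t.squared = t.value^2)
all.equal(f.value, t.value^2)
```

With three or more groups (for example the commented-out `gear` choice) there is no single t statistic, which is why ANOVA is needed.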

cor

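attach(mtcars)
cor(mpg, hp)

# r = cov(x, y) / (sd(x) * sd(y))
mycor <- cov(mpg,hp)/(sd(mpg)*sd(hp))
mycor

# r = SP / sqrt(SSx * SSy)
n <- length(mpg)
sp <- cov(mpg,hp)*(n-1)     # sum of cross-products
ssx <- var(mpg)*(n-1)       # sum of squares of mpg
ssy <- var(hp)*(n-1)        # sum of squares of hp

mycor2 <- sp/sqrt(ssx*ssy)
mycor2

# == can return FALSE for results that differ only in the last floating-point digits;
# all.equal() compares with a small tolerance
all.equal(mycor2, mycor)
all.equal(mycor, cor(mpg,hp))
all.equal(mycor2, cor(mpg,hp))

detach(mtcars)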
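The correlation can also be tested against H0: ρ = 0, using t = r·sqrt(n − 2) / sqrt(1 − r²) with n − 2 degrees of freedom. This sketch computes it by hand and compares the result with `cor.test()`:

```r
r <- cor(mtcars$mpg, mtcars$hp)
n <- nrow(mtcars)

# t statistic for H0: rho = 0, with n - 2 degrees of freedom
t.r <- r * sqrt(n - 2) / sqrt(1 - r^2)
p.r <- 2 * pt(-abs(t.r), df = n - 2)

# the built-in test gives the same t and p
ct <- cor.test(mtcars$mpg, mtcars$hp)
c(t.manual = t.r, t.cor.test = unname(ct$statistic))
c(p.manual = p.r, p.cor.test = ct$p.value)
```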

Assignment

Week06 (Oct 9, 11)

ideas and concepts

correlation
regression
multiple regression

Partial and semipartial correlation
using dummy variables
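Partial and semipartial correlations can be computed from regression residuals. A sketch with mtcars, where the choice of mpg (outcome), hp (predictor) and wt (control) is only an illustration:

```r
y <- mtcars$mpg   # outcome
x <- mtcars$hp    # predictor of interest
z <- mtcars$wt    # control variable

# residuals = the part of each variable not explained by z
y.res <- resid(lm(y ~ z))
x.res <- resid(lm(x ~ z))

# partial correlation: z removed from both y and x
partial.r <- cor(y.res, x.res)

# semipartial (part) correlation: z removed from x only
semipartial.r <- cor(y, x.res)

# the same values from the zero-order correlations
r.yx <- cor(y, x); r.yz <- cor(y, z); r.xz <- cor(x, z)
partial.formula <- (r.yx - r.yz * r.xz) / sqrt((1 - r.yz^2) * (1 - r.xz^2))
semipartial.formula <- (r.yx - r.yz * r.xz) / sqrt(1 - r.xz^2)

c(partial = partial.r, semipartial = semipartial.r)
```

The two share a numerator and differ only in the denominator. The semipartial correlation never exceeds the partial in absolute value, because the partial also removes z from y.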

Statistical Regression Methods
Sequential Regression
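In sequential (hierarchical) regression, predictors are entered in steps and the question is whether each step adds to R². A sketch with mtcars, where the order (wt first, then hp) is an arbitrary example:

```r
# step 1: weight only; step 2: add horsepower
m.step1 <- lm(mpg ~ wt, data = mtcars)
m.step2 <- lm(mpg ~ wt + hp, data = mtcars)

# R-squared change from adding hp
r2.1 <- summary(m.step1)$r.squared
r2.2 <- summary(m.step2)$r.squared
r2.change <- r2.2 - r2.1

# F test for the change: 1 added predictor, n - 3 residual df in step 2
n <- nrow(mtcars)
f.change <- (r2.change / 1) / ((1 - r2.2) / (n - 3))

# anova() on the nested models reports the same F
anova(m.step1, m.step2)
```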

Assignment

Public opinion in online environments ¹⁾
- Spiral of Silence
- Pluralistic Ignorance
- The Third Person Effect
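- etc. Find and introduce sociological or social-psychological theories related to the formation of public opinion, for example the three above. Discuss and summarize how a recent social phenomenon might best be explained. What would be needed to gauge public opinion accurately in online environments?
- Or another topic (. . . depending on the group . . .)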
Hypotheses
- Multiple regression hypotheses.
- Google Survey Questions

Week07 (Oct 16, 18)

ideas and concepts

Assignment

Week08 (Oct 23, 25)

Mid-term period

Quiz 1

Lecture materials + textbook
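Textbook: R Cookbook. For the textbook, expect multiple-choice (four options) or short-answer questions about expected output, the commands used to obtain an output, what the options used in a command (function) mean, and so on. You will not be asked to reproduce exact details such as how to write a function's options.
- Examples
  - Write the command to run a one-sample t-test. (x, will not be asked)
  - In t.test(sample, mu=100), what does mu mean? (o, may be asked)
  - Which of the following is an appropriate form of sapply's output? etc.
- The R project for statistical computing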
- Getting started
- Basics
- Navigating
- Input output
- Data structures
- Data transformations
Lecture content
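- Hypothesis
- Research question
- Posing research questions and hypotheses in communication research (커뮤니케이션 연구문제 제기와 가설), that part only
- Operationalization
- Variables
- Types of variables
- Hypothesis testing
- T-test
  - You do not need to memorize the exact t-test formulas (they will be provided).
  - You may be asked to do a simple t-test calculation.
  - The same applies to ANOVA.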
- ANOVA

Week09 (Oct 30, Nov 1)

ideas and concepts

correlation
regression
multiple regression

Partial and semipartial correlation
using dummy variables

Statistical Regression Methods
Sequential Regression
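Dummy variables let a categorical predictor enter a regression. In R, wrapping the variable in `factor()` makes `lm()` build the 0/1 dummy columns, with the first level as the reference group. A sketch using the number of cylinders in mtcars as an example:

```r
# cyl has three values (4, 6, 8); factor() lets lm() create the dummy variables
fit <- lm(mpg ~ factor(cyl), data = mtcars)

# the design matrix: intercept plus one 0/1 column each for 6 and 8 cylinders
head(model.matrix(fit))

coef(fit)

# intercept = mean mpg of the reference group (4 cylinders);
# each dummy coefficient = difference from that reference mean
group.means <- tapply(mtcars$mpg, mtcars$cyl, mean)
group.means
```

A factor with k levels needs only k - 1 dummies; adding a column for the reference group as well would make the columns perfectly collinear with the intercept.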

Activity

Multiple Regression Exercise

Assignment

Week10 (Nov 6, 8)

ideas and concepts

factor analysis
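A minimal exploratory factor analysis in base R, as a sketch: it uses `factanal()` with the built-in `ability.cov` covariance matrix (six ability tests) as stand-in data, and the choice of two factors with varimax rotation is an assumption for illustration.

```r
# ability.cov: built-in covariance matrix of six ability tests (datasets package)
fa <- factanal(factors = 2, covmat = ability.cov, rotation = "varimax")

# hide loadings below 0.3 to make the pattern easier to read
print(fa$loadings, cutoff = 0.3)

# uniqueness = share of each variable's variance not explained by the common factors
round(fa$uniquenesses, 2)
```

`factanal()` uses maximum likelihood, so `fa` also includes a chi-square test of whether two factors are sufficient.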

Assignment

Week11 (Nov 13, 15)

ideas and concepts

Assignment

Week12 (Nov 20, 22)

ideas and concepts

Assignment

factor analysis assignment

Week13 (Nov 27, 29)

ideas and concepts

social network analysis
social network analysis tutorial
sna in r
Stanford University examples
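The network resources at the top of this page use the igraph package. As a dependency-free sketch, two basic SNA measures, degree centrality and density, can be computed in base R from an adjacency matrix; the five-node network here is hypothetical.

```r
# hypothetical undirected network with five nodes and five ties
nodes <- c("A", "B", "C", "D", "E")
edges <- rbind(c("A", "B"), c("A", "C"), c("B", "C"), c("C", "D"), c("D", "E"))

# adjacency matrix: 1 if two nodes are tied, 0 otherwise
adj <- matrix(0, nrow = length(nodes), ncol = length(nodes),
              dimnames = list(nodes, nodes))
idx <- cbind(match(edges[, 1], nodes), match(edges[, 2], nodes))
adj[idx] <- 1               # fill each tie by row/column position
adj[idx[, 2:1]] <- 1        # undirected: mirror each tie

# degree centrality = number of ties per node
deg <- rowSums(adj)
deg

# density = observed ties / possible ties, n(n - 1) / 2 for an undirected graph
n.ties <- sum(adj) / 2
net.density <- n.ties / (length(nodes) * (length(nodes) - 1) / 2)
net.density
```

Node C has the highest degree (3), and 5 of the 10 possible ties are present, so the density is 0.5.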

announcement

Quiz 2 (Friday, Dec. 6) covers:

Some R outputs will be used to ask about the related concepts and ideas (the above).

Assignment

Week14 (Dec 4, 6)

Group Presentation

Week15 (Dec 11, 13)

assignment week15

Week16 (Dec 18, 20)

Final exam covers:
correlation
regression
multiple regression
partial and semipartial correlation
using dummy variables
factor analysis
social network analysis
sna tutorial
sna in r
SNA e.g. lab 06

Some R outputs will be used to ask about the related concepts and ideas (the above).

¹⁾ Refer to public.opinion.theories.introduction.pdf
